DocumentDB as a data sink for Azure Stream Analytics

The Internet of Things (IoT) has arrived. An important part of it is analyzing sensor data in motion, and for that a streaming system is necessary. You could use Apache Storm or Apache Spark Streaming, but if you want to run it as a service on Azure without the pain of setting up a cluster, Azure Stream Analytics is a good choice.

In this post I’m going through the basics of Azure Stream Analytics and DocumentDB, which serves as the destination for our data once the streaming is done, and showing how you can create a simple Stream Analytics job using Blob storage as the input and DocumentDB as the output.

What is Azure Stream Analytics?

Azure Stream Analytics is a real-time event processing service running on Azure. One of the points where it stands out is that it uses a SQL-like language for the processing.
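
To get a feeling for the language, here is a minimal sketch of a windowed aggregation that counts events per sensor type over 30-second tumbling windows. The input and output aliases (sensors and sinkoutput) are made up for this example; you define your own when configuring the job.

SELECT type, COUNT(*) AS events
INTO sinkoutput
FROM sensors
GROUP BY type, TumblingWindow(second, 30)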

What is Azure DocumentDB?

Azure DocumentDB is a NoSQL database service on Azure which stores JSON data in a document-oriented way. If you want to know more about it, I wrote about Azure DocumentDB some time ago in Cool NoSQL on Azure with DocumentDB.

What is a data sink?

A data sink is a storage system that receives different kinds of data. DocumentDB is perfect for that purpose because it does not require the data to be relational. Another advantage is that you can query it with SQL, which some other storage systems on Azure do not support.

Create the Azure Stream Analytics job

Let’s start by setting up the different services we need:

  • Azure Stream Analytics
  • Azure DocumentDB
  • Azure Blob Storage

Go to the Azure portal and click on NEW.

[Screenshot: docdb-asa-01]

Click on Data + Analytics and look for the Stream Analytics job. Give the job a name, create a new resource group, and choose a location.

[Screenshot: docdb-asa-02]

Next, go to Data + Storage and create a new Azure DocumentDB account. Give the account an ID, choose our existing resource group, and put it in the same location as our Stream Analytics job.

[Screenshot: docdb-asa-03]

The last thing we need is a storage account as the input for our Azure Stream Analytics job.

[Screenshot: docdb-asa-04]

Do the same as we have done before.

[Screenshot: docdb-asa-05]

Now everything is in place and we are ready to create our inputs and outputs. Let’s start with the input.

[Screenshot: docdb-asa-06]

We need to specify an alias we can use in our query, the type of source we want to use (in our case Blob storage), and the account credentials.

[Screenshot: docdb-asa-07]

The input also needs a format type; for our purposes JSON works just fine. That’s it for the input. Now for the output: here we choose our DocumentDB account. To configure everything correctly we need a database, a collection, and two fields from our source data that define the partition key and the document ID.
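
As a sketch, for the sample documents used later in this post, the output configuration could look like the following (the database and collection names are placeholders I made up for this example):

Database:      sensordb
Collection:    sensordata
Partition key: type
Document ID:   id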

[Screenshot: docdb-asa-08]

The next thing we need is some data in our blob storage account. Create a data container and upload some files. These files are variations of the following:

{
    "id": "0",
    "type": "train",
    "value": "0"
}

We have two types of sensor data: train and car. The value of the sensor could be anything; to keep things simple we stick with the values ‘0’ and ‘1’. Upload the data with your tool of choice (a stand-alone tool or Visual Studio).
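
If you would rather script the upload, here is a minimal sketch in Python using the azure-storage-blob package. The connection string and the container name data are assumptions; adjust them to your own account.

# A minimal sketch, assuming the azure-storage-blob package is installed
# (pip install azure-storage-blob). Connection string and container name
# are placeholders for your own values.
import json
import random

from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;"

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client("data")

# Create the container if it does not exist yet.
try:
    container.create_container()
except ResourceExistsError:
    pass

# Generate a handful of sample documents matching the shape shown above.
for i in range(10):
    doc = {
        "id": str(i),
        "type": random.choice(["train", "car"]),
        "value": random.choice(["0", "1"]),
    }
    container.upload_blob(name="sensor-{0}.json".format(i), data=json.dumps(doc))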

[Screenshot: docdb-asa-09]

To test our query appropriately it is a good idea to sample the input data. Unfortunately you have to go back to the old portal to do this. Go to your Azure Stream Analytics job and choose INPUTS. Click on the SAMPLE DATA button at the bottom.

[Screenshot: docdb-asa-10]

This will take a few seconds.

[Screenshot: docdb-asa-11]

After the sampling is done, download the sample via the Click here link.

[Screenshot: docdb-asa-12]

Next we will prepare our query. Let’s start simple:

SELECT *
INTO streamoutput
FROM blob

To see the results click on TEST.

[Screenshot: docdb-asa-13]

The result should look similar to the following.

[Screenshot: docdb-asa-14]

Let’s play around with the query and filter our input data.

SELECT *
INTO streamoutput
FROM blob
WHERE value = '0'

Now our result set should look like the following after clicking RERUN (we use RERUN to avoid uploading the sample data again).

[Screenshot: docdb-asa-15]

We still have some noise in our data. Let’s change the query.

SELECT id, type, value
INTO streamoutput
FROM blob
WHERE value = '0'

Our new result set:

[Screenshot: docdb-asa-16]

The last thing to do is to start our Stream Analytics job. After the input has been processed, we should see some new documents in the Document Explorer of our DocumentDB account.

[Screenshot: docdb-asa-17]

Just to confirm that our documents were created the right way, we click on one of them.

[Screenshot: docdb-asa-18]
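
If you would rather verify the results programmatically, here is a minimal sketch using the azure-cosmos Python package (the successor to the original DocumentDB SDKs). The endpoint, key, and the database and collection names from the configuration sketch above are assumptions.

# A minimal sketch, assuming the azure-cosmos package is installed
# (pip install azure-cosmos). Endpoint, key, and names are placeholders.
from azure.cosmos import CosmosClient

ENDPOINT = "https://<account>.documents.azure.com:443/"
KEY = "<account-key>"

client = CosmosClient(ENDPOINT, credential=KEY)
container = client.get_database_client("sensordb").get_container_client("sensordata")

# Query the documents written by the Stream Analytics job.
query = "SELECT c.id, c.type, c.value FROM c WHERE c.value = '0'"
for item in container.query_items(query=query, enable_cross_partition_query=True):
    print(item)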

That’s it!

Summary

In this post we have gone through the steps needed to set up a Stream Analytics job with Blob storage as the input and DocumentDB as our data sink.

If you want to know a bit more about DocumentDB, check out my Introduction to Azure DocumentDB course from Opsgility.

Have fun!

Jan (@Horizon_Net)
