# Module 10 - Real-time stream processing with Stream Analytics

In this module, students will learn how to process streaming data with Azure Stream Analytics. The student will ingest vehicle telemetry data into Event Hubs, then process that data in real time using various windowing functions in Azure Stream Analytics. They will output the data to Azure Synapse Analytics. Finally, the student will learn how to scale the Stream Analytics job to increase throughput.

In this module, the student will be able to:

- Use Stream Analytics to process real-time data from Event Hubs
- Use Stream Analytics windowing functions to build aggregates and output to Synapse Analytics
- Scale the Azure Stream Analytics job to increase throughput through partitioning
- Repartition the stream input to optimize parallelization
As more and more data is generated from a variety of connected devices and sensors, transforming this data into actionable insights and predictions in near real-time is now an operational necessity. Azure Stream Analytics seamlessly integrates with your real-time application architecture to enable powerful, real-time analytics on your data no matter what the volume.
Azure Stream Analytics enables you to develop massively parallel Complex Event Processing (CEP) pipelines with simplicity. It allows you to author powerful, real-time analytics solutions using a simple, declarative, SQL-like language with embedded support for temporal logic. An extensive array of out-of-the-box connectors, advanced debugging, and job monitoring capabilities help keep costs down by significantly lowering the developer skills required. Additionally, Azure Stream Analytics is highly extensible through support for custom code with JavaScript user-defined functions, further extending the streaming logic written in SQL.
Getting started with Azure Stream Analytics takes only seconds, as there is no infrastructure to worry about and no servers, virtual machines, or clusters to manage. You can instantly scale out the processing power from one to hundreds of streaming units for any job, and you only pay for the processing used per job.
Guaranteed event delivery and an enterprise-grade SLA providing three 9's of availability make Azure Stream Analytics suitable for mission-critical workloads. Automated checkpoints enable fault-tolerant operation with fast restarts and no data loss.
With Azure Stream Analytics, you can quickly build real-time dashboards with Power BI for a live command-and-control view. Real-time dashboards help transform live data into actionable, insightful visuals, and help you focus on what matters to you the most.
Azure Event Hubs is a big data pipeline that can ingest millions of events per second. It facilitates the capture, retention, and replay of telemetry and event stream data, using standard protocols such as HTTPS, AMQP, AMQP over WebSockets, and Kafka. The data can come from many concurrent sources, and up to 20 consumer groups allow applications to read the entire event hub independently at their own pace.
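The partitioning model behind this can be illustrated with a short simulation. The following is a hypothetical Python sketch, not the Azure SDK: it assumes a four-partition event hub and shows how hashing a partition key (such as a vehicle's VIN) keeps all events for that key in one partition, preserving per-key ordering while spreading load, which is what later lets a Stream Analytics job process partitions in parallel.

```python
# Illustrative sketch (NOT the Azure SDK): Event Hubs-style key-based
# partitioning. PARTITION_COUNT = 4 is an assumption for this example.
from collections import defaultdict
import hashlib

PARTITION_COUNT = 4

def partition_for(key: str) -> int:
    """Map a partition key (e.g. a vehicle VIN) to a stable partition id."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % PARTITION_COUNT

partitions = defaultdict(list)
events = [{"vin": f"VIN{i % 3}", "speed": 40 + i} for i in range(9)]
for event in events:
    partitions[partition_for(event["vin"])].append(event)

# All events for a given VIN land in the same partition, in send order.
for pid, batch in sorted(partitions.items()):
    print(pid, [e["vin"] for e in batch])
```

Because the mapping is deterministic, every consumer group sees the same partition layout and can read each partition independently at its own pace.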
Contoso Auto is collecting vehicle telemetry and wants to use Event Hubs to rapidly ingest and store the data in its raw form, then do some processing in near real-time. In the end, they want to create a dashboard that automatically updates with new data as it flows in after being processed. What they would like to see on the dashboard are various visualizations of detected anomalies, like engines overheating, abnormal oil pressure, and aggressive driving, using components such as a map to show anomalies related to cities, as well as various charts and graphs depicting this information in a clear way.
In this experience, you will use Azure Event Hubs to ingest streaming vehicle telemetry data as the entry point to a near real-time analytics pipeline built on Event Hubs, Azure Stream Analytics, and Azure Synapse Analytics. Azure Stream Analytics extracts the vehicle sensor data from Event Hubs, performs aggregations over windows of time, then sends the aggregated data to Azure Synapse Analytics for data analysis. A vehicle telemetry data generator will be used to send vehicle telemetry data to Event Hubs.
- Azure subscription
- You have successfully completed Module 0 to create your lab environment.
This lab uses the dedicated SQL pool. As a first step, make sure it is not paused. If it is, resume it by following these instructions:
1. Open Synapse Studio (https://web.azuresynapse.net/).

2. Select the Manage hub.

3. Select SQL pools in the left-hand menu (1). If the dedicated SQL pool is paused, hover over the name of the pool and select Resume (2).

4. When prompted, select Resume. It will take a minute or two to resume the pool.
Continue to the next exercise while the dedicated SQL pool resumes.
Azure Event Hubs is a Big Data streaming platform and event ingestion service, capable of receiving and processing millions of events per second. We are using it to temporarily store vehicle telemetry data that is processed and ready to be sent to the real-time dashboard. As data flows into Event Hubs, Azure Stream Analytics will query the data, applying aggregates and tagging anomalies, then send it to Azure Synapse Analytics and Power BI.
In this task, you will configure the telemetry event hub within the provided Event Hubs namespace and create shared access policies for it. This event hub will capture the vehicle telemetry sent by the data generator so that Stream Analytics can process it.
1. Navigate to the Azure portal.

2. Select Resource groups from the left-hand menu. Then select the resource group named data-engineering-synapse.

3. Select the Event Hubs Namespace (`eventhubYOUR_UNIQUE_ID`) from the list of resources in your resource group.

4. Within the Event Hubs Namespace blade, select Event Hubs within the left-hand menu.

5. Select the telemetry event hub from the list.

6. Select Shared access policies from the left-hand menu.

7. Select + Add in the top toolbar to create a new shared access policy.

8. In the Add SAS Policy blade, configure the following:

    - Policy name: Enter `Read`.
    - Claims: Check Listen.

9. Select Create on the bottom of the form when you are finished entering the values.

10. Select + Add in the top toolbar to create a second shared access policy.

11. In the Add SAS Policy blade, configure the following:

    - Policy name: Enter `Write`.
    - Claims: Check Send.

12. Select Create on the bottom of the form when you are finished entering the values.

13. Select your Write policy from the list. Copy the Connection string - primary key value by selecting the Copy button to the right of the field. SAVE THIS VALUE in Notepad or a similar text editor for later.
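As a quick sanity check on the value you just saved: an Event Hubs connection string is a set of semicolon-separated key=value pairs. The following is a hypothetical Python helper, not part of the lab files, that parses one and verifies it came from the Write policy and targets the telemetry event hub. The namespace name and key below are placeholders.

```python
# Hypothetical helper (not part of the lab): parse an Event Hubs
# connection string into its key=value parts for inspection.
def parse_connection_string(conn: str) -> dict:
    parts = [p for p in conn.split(";") if p]
    # maxsplit=1 keeps '=' padding inside base64 keys intact
    return dict(p.split("=", 1) for p in parts)

# Placeholder values; substitute your own saved connection string.
example = (
    "Endpoint=sb://eventhubns.servicebus.windows.net/;"
    "SharedAccessKeyName=Write;"
    "SharedAccessKey=REDACTED;"
    "EntityPath=telemetry"
)

settings = parse_connection_string(example)
print(settings["SharedAccessKeyName"], settings["EntityPath"])
```

If `SharedAccessKeyName` is not `Write`, or `EntityPath` is missing, you copied the connection string from the wrong place.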
Azure Synapse is an end-to-end analytics platform that combines SQL data warehousing, big data analytics, and data integration into a single integrated environment. It empowers users to gain quick access and insights across all of their data, with a high level of performance and scale.
In this task, you will create a table in a Synapse dedicated SQL pool to store aggregate vehicle data provided by a Stream Analytics job that processes vehicle telemetry ingested by Event Hubs.
1. Navigate to the Azure portal.

2. Select Resource groups from the left-hand menu. Then select the resource group named data-engineering-synapse.

3. Select the Synapse workspace (`asaworkspaceYOUR_UNIQUE_ID`) from the list of resources in your resource group.

4. Select Open within the Open Synapse Studio box inside the Overview pane.

5. Within Synapse Studio, select Data in the left-hand menu to navigate to the Data hub.

6. Select the Workspace tab (1), expand Databases and right-click SQLPool01 (2). Select New SQL script (3), then select Empty script (4).

7. Make sure the script is connected to `SQLPool01`, then replace the script with the following and select Run to create a new table:

    ```sql
    CREATE TABLE dbo.VehicleAverages
    (
        [AverageEngineTemperature] [float] NOT NULL,
        [AverageSpeed] [float] NOT NULL,
        [Snapshot] [datetime] NOT NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    )
    GO
    ```
Azure Stream Analytics is an event-processing engine that allows you to examine high volumes of data streaming from devices. Incoming data can come from devices, sensors, web sites, social media feeds, applications, and more. It also supports extracting information from data streams and identifying patterns and relationships. You can then use these patterns to trigger actions downstream, such as creating alerts, feeding information to a reporting tool, or storing the data for later use.
In this task, you will configure Stream Analytics to use the event hub you configured as a source, query and analyze that data, then send the aggregated results to Azure Synapse Analytics for data analysis.
1. Navigate to the Azure portal.

2. Select Resource groups from the left-hand menu. Then select the resource group named data-engineering-synapse.

3. Select the Stream Analytics job (`asaYOUR_UNIQUE_ID`) from the list of resources in your resource group.

4. Within the Stream Analytics job, select Storage account settings in the left-hand menu, then select Add storage account. Since we will use Synapse Analytics as one of the outputs, we need to first configure the job storage account.

5. In the Storage account settings form, configure the following:

6. Select Save, then Yes when prompted to save the storage account settings.

7. Within the Stream Analytics job, select Inputs within the left-hand menu.

8. Select + Add stream input in the top toolbar, then select Event Hub to create a new Event Hub input.

9. In the New Input blade, configure the following:

    - Name: Enter `eventhub`.
    - Select Event Hub from your subscriptions: Selected.
    - Subscription: Make sure the subscription you are using for this lab is selected.
    - Event Hub namespace: Select the Event Hub namespace you are using for this lab.
    - Event Hub name: Select Use existing, then select telemetry, which you configured earlier.
    - Event Hub consumer group: Select Use existing, then select $Default.
    - Authentication mode: Select Connection string.
    - Event Hub policy name: Select Use existing, then select Read.
    - Leave all other values at their defaults.

10. Select Save on the bottom of the form when you are finished entering the values.

11. Within the Stream Analytics job blade, select Outputs within the left-hand menu.

12. Select + Add in the top toolbar, then select Azure Synapse Analytics to create a new Synapse Analytics output.

13. In the New Output blade, configure the following:

    - Output alias: Enter `synapse`.
    - Select Azure Synapse Analytics from your subscriptions: Selected.
    - Subscription: Select the subscription you are using for this lab.
    - Database: Select `SQLPool01`. Make sure your correct Synapse workspace name appears under Server name.
    - Table: Enter `dbo.VehicleAverages`.
    - Authentication mode: Select Connection string.
    - Username: Enter `asa.sql.admin` (or the dedicated SQL pool username provided to you for this lab).
    - Password: Enter the SQL admin password value you entered when deploying the lab environment, or which was provided to you as part of your hosted lab environment. Note: This password is most likely not the same as the password you used to sign in to the Azure portal.

    Note: If you are unsure about your SQL admin username, navigate to the Synapse workspace in the Azure resource group. The SQL admin username is shown in the Overview pane.

14. Select Save on the bottom of the form when you are finished entering the values.

15. Within the Stream Analytics job blade, select Query within the left-hand menu.

16. Clear the edit Query window and paste the following in its place:

    ```sql
    WITH VehicleAverages AS (
        SELECT
            AVG(engineTemperature) averageEngineTemperature,
            AVG(speed) averageSpeed,
            System.TimeStamp() AS snapshot
        FROM eventhub TIMESTAMP BY [timestamp]
        GROUP BY TumblingWindow(Duration(minute, 2))
    )
    -- INSERT INTO SYNAPSE ANALYTICS
    SELECT * INTO synapse FROM VehicleAverages
    ```

    The query aggregates the average engine temperature and speed of all vehicles over the past two minutes, using `TumblingWindow(Duration(minute, 2))`, and outputs these fields to the `synapse` output.

17. Select Save query in the top toolbar when you are finished updating the query.

18. Within the Stream Analytics job blade, select Overview within the left-hand menu. On top of the Overview blade, select Start.

19. In the Start job blade that appears, select Now for the job output start time, then select Start. This starts the Stream Analytics job so it is ready to process incoming events and send them to your outputs.
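The tumbling-window aggregation used in the query can be illustrated outside of Stream Analytics. The following Python sketch is an illustration, not the Stream Analytics engine: it buckets timestamped events into fixed, non-overlapping two-minute windows and averages engine temperature and speed per window, mirroring what the job writes to the `synapse` output.

```python
# Illustrative simulation of TumblingWindow(Duration(minute, 2)):
# fixed, non-overlapping 120-second buckets with per-bucket averages.
from collections import defaultdict

WINDOW_SECONDS = 120

def tumbling_averages(events):
    """events: iterable of (timestamp_seconds, engine_temp, speed) tuples."""
    windows = defaultdict(list)
    for ts, temp, speed in events:
        windows[ts // WINDOW_SECONDS].append((temp, speed))
    result = {}
    for w, rows in sorted(windows.items()):
        temps = [r[0] for r in rows]
        speeds = [r[1] for r in rows]
        # The window end time plays the role of System.Timestamp()
        # for each emitted aggregate row.
        result[(w + 1) * WINDOW_SECONDS] = (
            sum(temps) / len(temps),
            sum(speeds) / len(speeds),
        )
    return result

sample = [(0, 200.0, 50.0), (60, 210.0, 60.0), (130, 220.0, 70.0)]
print(tumbling_averages(sample))
```

The first window (0-120 s) averages the first two events; the third event falls into the next window. Because windows never overlap, every event contributes to exactly one aggregate row.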
The data generator console application creates and sends simulated vehicle sensor telemetry for an array of vehicles, each identified by a VIN (vehicle identification number), directly to Event Hubs. For this to happen, you first need to configure it with the Event Hub connection string.
In this task, you will configure and run the data generator. The data generator saves simulated vehicle telemetry data to Event Hubs, prompting your Stream Analytics job to aggregate and analyze the enriched data and send it to Power BI and Synapse Analytics. The final step will be to create the Power BI report in the task that follows.
1. On your lab VM or computer, download the TransactionGeneratorExecutable.zip file.

2. Extract the zip file to your machine, making note of the extraction location.

3. Open the folder containing the extracted files, then open either the `linux-x64`, `osx-x64`, or `win-x64` subfolder, based on your environment.

4. Within the appropriate subfolder, open the appsettings.json file. Paste your `telemetry` Event Hub connection string value next to `EVENT_HUB_CONNECTION_STRING`. Make sure you have quotes ("") around the value, as shown. Save the file.

    Note: Make sure that the connection string ends with `EntityPath=telemetry` (e.g. `Endpoint=sb://YOUR_EVENTHUB_NAMESPACE.servicebus.windows.net/;SharedAccessKeyName=Write;SharedAccessKey=REDACTED/S/U=;EntityPath=telemetry`). If it does not, then you did not copy the connection string from the `Write` policy of your event hub.

    `SECONDS_TO_LEAD` is the amount of time to wait before sending vehicle telemetry data. The default value is `0`.

    `SECONDS_TO_RUN` is the maximum amount of time to allow the generator to run before stopping transmission of data. The default value is `1800`. Data will also stop transmitting when you enter Ctrl+C while the generator is running, or if you close the window.

5. Execute the data generator using one of the following methods, based on your platform:

    - Windows: Execute TransactionGenerator.exe inside the `win-x64` folder.
    - Linux: Navigate to the `linux-x64` folder, run `chmod 777 DataGenerator` to provide access to the binary, then run `./DataGenerator`.
    - macOS: Open a new terminal, navigate to the `osx-x64` directory, then run `./DataGenerator`.

6. If you are using Windows and receive a dialog after trying to execute the data generator, select More info, then Run anyway.

7. A new console window will open, and you should see it start to send data after a few seconds. Once you see that it is sending data to Event Hubs, minimize the window and keep it running in the background. Allow this to run for a minimum of three minutes before moving on to the next step.
After every 500 records are requested to be sent, you will see output statistics.
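For reference, the appsettings.json file edited above might look like the following sketch. This is an assumed shape based on the keys described in the steps; the connection string shown is a placeholder, and the exact value types in the real file may differ:

```json
{
  "EVENT_HUB_CONNECTION_STRING": "Endpoint=sb://YOUR_EVENTHUB_NAMESPACE.servicebus.windows.net/;SharedAccessKeyName=Write;SharedAccessKey=REDACTED;EntityPath=telemetry",
  "SECONDS_TO_LEAD": "0",
  "SECONDS_TO_RUN": "1800"
}
```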
As you recall, when you created the query in Stream Analytics, you aggregated the engine temperature and vehicle speed data over two-minute intervals and saved the results to Synapse Analytics. Writing to a Synapse Analytics dedicated SQL pool enables us to retain the historic and current aggregate data as part of the data warehouse without requiring an ETL/ELT process.
In this task, you will view the aggregate data within Synapse Analytics.
1. If you have not yet done so, stop the TransactionGenerator.

2. Navigate to the Azure portal.

3. Select Resource groups from the left-hand menu. Then select the resource group named data-engineering-synapse.

4. Select the Synapse workspace (`asaworkspaceYOUR_UNIQUE_ID`) from the list of resources in your resource group.

5. Select Open within the Open Synapse Studio box inside the Overview pane.

6. Within Synapse Studio, select Data in the left-hand menu to navigate to the Data hub.

7. Select the Workspace tab (1), expand the `SQLPool01` database, expand `Tables`, then right-click the dbo.VehicleAverages table (2). If you do not see the table listed, refresh the tables list. Select New SQL script (3), then Select TOP 100 rows (4).

8. View the query results and observe the aggregate data stored in `AverageEngineTemperature` and `AverageSpeed`. The `Snapshot` value changes in two-minute intervals between these records.

9. Select the Chart view in the Results output, then set the chart type to Area. This visualization shows the average engine temperature correlated with the average speed over time. Feel free to experiment with the chart settings. The longer the data generator runs, the more data points are generated. The following visualization is an example from a session that ran for over 10 minutes, and may not represent what you see on the screen.
Complete these steps to stop the data generator and free up resources you no longer need.
1. Go back to the console/terminal window in which your data generator is running. Close the window to stop the generator.

2. Navigate to the Stream Analytics job in the Azure portal.

3. In the Overview pane, select Stop, then select Yes when prompted.

4. Open Synapse Studio (https://web.azuresynapse.net/).

5. Select the Manage hub.

6. Select SQL pools in the left-hand menu (1). Hover over the name of the dedicated SQL pool and select Pause (2).

7. When prompted, select Pause.