Synapse Real-Time Analytics: Discovering the best ways to get data into a KQL database
We recently launched Microsoft Fabric (preview) – an all-in-one analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence. Microsoft Fabric brings together new and existing components from Power BI, Azure Synapse, and Azure Data Explorer into a single integrated environment. The platform is built on a foundation of Software as a Service (SaaS), which takes simplicity and integration to a whole new level.
One of the key components of Fabric is Synapse Real-Time Analytics, a fully managed big data analytics platform optimized for streaming and time-series data. It provides all the amazing query capabilities, performance and scale that customers are used to with Azure Synapse Data Explorer in a SaaSified experience.
The main items available in Real-Time Analytics include:
One of the key features of Real-Time Analytics is its ability to ingest data from a wide range of sources, making it easy to get data into the service and start analyzing it.
In this blog, we’ll focus on the different options for bringing data into a KQL database, which we call the Get data experiences.
If you’re looking for a one-off data ingestion method or have a historical dataset which you’d like to ingest into your KQL database, you can use one of the following options depending on where your data resides.
1. Local File
This is the simplest way to ingest data into a KQL Database. You can upload a file from your local machine using the Get Data UX. This option is useful when you have a small amount of data that you want to analyze quickly.
2. OneLake
OneLake is a single, unified, logical data lake for the whole organization. It’s the OneDrive for data. OneLake comes automatically with every Microsoft Fabric tenant and is designed to be the single place for all your analytics data.
You can seamlessly ingest data from OneLake into your KQL Database with just a few clicks. All you need is the URL of your data files. Learn more about Get data with OneLake.
3. Azure storage
If your data is stored in Azure Blob Storage or Azure Data Lake Storage Gen2, you can use built-in support for Azure storage to ingest it. This option is useful when you have a large amount of data that you want to analyze, and you want to take advantage of Blob storage’s scalability and low cost.
You can get data from two types of blobs:
- Azure blob: used to bring in individual files.
- Blob container: used to bring in multiple files available in a container. This is the preferred option for ingesting large amounts of data or historical backfill.
Learn more about Get data from a blob container.
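If you prefer scripting to the Get data UX, the same blob-based ingestion can be sketched as a KQL management command. The Python helper below only builds the command text. It assumes your KQL database accepts the `.ingest into table` command with the `format` and `ignoreFirstRecord` properties, as Azure Data Explorer does. The table name, storage account, and SAS token placeholder are hypothetical.

```python
def build_ingest_command(table: str, blob_url: str, data_format: str = "csv",
                         ignore_first_record: bool = True) -> str:
    """Return a KQL `.ingest into table` command for a single blob.

    Assumption: the target database supports this command, and the blob URL
    carries its own credentials (for example, a SAS token).
    """
    if "'" in blob_url:
        raise ValueError("blob_url must not contain single quotes")
    flag = "true" if ignore_first_record else "false"
    # The h'...' prefix marks the URL as an obfuscated string literal,
    # so secrets embedded in it are kept out of traces.
    return (
        f".ingest into table {table} (h'{blob_url}') "
        f"with (format='{data_format}', ignoreFirstRecord={flag})"
    )


# Hypothetical table, storage account, and SAS token placeholder.
command = build_ingest_command(
    "Telemetry",
    "https://examplestorage.blob.core.windows.net/data/telemetry.csv?<SAS-token>",
)
print(command)
```

You would paste the printed command into a KQL query window or send it through whichever client you already use. For a container full of files, the Get data UX described above remains the simpler route.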
4. Amazon S3
If your data is stored in Amazon S3, you can use Real-Time Analytics’ built-in S3 support to ingest it. This option is useful when you have data stored in both Azure and AWS, and you want to centralize your data in your KQL Database for analysis.
5. Notebooks
Fabric provides seamless integration between all the available experiences. The Synapse Data Engineering experience provides an optimized Spark-based notebook environment. You can use Spark to load data into a KQL Database from a variety of sources, including Blob storage, OneLake, Amazon S3, Google Cloud Storage, and more. This option is useful when you want to transform your data before ingesting it into Real-Time Analytics.
You can find samples and how-to instructions with the Tutorial: Use a notebook with Apache Spark to query a KQL database.
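As a rough sketch of what the notebook route involves, the snippet below assembles the options a Spark write to a KQL Database typically needs. The data source name and the option keys (`kustoCluster`, `kustoDatabase`, `kustoTable`, `accessToken`) are assumptions based on the Kusto Spark connector, so check them against the tutorial. The URI, database, table, and token values are hypothetical placeholders.

```python
from typing import Dict

# Assumed data source name for the Kusto Spark connector in a notebook;
# verify it against the tutorial before relying on it.
KUSTO_FORMAT = "com.microsoft.kusto.spark.synapse.datasource"


def kusto_sink_options(query_uri: str, database: str, table: str,
                       access_token: str) -> Dict[str, str]:
    """Collect the connector options for writing a DataFrame to one table."""
    return {
        "kustoCluster": query_uri,
        "kustoDatabase": database,
        "kustoTable": table,
        "accessToken": access_token,
    }


# Hypothetical placeholder values.
options = kusto_sink_options(
    "https://<your-query-uri>", "MyDatabase", "Telemetry", "<access-token>"
)

# Inside a notebook, where `df` is a Spark DataFrame you have already
# transformed, the write would look roughly like this:
#   df.write.format(KUSTO_FORMAT).options(**options).mode("Append").save()
print(sorted(options))
```

Keeping the options in one dictionary makes it easy to reuse the same connection details for several tables by changing only `kustoTable`.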
6. Sample Datasets
Lastly, if you want to explore the capabilities of Real-Time Analytics and get started quickly, consider using one of the many sample datasets we provide. You can import the relevant dataset within seconds and start exploring.
If you’re looking to set up a continuous data ingestion pipeline, one that ingests data in real time as it becomes available, you can choose one of the following options.
1. Event Hubs
You can ingest data directly from Azure Event Hubs, a fully managed, real-time data ingestion service. This option is useful when you want to analyze data in real-time, such as streaming telemetry data from IoT devices or log data from applications. Learn more about Get data from Azure Event Hubs.
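To make the telemetry scenario concrete, here is a minimal sketch of shaping a device reading as a JSON event before it is sent to an event hub. The field names (`deviceId`, `temperatureC`, `timestamp`) and the device ID are hypothetical. Sending the event is left to whichever Event Hubs client you already use.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def make_telemetry_event(device_id: str, temperature_c: float,
                         timestamp: Optional[datetime] = None) -> str:
    """Serialize one reading as a JSON string with an ISO 8601 timestamp."""
    ts = timestamp or datetime.now(timezone.utc)
    return json.dumps({
        "deviceId": device_id,
        "temperatureC": temperature_c,
        "timestamp": ts.isoformat(),
    })


# A fixed timestamp keeps the example reproducible.
fixed_time = datetime(2023, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
event = make_telemetry_event("device-001", 21.5, fixed_time)
print(event)
```

A flat JSON shape like this maps naturally onto table columns, which keeps the mapping step simple when you connect the event hub to a KQL Database.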
2. Event Streams
The Event Streams feature in Microsoft Fabric is a centralized place in the Fabric platform to capture, transform, and route real-time events to various destinations with a no-code experience.
The Event Streams feature provides a variety of connectors to fetch the event data from diverse sources, such as Sample data, Azure Event Hubs, and Custom App, and allows you to send the data to a KQL Database destination.
3. Data pipelines
Data Factory in Microsoft Fabric provides cloud-scale data movement and data transformation services that allow you to solve the most complex data factory and ETL scenarios. Data Factory pipelines provide a copy tool, a highly scalable data integration solution that allows you to connect to different data sources and load the data into a KQL Database. Pipeline activities can be orchestrated in multiple ways, such as running on a schedule or being triggered by an event.
4. Dataflows
Dataflows provide a low-code interface for ingesting data from hundreds of data sources and transforming it using 300+ data transformations. Dataflows are built using the familiar Power Query experience that’s available today across several Microsoft products and services such as Excel, Power BI, Power Platform, and more.
If you’re used to working with Power BI Dataflows, you can use the same familiar experience to push data into a KQL Database.
As you can see, Real-Time Analytics provides a variety of options for ingesting data from a range of sources, making it easy to get started with data analysis. Whether you’re analyzing streaming data in real-time or running batch analytics on large datasets, we have you covered.
Over the next few weeks, we’ll dive deeper into some of these options.
Check out a demo that showcases some of the ingestion scenarios.