Microsoft Fabric Updates Blog

Automate Fabric Data Warehouse Queries and Commands with Data Factory

There are many different ways to query your data warehouse in Fabric from a pipeline including stored procedure, script, and lookup activities.
Pipeline activities for calling data warehouse queries

Microsoft Fabric provides a cloud-scale data warehouse to store your analytical data for dashboards and historical reporting. In Fabric Data Factory, there are several ways to query data, retrieve data, and execute commands from your warehouse using pipeline activities, which can then be automated for unattended background execution. Going left to right in the picture above, I’ll talk about the three pipeline activities in my example that access data stored in my Fabric Data Warehouse from a Fabric Data Factory pipeline.

1. Stored Procedure

The stored procedure activity is useful when there are existing stored procedures (SPs) in the DW that you wish to call, or when you have permissions to create SPs in the Fabric Data Warehouse. It can perform quite well because the database engine will use a cached plan if one is available.

The stored procedure pipeline activity can take parameter values for a dynamic pipeline.
The stored proc activity allows for parameterization

Notice in the screenshot above that the stored procedure activity supports parameters, so you can create very dynamic and generalized patterns in your pipeline when using SPs. The procs can be queries or commands, but keep in mind that any output from the SP will not be captured by the pipeline. If you need to capture the output of the stored procedure, a good solution is the Script activity below. A good use case for the stored proc activity is database operational commands and executions that do not require interrogation of the output.
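To make the parameterization idea concrete, here is a minimal Python sketch of how a set of parameter values might be shaped into an activity definition. The property names `storedProcedureName` and `storedProcedureParameters` are an assumption modeled on the JSON that Data Factory pipelines use for stored procedure activities, and the procedure name, parameter names, and type mapping are hypothetical. Check the JSON view of your own activity for the exact shape.

```python
from typing import Any


def to_sp_parameters(values: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map plain Python values to a name -> {"value", "type"} structure.

    The type names below are an assumed mapping, not an official list.
    """
    def type_name(value: Any) -> str:
        # bool must be tested before int because bool is a subclass of int.
        if isinstance(value, bool):
            return "Boolean"
        if isinstance(value, int):
            return "Int64"
        if isinstance(value, float):
            return "Double"
        return "String"

    return {
        name: {"value": value, "type": type_name(value)}
        for name, value in values.items()
    }


# Hypothetical stored procedure and parameters, for illustration only.
type_properties = {
    "storedProcedureName": "dbo.usp_LogLoad",
    "storedProcedureParameters": to_sp_parameters(
        {"TableName": "FactSales", "BatchId": 42, "IsFullLoad": False}
    ),
}

print(type_properties)
```

In a real pipeline the values would usually come from pipeline parameters or dynamic expressions rather than literals, which is what makes the pattern reusable across tables or batches.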

2. Script Activity

With the Script activity, you can enter an ad-hoc query or command, so you do not need to create a stored procedure. The syntax of the query itself can be parameterized and dynamically created, so it is very flexible. To solve the problem of retrieving the output from a stored procedure run, you can call a stored proc from the Script activity using the query syntax “exec myStoredProc”. The first 5,000 rows of the results of your query or stored proc will be returned in the resultSets property of the activity output, which you can then use later in your pipeline logic. When using the Script activity in this way for queries and stored procedure execution, keep in mind the limited result set size. In general, the best use case for the Script activity in pipelines is logging using Insert and Update DML scripts.
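As a rough illustration of working with that output, the Python sketch below pulls the rows of the first result set out of a Script activity output and flags when the 5,000-row limit may have been hit. It assumes the output is JSON with a `resultSets` array whose entries carry a `rows` list; the sample payload, column names, and helper names are hypothetical.

```python
ROW_LIMIT = 5000  # result set size limit described in the text above


def first_result_rows(activity_output: dict) -> list:
    """Return the rows of the first result set, or an empty list if none."""
    result_sets = activity_output.get("resultSets", [])
    if not result_sets:
        return []
    return result_sets[0].get("rows", [])


def may_be_truncated(rows: list) -> bool:
    """Hitting the limit exactly suggests that more rows existed."""
    return len(rows) >= ROW_LIMIT


# Hypothetical output, shaped like what "exec myStoredProc" might return.
sample_output = {
    "resultSetCount": 1,
    "resultSets": [
        {
            "rowCount": 2,
            "rows": [
                {"TableName": "FactSales", "RowsLoaded": 1200},
                {"TableName": "DimCustomer", "RowsLoaded": 85},
            ],
        }
    ],
}

rows = first_result_rows(sample_output)
total_loaded = sum(row["RowsLoaded"] for row in rows)
print(total_loaded, may_be_truncated(rows))
```

Inside the pipeline itself, the same value is typically reached with a dynamic expression along the lines of `@activity('Script1').output.resultSets[0].rows`, where `Script1` stands in for your own activity name.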

3. Lookup Activity

The Lookup activity is another control flow activity in a pipeline that can query your data warehouse. In the example below, I am showing the output from the activity run that can be seen from the Output tab on the pipeline designer. Here, I am debugging and testing my pipeline interactively and examining the output to see if the value I am getting back is correct.

When testing your pipeline interactively inside the Fabric Data Factory UI, you can use the Output field to view the results of the activity run.
Use the Output field to view the results when designing your pipeline activities

Notice that in my Lookup example, I am retrieving an aggregated value. That is because the Lookup activity also limits the result set to 5,000 rows and is intended to be a quick way to look up values from databases, lakehouses, and files. Using the Lookup activity to retrieve a small result set or an aggregated value for use in an If Condition or ForEach is a good practice in data pipelines.
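The decision logic that an If Condition would apply to a Lookup result can be sketched in Python as follows. This assumes the Lookup returns a single row under a `firstRow` key, as it does when only the first row is requested; the column name `PendingRows`, the threshold, and the sample payload are hypothetical.

```python
def should_run_load(lookup_output: dict,
                    column: str = "PendingRows",
                    threshold: int = 0) -> bool:
    """Decide whether to branch into the load path based on one aggregate."""
    first_row = lookup_output.get("firstRow") or {}
    value = first_row.get(column)
    # A missing or NULL aggregate is treated as "nothing to do".
    return value is not None and value > threshold


# Hypothetical output of a Lookup running:
#   SELECT COUNT(*) AS PendingRows FROM dbo.StagingOrders
sample_lookup_output = {"firstRow": {"PendingRows": 37}}

print(should_run_load(sample_lookup_output))
```

Returning one aggregated value keeps the Lookup far below the row limit and gives the downstream condition a single, unambiguous input.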

4. Next Steps

After you’ve completed your pipeline with your data warehouse activities and testing is complete, you can operationalize your pipeline by creating a schedule from the Fabric platform scheduler. This will automate the execution of your pipeline and DW activities.

The Fabric platform scheduler can be used to automate your data pipelines for background execution.
Use the Fabric platform scheduler to automate your pipelines

Because you are no longer interactively building and testing your pipeline, you will use the Monitoring Hub from the Fabric UI to monitor the progress and status of your scheduled pipelines (see below).

Monitor your operationalized pipelines from Monitoring Hub in the Fabric UI
