Microsoft Fabric Updates Blog

Automate Fabric Data Warehouse Queries and Commands with Data Factory

There are many different ways to query your data warehouse in Fabric from a pipeline including stored procedure, script, and lookup activities.
Pipeline activities for calling data warehouse queries

Microsoft Fabric provides a cloud-scale data warehouse to store your analytical data for dashboards and historical reporting. In Fabric Data Factory, there are a number of ways to query data, retrieve data, and execute commands from your warehouse using pipeline activities that can then be easily automated for unattended background execution. I’ll go left-to-right from the picture above and talk about 3 different pipeline activities in my example above used to access data stored in my Fabric Data Warehouse using a Fabric Data Factory pipeline.

  1. Stored Procedure

The stored procedure activity is useful when there are existing stored procs in the DW that you wish to call, or if you are able to create SPs and have permissions in the Fabric Data Warehouse, then this can be a useful mechanism that can perform quite well as the database engine will be called using a cache plan if available.

The stored procedure pipeline activity can take parameter valued for a dynamic pipeline.
The stored proc activity allows for parameterization

Notice in the screenshot above that the stored procedure supports parameters so that you can create very dynamic and generalized patterns in your pipeline when using SPs. The procs can be queries or commands, but keep in mind that any output from the SP will not be captured by the pipeline. If you need to capture the output of the stored procedure, a good solution is the Script activity below. A good use case for using the stored proc activity is for database operational commands and executions that do not require interogation of the output.

2. Script Activity

With the script activity, you can enter an ad-hoc query or command, so you do not need to create a stored procedure. The syntax of the query itself can be parameterized and dynamically created, so it is very flexible. To solve the problem of retrieving the output results from a stored procedure run, you can call a stored proc from the Script activity using the query syntax “exec myStoredProc”. The first 5,000 rows of the results of your query or stored proc will be returned in the resultsSet property of the activity ouptut which you can then use later in your pipeline logic. The best use case for the Script activity in general in pipelines is for logging using Insert and Update DML scripts. When using the Script activity in this way for queries and stored procedure execution, keep in mind the limited result set size.

3. Lookup activity

The Lookup activity is another control flow activity in a pipeline that can query your data warehouse. In the example below, I am showing the output from the activity run that can be seen from the Output tab on the pipeline designer. Here, I am debugging and testing my pipeline interactively and examining the output to see if the value I am getting back is correct.

When testing your pipeline interactively inside the Fabric Data Factory UI, you can use the Output field to view the results of the activity run.
Use the Output field to view the results when designing your pipeline activities

Notice that in my Lookup example, I am retreiving an aggregated value. That is because the Lookup activity also limits the result set to 5,000 rows and is intended to be a quick way to lookup values from databases, lakehouses, and files. Using the Lookup activity as a way to retrieve a small result set to use in an If condition or For Each and using aggregated values, is a good practice in data pipelines.

4. Next Steps

After you’ve completed your pipeline with your data warehouse activities and testing is complete, you can operationalize your pipeline by creating a schedule from the Fabric platform scheduler. This will automate the execution of your pipeline and DW activities.

The Fabric platform scheduler can be used to automate your data pipelines for background execution.
Use the Fabric platform scheduler to automate your pipelines

Because you are no longer interactively building and testing your pipeline, you will use the Monitoring Hub from the Fabric UI to monitor the progress and status of your scheduled pipelines (see below).

Monitor your operationalized pipelines from Monitoring Hub in the Fabric UI

Postagens relacionadas em blogs

Automate Fabric Data Warehouse Queries and Commands with Data Factory

outubro 29, 2024 de Leo Li

We’re excited to announce several powerful updates to the Virtual Network (VNET) Data Gateway, designed to further enhance performance and improve the overall user experience. These new features allow users to better manage increasing workloads, perform complex data transformations, and simplify log management. Expanded Cluster Size from 5 to 7 One of the key improvements … Continue reading “New Features and Enhancements for Virtual Network Data Gateway”

outubro 28, 2024 de Estera Kot

We’re thrilled to announce that the Native Execution Engine is now available at no additional cost, unlocking next-level performance and efficiency for your workloads. What’s New?  The Native Execution Engine now supports Fabric Runtime 1.3, which includes Apache Spark 3.5 and Delta Lake 3.2. This upgrade enhances Microsoft Fabric’s Data Engineering and Data Science workflows, … Continue reading “Native Execution Engine available at no additional cost!”