Microsoft Fabric Updates Blog

Automate Fabric Data Warehouse Queries and Commands with Data Factory

There are many different ways to query your data warehouse in Fabric from a pipeline, including the Stored Procedure, Script, and Lookup activities.
Pipeline activities for calling data warehouse queries

Microsoft Fabric provides a cloud-scale data warehouse to store your analytical data for dashboards and historical reporting. In Fabric Data Factory, there are a number of ways to query data, retrieve data, and execute commands from your warehouse using pipeline activities that can then be easily automated for unattended background execution. I’ll walk left-to-right through the picture above and describe the three pipeline activities in my example that access data stored in my Fabric Data Warehouse from a Fabric Data Factory pipeline.

1. Stored Procedure

The Stored Procedure activity is useful when there are existing stored procedures in the warehouse that you wish to call, or when you have permissions to create stored procedures in the Fabric Data Warehouse. It can also perform quite well, because the database engine will execute with a cached plan if one is available.

The stored procedure pipeline activity can take parameter values for a dynamic pipeline.
The stored proc activity allows for parameterization

Notice in the screenshot above that the stored procedure supports parameters, so you can create very dynamic and generalized patterns in your pipeline when using stored procedures. The procedures can contain queries or commands, but keep in mind that any output from the stored procedure will not be captured by the pipeline. If you need to capture the output of the stored procedure, a good solution is the Script activity described below. A good use case for the Stored Procedure activity is database operational commands and executions that do not require interrogation of the output.
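To make the parameterization concrete, here is a sketch of how a stored procedure parameter might be bound to dynamic content in the pipeline's JSON definition (the procedure and parameter names are hypothetical, not from this example):

```json
"typeProperties": {
    "storedProcedureName": "[dbo].[LogPipelineRun]",
    "storedProcedureParameters": {
        "RunId": {
            "value": "@pipeline().RunId",
            "type": "String"
        }
    }
}
```

Here the activity passes the built-in @pipeline().RunId system variable to the procedure, so the same pipeline can log every run without hard-coded values.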

2. Script Activity

With the Script activity, you can enter an ad-hoc query or command, so you do not need to create a stored procedure. The query syntax itself can be parameterized and built dynamically, so it is very flexible. To solve the problem of retrieving the output results from a stored procedure run, you can call a stored procedure from the Script activity using the query syntax “exec myStoredProc”. The first 5,000 rows returned by your query or stored procedure are captured in the resultSets property of the activity output, which you can then use later in your pipeline logic. The best general use case for the Script activity in pipelines is logging with Insert and Update DML scripts. When using the Script activity in this way for queries and stored procedure execution, keep the limited result set size in mind.
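For example, once the Script activity has run, a downstream activity could reference a value from the first row of the first result set with an expression like this (the activity name and column name are hypothetical):

```
@activity('Script1').output.resultSets[0].rows[0].ErrorCount
```

resultSets is an array because a single script can return multiple result sets; each entry exposes its rows as an array of objects keyed by column name.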

3. Lookup activity

The Lookup activity is another control flow activity in a pipeline that can query your data warehouse. In the example below, I am showing the output from the activity run that can be seen from the Output tab on the pipeline designer. Here, I am debugging and testing my pipeline interactively and examining the output to see if the value I am getting back is correct.

When testing your pipeline interactively inside the Fabric Data Factory UI, you can use the Output field to view the results of the activity run.
Use the Output field to view the results when designing your pipeline activities

Notice that in my Lookup example, I am retrieving an aggregated value. That is because the Lookup activity also limits the result set to 5,000 rows; it is intended as a quick way to look up values from databases, lakehouses, and files. Retrieving a small, aggregated result set with the Lookup activity and then using it in an If Condition or ForEach activity is a good practice in data pipelines.
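As a sketch, an If Condition activity could evaluate the aggregated value returned by the Lookup (with “First row only” enabled) using an expression such as the following (the activity and column names are hypothetical):

```
@greater(activity('Lookup1').output.firstRow.TotalSales, 0)
```

Because only a single aggregated row comes back, the result stays well under the 5,000-row limit and the expression evaluates quickly.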

4. Next Steps

After you’ve completed your pipeline with your data warehouse activities and testing is complete, you can operationalize your pipeline by creating a schedule from the Fabric platform scheduler. This will automate the execution of your pipeline and DW activities.

The Fabric platform scheduler can be used to automate your data pipelines for background execution.
Use the Fabric platform scheduler to automate your pipelines

Because you are no longer interactively building and testing your pipeline, you will use the Monitoring Hub from the Fabric UI to monitor the progress and status of your scheduled pipelines (see below).

Monitor your operationalized pipelines from Monitoring Hub in the Fabric UI
