Microsoft Fabric Updates Blog

Construct a data analytics workflow with a Fabric Data Factory data pipeline

Microsoft Fabric Data Factory provides an easy way to build low-code data integration and ETL projects for cloud-scale data analytics. Today, I want to focus on data pipelines in Data Factory and the advantages of using them to orchestrate your Fabric data analytics projects and activities.

What is a data pipeline?

Azure Data Factory and Azure Synapse users will find data pipelines very familiar, because those products have had them for many years. Now that Data Factory and data pipelines are available in Fabric’s SaaS environment, you will find the experience to be nearly identical. However, if you are primarily a Power BI or Power Platform user, you may not have experience with data pipelines. So, today, I’d like to take a few minutes to explain what a data pipeline is.

In the context of Fabric data analytics, you use a data pipeline to build automated workflows that combine the different artifacts you’ve created in your workspace into an analytics solution. As an example, the screenshot below shows a pipeline that I built to perform the following tasks:

  1. Find files in a storage folder
  2. Iterate over the files found
  3. Copy each file’s contents to the bronze layer in my Lakehouse
  4. After the data has been loaded to bronze, run a Spark Notebook to transform the data and load it into the silver layer
  5. If the Notebook was successful, send an email to the team and continue
  6. If the Notebook failed, notify the team via a Teams channel and then fail the pipeline
  7. Execute a Dataflow to combine and clean the data, preparing it for the gold layer
  8. Finally, issue a Copy command to load the cleaned data into the gold layer for reporting
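
For readers who think more easily in code, the control flow of the list above can be sketched in a few lines of plain Python. This is only an illustration of the branching and ordering, not how Fabric runs a pipeline, and it does not call any Fabric API. The names `run_pipeline`, `RunLog`, and `PipelineFailed` are hypothetical, the `files` argument stands in for the result of step 1, and the `notebook` callable stands in for the Spark Notebook run.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class PipelineFailed(Exception):
    """Hypothetical error that marks the whole run as failed (step 6)."""


@dataclass
class RunLog:
    """Records the name of each step in the order it runs."""
    steps: List[str] = field(default_factory=list)

    def record(self, step: str) -> None:
        self.steps.append(step)


def run_pipeline(files: List[str], notebook: Callable[[], bool], log: RunLog) -> None:
    # Steps 1-3: `files` is the result of the file lookup; copy each one to bronze.
    for name in files:
        log.record(f"copy_to_bronze:{name}")

    # Step 4: run the Notebook (bronze -> silver); True means it succeeded.
    log.record("run_notebook")
    succeeded = notebook()

    if succeeded:
        # Step 5: success path, email the team and continue.
        log.record("send_email")
    else:
        # Step 6: failure path, notify the Teams channel and fail the run.
        log.record("notify_teams")
        raise PipelineFailed("Notebook failed")

    # Step 7: Dataflow combines and cleans the data.
    log.record("run_dataflow")
    # Step 8: final copy into the gold layer.
    log.record("copy_to_gold")


if __name__ == "__main__":
    log = RunLog()
    run_pipeline(["sales.csv", "returns.csv"], lambda: True, log)
    print(log.steps)
```

The point of the sketch is the shape of the workflow: a loop, a success branch, a failure branch that stops the run, and two final steps that only execute when the Notebook succeeded. In Fabric, the same shape is drawn on the pipeline canvas rather than written by hand.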

Why would you use a data pipeline?

I created that pipeline design entirely in the web UI in Fabric without writing any code. By clicking the Schedule button in the designer UI, I can now set a schedule that runs my logic automatically on a regular cadence. How often you update your Lakehouse will depend on your business requirements and on how often new data arrives at your sources.

Separately, inside of Fabric, I can create and manage the artifacts that I just orchestrated above. My Notebook is created and tested in the Data Engineering app, while I used the Data Factory app to create a Dataflow. So now I use Data Factory data pipelines in Fabric to bring them all together into a single cohesive logical “pipeline”. In other words, I just created an end-to-end, fully automated workflow that I can run on a schedule. In addition, I can use the central Monitoring Hub feature in Fabric to watch the execution of my pipelines, Notebooks, Dataflows, etc., all from a single pane of glass:

So as you build your analytics project in Fabric, you’ll use data pipelines to piece those artifacts together into an automated workflow to keep your Lakehouse (and subsequently, your business reporting users) updated, refreshed, and cleaned.

How to get started

I hope that this gives you a sense of the value that data pipelines from the Data Factory app inside of Microsoft Fabric can bring to your data analytics projects. To get started, switch over to Data Factory in Fabric and choose New > Data Pipeline. You’ll land on the page in the screenshot below, where you can begin adding activities to the low-code design surface and building your own workflows!

Other resources

  • Join the Fabric community to post your questions, share your feedback, and learn from others.
  • Visit Microsoft Fabric Ideas to submit feedback and suggestions for improvements and vote on your peers’ ideas!
  • Check our Known Issues page to stay up to date on product fixes!

Have any questions or feedback? Leave a comment below!
