Microsoft Fabric Updates Blog

Construct a data analytics workflow with a Fabric Data Factory data pipeline

Microsoft Fabric Data Factory provides an easy way to build low-code data integration and ETL projects for building cloud-scale data analytics. Today, I want to focus on data pipelines in Data Factory and the advantages you’ll find by using pipelines to orchestrate your Fabric data analytics projects and activities.

What is a data pipeline?

For Azure Data Factory and Azure Synapse users, data pipelines will be very familiar as we’ve had data pipelines in those products for many years. Now that Data Factory and data pipelines are available in the SaaS orientation of Fabric, you will find the experience to be nearly identical. However, if you are primarily a Power BI or Power Platform user, you may not have experience with data pipelines. So, today, I’d like to take a few minutes to explain what a data pipeline is.

In the context of Fabric data analytics, you will use a data pipeline to build automated workflows that combine the different artifacts in your workspace that you’ve created as a way to build your analytics. As an example, in the screenshot below, you can see that I’ve built a pipeline that performs the following tasks:

  1. Find files in a storage folder
  2. Iterate over the files found
  3. Copy each file contents to the bronze layer in my Lakehouse
  4. After the data has been loaded to bronze, run a Spark Notebook to transform the data and load it into the silver layer
  5. If the Notebook was successful, send an email to the team and continue
  6. If the Notebook failed, notify the team via a Teams channel and then fail the pipeline
  7. Execute a Dataflow to combine and clean data, preparing for gold layer
  8. Finally, issue a Copy command to load the cleaned data into the gold layer for reporting

Why would you use a data pipeline?

I created that pipeline design entirely in the web UI in Fabric without writing any code. Now I can set a schedule to automate the execution of my logic on a regular cadence from the designer UI when I click on the Schedule button. The frequency with which you update your Lakehouse will depend upon the business requirements and the frequency with which new data arrives at your sources.

Separately, inside of Fabric, I can create and manage those artifacts that I just orchestrated above. My Notebook is created and tested in the Data Engineering app, while I used the Data Factory app to create a Dataflow. So now I use Data Factory data pipelines in Fabric to bring them all together into a single cohesive logical “pipeline”. In other words, I just created an end-to-end workflow that I can run on a schedule, fully automated and additionally … now I can use the central Monitoring Hub feature in Fabric to watch the execution of my pipelines, Notebooks, Dataflows, etc. all from a single pane of glass:

So as you build your analytics project in Fabric, you’ll use data pipelines to piece those artifacts together into an automated workflow to keep your Lakehouse (and subsequently, your business reporting users) updated, refreshed, and cleaned.

How to get started

I hope that this gives you a sense of the value that data pipelines from the Data Factory app inside of Microsoft Fabric can bring to your data analytics projects. To get started, switch over to Data Factory in Fabric and choose New > Data Pipeline. You’ll land on the page in the below screenshot when you can being adding activities to the low-code design surface and begin building your own workflows!

Other resources

  • Join the Fabric community to post your questions, share your feedback, and learn from others.
  • Visit Microsoft Fabric Ideas to submit feedback and suggestions for improvements and vote on your peers’ ideas!
  • Check our Known Issues page for up to date on product fixes!

Have any questions or feedback? Leave a comment below!

Postagens relacionadas em blogs

Construct a data analytics workflow with a Fabric Data Factory data pipeline

outubro 29, 2024 de Leo Li

We’re excited to announce several powerful updates to the Virtual Network (VNET) Data Gateway, designed to further enhance performance and improve the overall user experience. These new features allow users to better manage increasing workloads, perform complex data transformations, and simplify log management. Expanded Cluster Size from 5 to 7 One of the key improvements … Continue reading “New Features and Enhancements for Virtual Network Data Gateway”

outubro 28, 2024 de Estera Kot

We’re thrilled to announce that the Native Execution Engine is now available at no additional cost, unlocking next-level performance and efficiency for your workloads. What’s New?  The Native Execution Engine now supports Fabric Runtime 1.3, which includes Apache Spark 3.5 and Delta Lake 3.2. This upgrade enhances Microsoft Fabric’s Data Engineering and Data Science workflows, … Continue reading “Native Execution Engine available at no additional cost!”