Microsoft Fabric Updates Blog

Exploring CI/CD Capabilities in Microsoft Fabric: A Focus on Data Pipelines

Author: Deepak Gundavaram

Exploration of Microsoft Fabric’s CI/CD Features

In today’s rapidly evolving digital and AI landscape, Microsoft Fabric plays a crucial role in managing and automating data integration and analytics processes. As a comprehensive data platform, Fabric offers robust CI/CD capabilities, particularly for data pipelines. These capabilities enable efficient tracking of changes, seamless collaboration, and easy reversion to previous states through Git. Continuous integration and continuous deployment frameworks further streamline code integration and deployment, minimizing human error and accelerating the delivery of updates and new features. Below is the list of currently supported CI/CD Artifacts:

  • Data Pipelines: These are crucial for orchestrating data flow and transformation processes, ensuring data consistency and operational efficiency.
  • Lakehouse: Supports automated updates and integrations across both structured and unstructured data storage.
  • Notebooks: Offers version control and automated deployment for Notebooks, enhancing reproducibility across data science and engineering projects.
  • Paginated Reports: Enables automated updating and distribution processes, ensuring that report outputs remain timely and reliable.
  • Reports: Currently supports reports that are not connected to external semantic models or hosted in MyWorkspace.
  • Semantic Models: Includes all except push datasets, live connections, model v1, and semantic models derived from data warehouses or Lakehouses.

While Fabric supports CI/CD for a broad spectrum of artifacts, our primary focus for this introductory post will remain on data pipelines. As the capabilities within Microsoft Fabric are continuously updated, staying informed through regular consultation of the official documentation is advisable. This is just the beginning, and future posts will delve into additional deployment patterns and advanced techniques.

Prerequisites

Setting Up Azure DevOps and Git Repository

To use Microsoft Fabric’s CI/CD features, setting up an Azure DevOps project and a Git repository is essential. GitHub support was recently added, but that is a topic for another day. Here are some general steps to get started:

Creating the Repository

  1. Sign in to Azure DevOps.
  2. Navigate to your organization and create a new project.
  3. Go to Repos.
  4. Click “New repository.”
  5. Choose Git as the repository type.
  6. Provide a name and an optional description.
  7. Configure settings as needed.
  8. Create the repository.

These steps set up the basic infrastructure required for managing and deploying code changes with Microsoft Fabric’s Git integration and deployment pipelines.
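If you prefer to script this step, the repository can also be created through the Azure DevOps REST API. The snippet below is a minimal sketch for illustration only: the organization, project, repository name, and personal access token (PAT) are placeholders, and the api-version may differ in your environment.

```python
import base64
import requests

# All values below are placeholders for illustration only.
ORGANIZATION = "my-org"
PROJECT = "fabric-cicd"
REPO_NAME = "fabric-pipelines"
PAT = "<personal-access-token>"

# Azure DevOps REST calls authenticate with a PAT sent via HTTP basic auth
# (empty username, PAT as the password).
auth = base64.b64encode(f":{PAT}".encode()).decode()

url = (
    f"https://dev.azure.com/{ORGANIZATION}/{PROJECT}"
    "/_apis/git/repositories?api-version=7.1-preview.1"
)

response = requests.post(
    url,
    headers={"Authorization": f"Basic {auth}", "Content-Type": "application/json"},
    json={"name": REPO_NAME},
)
response.raise_for_status()
print("Created repository:", response.json().get("remoteUrl"))
```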

Enable Git Integration in Microsoft Fabric

Navigate to the admin portal within Microsoft Fabric and click on Tenant settings. Under the Git integration section, enable the “Users can synchronize items with the Git repositories (preview)” setting.

Understanding Git and CI/CD

Git Workflow Essentials

Git is a distributed version control system that enhances collaborative software development by managing code changes:

  • Main Branch: The master or main branch holds production-ready code.
  • Feature Branches: Allow for isolated development, ensuring the main branch remains stable.
  • Pull Requests: Propose, review, and discuss changes before integration.
  • Merging: Integrates approved changes, continuously updating the project.

CI/CD Explained

Continuous Integration (CI) and Continuous Deployment (CD) are practices that automate software delivery:

  • Continuous Integration: Developers frequently commit to a Git-managed main branch, triggering automated tests and builds for integration.
  • Git’s Role in CI: Git tracks changes to enable automatic fetching and testing of new commits.
  • Continuous Deployment: Focuses on deploying verified changes to production environments through structured deployment stages within deployment pipelines.

Workspace setup

To follow the steps in the next sections, it is recommended to set up the following workspaces in Microsoft Fabric:

  1. Feature Workspace: Each feature branch should have a corresponding feature workspace for initial development and unit testing. This workspace is tied to Git, enabling seamless integration and version control.
  2. Dev Workspace: Once changes are merged into the main branch, the Fabric dev workspace, which is also tied to Git, is used for integration testing.
  3. Stage Workspace: After successful validation in the dev workspace, the code is promoted to the stage workspace for pre-production testing. This workspace is updated through Fabric deployment pipelines.
  4. Prod Workspace: Finally, the code is moved to the prod workspace for live deployment, using Fabric deployment pipelines.

Overview of Fabric CI/CD Platform

The Fabric CI/CD platform optimizes continuous integration and deployment by integrating seamlessly with Microsoft Fabric. Its key features and functionality are summarized below:

  • Git Integration: Facilitates efficient version control and collaborative workflows for distributed teams.
  • Deployment Pipelines: Automate the transition from development to production with consistent, reliable deployments.
  • Fabric REST APIs: Enable automation and integration, allowing developers to programmatically manage and customize CI/CD processes.
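Because several steps later in this post can also be driven through the Fabric REST APIs, it helps to see how a client typically authenticates. The following is a minimal sketch, assuming the azure-identity package and an account with access to the target workspaces; the token scope shown reflects common usage, so verify it against the official Fabric REST API documentation.

```python
import requests
from azure.identity import InteractiveBrowserCredential

# Acquire a Microsoft Entra ID token for the Fabric REST API.
# The scope below reflects common usage; confirm it against the
# Fabric REST API documentation for your tenant.
credential = InteractiveBrowserCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default").token
headers = {"Authorization": f"Bearer {token}"}

# Sanity check: list the workspaces the signed-in user can access.
resp = requests.get("https://api.fabric.microsoft.com/v1/workspaces", headers=headers)
resp.raise_for_status()
for workspace in resp.json().get("value", []):
    print(workspace["displayName"], workspace["id"])
```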

High-Level Overview of the Fabric CI/CD Process

Initial Setup:

  • Git Repository Initialization: A primary ‘main’ branch is established in Azure DevOps to maintain stable, production-ready code.

Development Cycle:

  • Feature Branch Creation: Developers create ‘feature’ branches from the main branch for isolated and secure development activities.
  • Development and Testing: Each feature branch is linked to a corresponding feature workspace. Developers use these Git-integrated workspaces for initial development and unit testing before merging the code into the main branch.

Review and Integrate:

  • Pull Requests: Changes from the feature branches are proposed for merging into the main branch through pull requests.
  • Peer Review: These changes undergo thorough review to ensure they adhere to quality and compliance standards.

Deployment via Fabric Deployment Pipelines:

  • Fabric deployment pipelines automate building, testing, and deploying code across designated workspaces.

Fabric CI/CD Process Flow

[Diagram: Fabric CI/CD process flow]

Automated deployment process:

  1. Integration with Dev Workspace: Once changes are merged into the main branch, the Fabric dev workspace is updated with the new artifacts using either the UI or the Fabric REST API.
  2. Dev Workspace: Used for integration testing after merging changes from the main branch.
  3. Staging Workspace: Following successful validation in the dev workspace, the code is promoted to the stage workspace for pre-production testing.
  4. Prod Workspace: Finally, the code is moved to the prod workspace for live deployment.

Only the feature and dev workspaces are connected to Git; the stage and prod workspaces are updated via Fabric deployment pipelines, not Azure DevOps pipelines.

This streamlined process ensures efficient development, testing, and deployment, leveraging robust integration and automation capabilities provided by Microsoft Fabric.

Step-by-Step Guide

Building on our discussion of the theoretical frameworks of CI/CD with Azure DevOps and Microsoft Fabric, we now move to a step-by-step guide. This section is designed to practically demonstrate the setup and operation of CI/CD processes. Follow along to see how these concepts are applied in real-world scenarios.

Continuous Integration

Integrating the Fabric Dev Workspace with the Repo (Main Branch)

To integrate the Azure DevOps repo’s main branch with the Fabric Dev workspace, navigate to the appropriate workspace within Fabric and then to the workspace settings. Once there, click on Git integration, provide the details to connect to the repo, and click the Connect and sync button.


Once we do, the workspace is synced with the repo. Since there are no artifacts yet in either the repo or the workspace, no Fabric artifacts are synced in either direction.
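The same Git connection can also be established programmatically through the Fabric Git APIs. The snippet below is a hedged sketch: the workspace ID, organization, project, repository, and branch names are placeholders, and the request shape should be verified against the Git - Connect API reference.

```python
import requests

# Placeholders - substitute values from your own tenant and repo.
WORKSPACE_ID = "<dev-workspace-guid>"
TOKEN = "<fabric-api-access-token>"  # e.g. acquired as shown earlier

connect_body = {
    "gitProviderDetails": {
        "gitProviderType": "AzureDevOps",
        "organizationName": "my-org",
        "projectName": "fabric-cicd",
        "repositoryName": "fabric-pipelines",
        "branchName": "main",
        "directoryName": "/",
    }
}

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/git/connect",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=connect_body,
)
resp.raise_for_status()
print("Connect request accepted, status code:", resp.status_code)
```

Note that the UI’s Connect and sync button performs the connection and the initial synchronization in one step; when scripting, an additional initialize-connection call is typically required before the first sync, so check the Git API reference for the full sequence.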

Integrating the Fabric Feature Workspace with the Azure DevOps Repo (Feature Branch)

When a development team starts working on a new feature, it is recommended to create a feature branch based on the main branch and integrate it with a feature workspace in Microsoft Fabric. This can be done from the workspace settings of the feature workspace.


All artifacts developed by the developer are committed to this feature branch. Once development is complete, a pull request in Azure DevOps, together with the associated approval process, is used to merge the feature branch code base into the main branch.
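Although pull requests are usually opened from the Azure DevOps UI, this step can also be scripted. The following is a sketch only; the organization, repository, branch names, and api-version are placeholders and may need adjusting for your setup.

```python
import base64
import requests

# Placeholders for illustration only.
ORGANIZATION = "my-org"
PROJECT = "fabric-cicd"
REPOSITORY = "fabric-pipelines"
PAT = "<personal-access-token>"

auth = base64.b64encode(f":{PAT}".encode()).decode()

pull_request = {
    "sourceRefName": "refs/heads/feature/my-feature",  # feature branch (placeholder)
    "targetRefName": "refs/heads/main",
    "title": "Merge feature branch into main",
    "description": "Promote feature workspace changes for review.",
}

url = (
    f"https://dev.azure.com/{ORGANIZATION}/{PROJECT}/_apis/git/repositories/"
    f"{REPOSITORY}/pullrequests?api-version=7.1-preview.1"
)

resp = requests.post(
    url,
    headers={"Authorization": f"Basic {auth}", "Content-Type": "application/json"},
    json=pull_request,
)
resp.raise_for_status()
print("Pull request created:", resp.json().get("pullRequestId"))
```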

Creating Data Pipeline Artifacts and Merging Feature Branch into Main Branch

Now let’s create a simple data pipeline with a Wait activity and save it.


When we navigate back to the workspace, the Source control icon indicates that there are changes that need to be committed. Click the Source control icon to open the source control pane, then commit the changes, i.e. the newly developed data pipeline, to the feature branch.


Once we commit and navigate back to the workspace, we can see that the Git status of the data pipeline artifact has changed from Uncommitted to Synced.


To submit a pull request, navigate to the Source control pane and click View repository.


During this process, we can add required reviewers and set up default reviewers based on Azure DevOps branch policies.


Once the merge completes, we will see the artifacts added to the main branch.


Updating the Dev Workspace with incoming changes

Now, let’s navigate to the Dev workspace created in the first step. Since the Dev workspace is integrated with the main branch, the source control icon in the workspace should indicate that there are updates in the main branch. We will now update the workspace with incoming changes using the UI. Note that this can also be achieved using the Fabric REST API, and we will provide links on how to leverage the Fabric REST API at the end of the document.


Click on Source control and then Update all. This will update the workspace with the incoming changes.
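As mentioned above, the same update can be performed with the Fabric REST API instead of the UI. The sketch below is illustrative only and assumes the Git - Get Status and Git - Update From Git endpoints; the workspace ID and token are placeholders, and the exact payload should be verified against the official API reference.

```python
import requests

# Placeholders for illustration only.
WORKSPACE_ID = "<dev-workspace-guid>"
TOKEN = "<fabric-api-access-token>"
BASE = f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/git"
headers = {"Authorization": f"Bearer {TOKEN}"}

# 1. Read the current Git status to obtain the workspace head and remote commit.
status = requests.get(f"{BASE}/status", headers=headers)
status.raise_for_status()
status_json = status.json()

# 2. Pull the incoming changes from the connected branch into the workspace.
update_body = {
    "workspaceHead": status_json.get("workspaceHead"),
    "remoteCommitHash": status_json.get("remoteCommitHash"),
    "conflictResolution": {
        "conflictResolutionType": "Workspace",
        "conflictResolutionPolicy": "PreferRemote",  # take the Git version on conflict
    },
}
update = requests.post(f"{BASE}/updateFromGit", headers=headers, json=update_body)
update.raise_for_status()

# Update From Git runs as a long-running operation; a 202 response carries
# headers that can be polled for completion.
print("Update requested, status code:", update.status_code)
```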

Deployment Using Fabric Deployment Pipelines

Before we move to the deployment pipelines, we need to create two new workspaces for the stage and prod environments: a Stage workspace and a Prod workspace.

To create a deployment pipeline, navigate to the Dev workspace and click Create deployment pipeline.


Name the deployment pipeline and click Next.


We can customize the stages based on how many we need; a Fabric deployment pipeline supports up to 10 stages. Here, we are going with three stages.


Within the deployment pipeline, we need to assign a workspace to each stage.


Here are the assigned workspaces:

[Screenshot: deployment pipeline stages with their assigned workspaces]

Before deploying code from development to staging, the Fabric UI indicates any differences between the two workspaces by displaying an X mark inside a circle. If there are no differences, Fabric displays a check mark. You can click on the compare icon to identify which objects are different.

In this case, it is the newly created data pipeline, i.e. Datapipeline1.


Once we review the changes and are ready to deploy, we click on the “Deploy” button. Additionally, please note that we can cherry-pick which artifacts need to be deployed from dev to stage and from stage to prod if needed.


In the upper right corner, there is a Deployment History icon that allows us to view the deployment history.


Once deployed, we can go into the deployment section and review the deployments we have completed.


In a similar way, to move the data pipeline artifact from stage to prod, follow the same steps used to deploy from dev to stage.
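For teams that want to script these promotions rather than click through the UI, the Fabric deployment pipelines API exposes a deploy operation. The following is a minimal sketch assuming the Deploy Stage Content endpoint; the pipeline ID, stage IDs, item IDs, and token are placeholders, and the payload should be checked against the current API reference.

```python
import requests

# Placeholders for illustration only.
DEPLOYMENT_PIPELINE_ID = "<deployment-pipeline-guid>"
TOKEN = "<fabric-api-access-token>"

deploy_body = {
    "sourceStageId": "<stage-stage-guid>",   # the Stage stage of the pipeline
    "targetStageId": "<prod-stage-guid>",    # the Prod stage of the pipeline
    # Optional: cherry-pick specific artifacts instead of deploying everything.
    # "items": [{"sourceItemId": "<datapipeline1-item-guid>", "itemType": "DataPipeline"}],
    "note": "Promote Datapipeline1 from stage to prod",
}

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/deploymentPipelines/{DEPLOYMENT_PIPELINE_ID}/deploy",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=deploy_body,
)
resp.raise_for_status()

# Deployments run as long-running operations; a 202 response includes headers
# that can be polled to track progress, mirroring the Deployment History view.
print("Deployment requested, status code:", resp.status_code)
```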

Conclusion

As mentioned in the introduction, this blog post serves as an introductory guide. Future posts will explore additional deployment patterns for transitioning between workspaces. Stay tuned for more in-depth insights and advanced techniques.

We value your feedback! Please leave a comment or share your thoughts and suggestions to help us improve: Provide Feedback

Further Reading

Known Limitations

Currently, there are some limitations in Microsoft Fabric when deploying data pipelines that users should be aware of. These limitations are expected to be addressed in future updates:

Data Pipeline Activities with OAuth Connectors: For MS Teams and Outlook connectors, after deploying to a higher environment, users must currently open each pipeline manually and sign in to each activity.

Data Pipelines Invoking Dataflows: When a data pipeline that invokes a dataflow is promoted, it will still reference the dataflow in the previous workspace, which is incorrect. This behavior occurs because dataflows are not currently supported in deployment pipelines.

