Optimizing for CI/CD in Microsoft Fabric

For nearly three years, Microsoft’s internal Azure Data team has been developing data engineering solutions using Microsoft Fabric. Throughout this journey, we’ve refined our Continuous Integration and Continuous Deployment (CI/CD) approach by experimenting with various branching models, workspace structures, and parameterization techniques. This article walks you through why we chose our strategy and how to implement it in a way that scales.

Key points you’ll learn:

    • Strategies for organizing Fabric workspaces across multiple environments.
    • A Git branching strategy that reduces production risk and simplifies merges.
    • A structured CI/CD process using the fabric-cicd Python library, including environment parameterization.
    • Item-specific tips for Notebooks, Data pipelines, Lakehouses, and semantic models.

The full deployment story: context & concepts

Our data engineering workflows revolve around the Notebook/Lakehouse paradigm within Microsoft Fabric, including items such as:

    • Notebooks for data transformation.
    • Data pipelines for data movement/orchestration.
    • Lakehouses for storage.
    • Semantic models (Direct Lake) to expose data.
    • Fabric SQL for metadata management.

Challenges that shaped our approach

Initially, we faced several common issues while deploying notebooks and pipelines:

    • Environment drift (Pre-Production vs Production references in code).
    • Unclear or inconsistent naming, leading to errors in deployment scripts.
    • Merge conflicts when multiple engineers worked on the same code base.
    • Manual overhead for reconfiguring connections.

Through workspace segmentation, branching discipline, and parameterized environment references, we significantly improved development speed, reduced error rates, and made production deployments more predictable.


Workspace structure

We maintain six core workspace categories, each typically existing in two (or more) deployment environments: Pre-Production (PPE) and Production (PROD). Workspace isolation keeps each workspace focused and allows us to concentrate on the most critical tasks. It also enables ordered deployment patterns, such as deploying a Notebook that creates a new Lakehouse table before deploying the semantic model that depends on it. The specific workspace categories you adopt should align with your workflow and your deployment patterns.

Each category maps to a distinct workspace for a given deployment environment – for instance, the six categories listed under Repository structure below translate to 12 workspaces when using two environments. That is a lot to manage for a single project, so we enforce strict naming conventions to streamline navigation and assign a distinct color-coded icon to each workspace.


Repository structure

We maintain one Git repository corresponding to the core code base, with directories for our YAML deployment pipelines, deployment scripts, and a workspace directory containing a subdirectory for each workspace category. This structure lets us seamlessly add workspaces as needed, without the overhead of designing new repositories or branching strategies.

  • 📁 .deploy
  • 📁 .pipelines
  • 📁 workspace
    • 📁 engineering
    • 📁 insights
    • 📁 integration
    • 📁 orchestration
    • 📁 presentation
    • 📁 store

Branching workflow

Lowest branch as default branch

Unlike conventional repositories where main is the default branch, we opted to use ppe as the primary branch. This ensures that in-flight work doesn’t accidentally point to production resources. It encourages a safer, more deliberate process to move from PPE to PROD with explicit parameterization of production endpoints.

Initial configuration

The first time setting up a new workspace (or workspace category defined above):

  1. Start development in a clean workspace; name it PPE or equivalent.
  2. When ready to deploy, temporarily Git-sync the workspace to a new branch; this becomes the initial ppe branch.
  3. Disconnect the PPE workspace from Git. Any future changes to this workspace arrive via deployment.
  4. Cherry-pick from the ppe branch to initialize the main branch.
  5. Create the PROD workspace and initialize it by deploying from the main branch.

Pull request flow

Each engineer sets up feature workspaces attached to specific capacities and configurations. Initially, we tried using a single workspace per engineer, but switching branches for multiple in-progress items proved inefficient.

  1. Create a feature branch from the ppe branch.
  2. Connect a workspace to the feature/name branch and develop the required work.
  3. Submit a PR to squash-merge changes into the ppe branch.
  4. On PR approval, CI/CD deploys these updates to the PPE workspace.
  5. Cherry-pick the same PR commit into the main branch and squash-merge.
  6. CI/CD deploys final changes to the PROD workspace.


Item-specific deployment considerations

Even with an ideal branching strategy and workspace structure, there are still important factors to consider during development. We’ve highlighted a few key considerations, though this list is not exhaustive and does not cover all item types. The main goal is to emphasize the importance of developers adopting a CI/CD mindset and thinking about how their code will be promoted from one deployment environment to another.

Notebook

    • To streamline deployments, centrally manage Azure Blob File System (ABFS) connections using a shared notebook or custom library. Each notebook either calls the shared notebook using %run or is attached to an Environment item with the custom library preloaded.

Sample connection dictionary in Util_Connection_Library Notebook


# The environment token below is hard-coded here only for illustration: in
# practice this value is parameterized per deployment environment (e.g.,
# "ppe" vs. "prod") by the deployment tooling.
env = "ppe"

# Base ABFS paths for the core Lakehouse: one pinned to PROD, one that
# follows the current deployment environment.
core_prod = "abfss://eng-prod-storage@onelake.dfs.fabric.microsoft.com/Core.Lakehouse"
core_default = f"abfss://eng-{env}-storage@onelake.dfs.fabric.microsoft.com/Core.Lakehouse"

# Central connection dictionary: "*_default" paths follow the current
# environment, while "*_prod" paths always point at production sources.
connection = {
    "dataprod_default": f"{core_default}/Tables/Dataprod/",
    "curate_default": f"{core_default}/Tables/Curate/",
    "temp_default": f"{core_default}/Files/Temp/",
    "intake_prod": f"{core_prod}/Files/Intake/",
    "hr_prod": "abfss://data@contosohr.dfs.core.windows.net/",
    "finance_prod": "abfss://data@contosofinance.dfs.core.windows.net/",
    "marketing_prod": "abfss://data@contosomarketing.dfs.core.windows.net/"
}

In a Notebook, import the library, or %run the Notebook containing the library.


%run Util_Connection_Library

Once imported, refer to the connection dictionary directly in the reads and writes.


spark.read.format("delta").load(connection["dataprod_default"] + "DIM_Calendar")\
    .createOrReplaceTempView("vwCalendar")
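
Writes work the same way. A minimal sketch, assuming a DataFrame named df and a hypothetical DIM_Product table under the Curate schema:


# Write a Delta table through the environment-aware connection path.
df.write.format("delta").mode("overwrite")\
    .save(connection["curate_default"] + "DIM_Product")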


Connection-based items

Data pipelines, Lakehouse Shortcuts, Dataflow Gen2, and semantic models rely on Fabric connections (found in ‘Manage connections and gateways’).

    • Developers must manually create PPE/PROD connections upfront so that they can be parameterized in source control.
    • Connections should be shared with a security group that includes all developers and deployment identities. This step is critical so that deployments and automated runs in production don’t fail.
    • Automating connections is possible, but we focus on automating what’s in source control to avoid redundant investments in areas with upcoming product improvements.

Lakehouse

    • Place Lakehouses in workspaces that are separate from their dependent items.
      • For example, avoid having a notebook attached to a Lakehouse in the same workspace. This feels a bit counterintuitive but avoids needing to rehydrate data in every feature branch workspace. Instead, the feature branch notebooks always point to the PPE Lakehouse.
    • Document all known endpoints centrally to maintain consistency across deployment environments – directly related to the Notebook connection dictionary suggestion above.

Environment

    • Attach environments needing custom pools to capacity pools, rather than workspace pools. This ensures that when environments are committed to source control, they are not linked to an unknown workspace pool.
      • For example, if a developer has their own private feature branch workspace and creates a new environment attached to a workspace pool, it will appear in source control as attached to a random GUID that does not exist outside the developer’s context. By using a capacity pool instead, the environment is linked to a known GUID that can be parameterized.
    • Use consistent naming for capacity pools across deployment environments to simplify deployments.

Semantic model

    • Initial creation of a semantic model requires manually configuring the connection. Without manual configuration, the first refresh after deployment will fail. However, this is only required for the initial creation of a semantic model, not each subsequent update.
    • Semantic models currently point to the original data source defined in source control, so it’s important to create the Fabric connections described above before deployment and to parameterize the correct connection at deployment time.

Deployment automation

We use the fabric-cicd Python library to automate deployments; it provides a code-first solution for deploying Microsoft Fabric items from a repository into a workspace. Refer to this article, Introducing fabric-cicd Deployment Tool, for a more in-depth overview of the functionality.
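
Below is a minimal sketch of what a deployment script can look like; the workspace ID, repository directory, and item types are placeholder values, and the environment name selects the matching replace values during parameterization:


from fabric_cicd import FabricWorkspace, publish_all_items, unpublish_all_orphan_items

# Target workspace and the repository directory for one workspace category.
# The GUID and directory below are placeholders for illustration.
target_workspace = FabricWorkspace(
    workspace_id="00000000-0000-0000-0000-000000000000",
    repository_directory="workspace/engineering",
    item_type_in_scope=["Notebook", "DataPipeline", "Environment"],
    environment="PPE",
)

# Publish all supported items found in the repository directory, then remove
# workspace items that are no longer present in source control.
publish_all_items(target_workspace)
unpublish_all_orphan_items(target_workspace)

In a setup like ours, the workspace ID and environment value would typically come from the YAML pipeline’s variables, so the same script can deploy to both PPE and PROD.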


Conclusion

By adopting these principles, your team can establish a robust, repeatable approach to data engineering in Microsoft Fabric. A safe branching model, dedicated deployment-environment workspaces, thorough parameterization, and automated deployments with fabric-cicd let you navigate the complexities of modern data solutions with confidence. Feel free to adapt the specifics to your organization’s needs or constraints. Keep in mind that there is never one right answer for the perfect CI/CD flow; this is simply one way to do it. Good luck, and happy deploying!

Next steps

    • Take a moment to decompress; there’s a lot to absorb!
    • Identify the areas relevant to your workflow and make a plan.
    • Check out the fabric-cicd deployment tool.
    • Engage in the community with questions and comments.

Contributors: Jacob Knightley, Joe Muziki, Kiefer Sheldon, Will Crayger (Lucid)
