Microsoft Fabric Updates Blog

Announcing Public Preview: Incremental Refresh in Dataflow Gen2

Incremental Refresh in Dataflow Gen2 is now in public preview! This powerful feature is designed to optimize your data processing by ensuring that only the data that has changed since the last refresh is updated. This means faster dataflows and more efficient resource usage.

Figure: A drop-down menu highlighting the Incremental refresh option.

Key Features of Incremental Refresh

Incremental refresh in Dataflow Gen2 works a bit differently from incremental refresh in Power BI dataflows. Because Dataflow Gen2 supports data destinations, you do not need to set up a period of dates to retain when configuring incremental refresh.

Figure: The configuration menu for setting up incremental refresh.
  1. Easy Setup: Right-click the query and select Incremental refresh to get started.
  2. Customizable Settings: Configure the necessary settings for incremental refresh.
    • Choose a Date/DateTime/DateTimeZone column to filter by.
    • Specify the data extraction period that each refresh checks for updates.
    • Define the bucket size. A larger bucket holds more data, but fewer buckets means you give up some parallelism.
    • Provide a column that we can check for new or updated records in the bucket.
  3. Advanced Configuration: If needed, configure advanced settings to allow publishing even if the query does not fully fold.
  4. Data Destination Setup: Optionally, set up a data destination before the first incremental refresh so that the data lands where you want to consume it.
  5. Publish and Automate: Publish your Dataflow Gen2 and let it automatically refresh incrementally with a pipeline or a schedule based on your settings.
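
To make the bucket-size trade-off concrete, here is a minimal Python sketch. It is not a Fabric API: the class name, field names, and column names are hypothetical stand-ins for the settings listed above, and the assumption that the number of buckets is simply the extraction period divided by the bucket size is ours, for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IncrementalRefreshSettings:
    """Hypothetical container mirroring the settings above; not a Fabric API."""

    filter_column: str            # Date/DateTime/DateTimeZone column to filter by
    extract_period_days: int      # how far back each refresh checks for updates
    bucket_size_days: int         # width of a single bucket
    change_detection_column: str  # column checked for new or updated records

    def bucket_count(self) -> int:
        # Assumption: the extraction period is split evenly into buckets.
        return math.ceil(self.extract_period_days / self.bucket_size_days)


# The same 28-day extraction period with two different bucket sizes.
daily = IncrementalRefreshSettings("OrderDate", 28, 1, "ModifiedAt")
weekly = IncrementalRefreshSettings("OrderDate", 28, 7, "ModifiedAt")

print(daily.bucket_count())   # 28 small buckets: more units to process in parallel
print(weekly.bucket_count())  # 4 large buckets: more data per bucket, less parallelism
```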

How Incremental Refresh Works

Incremental refresh divides data into buckets based on the Date, DateTime or DateTimeZone column.
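
As a rough illustration of bucketing by a date column, the Python sketch below groups rows into fixed-width buckets. The helper names, the `anchor` date, and the sample rows are all hypothetical, and the real service may align bucket boundaries differently.

```python
from collections import defaultdict
from datetime import date, timedelta


def bucket_start(value: date, anchor: date, bucket_days: int) -> date:
    """Return the first day of the bucket that contains `value`."""
    offset = (value - anchor).days // bucket_days
    return anchor + timedelta(days=offset * bucket_days)


def bucketize(rows, column, anchor, bucket_days):
    """Group rows by the bucket their date column falls into."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_start(row[column], anchor, bucket_days)].append(row)
    return dict(buckets)


# Hypothetical sample data with weekly buckets anchored at 2024-10-01.
rows = [
    {"id": 1, "OrderDate": date(2024, 10, 2)},
    {"id": 2, "OrderDate": date(2024, 10, 8)},
    {"id": 3, "OrderDate": date(2024, 10, 9)},
]
buckets = bucketize(rows, "OrderDate", anchor=date(2024, 10, 1), bucket_days=7)

for start, members in sorted(buckets.items()):
    print(start, [r["id"] for r in members])
```

Here the first row lands in the bucket starting 2024-10-01, and the other two share the bucket starting 2024-10-08.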

Here’s a high-level overview of the process:

  1. Evaluate Changes: For each bucket, the dataflow compares the maximum value in the change detection column with the maximum recorded at the previous refresh. If the value has changed, the bucket is marked for processing.
  2. Retrieve Data: The dataflow retrieves data for the changed buckets in parallel, loading it into the staging area.
  3. Replace Data: The dataflow replaces the data in the destination with the new data, so only the updated buckets are affected. Historical data, and any data outside the range of buckets marked for processing, is left untouched. This way you can retain long-term history in your destination.
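
The three steps above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Dataflow Gen2 implementation: buckets are plain dictionary entries, the "staging area" is an in-memory dict, and every name here (`changed_buckets`, `refresh`, `previous_max`, the `modified` column) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def changed_buckets(source, previous_max, change_col):
    """Step 1: mark buckets whose max change-detection value differs from last time."""
    marked = []
    for key, rows in source.items():
        current_max = max((r[change_col] for r in rows), default=None)
        if previous_max.get(key) != current_max:
            marked.append(key)
    return marked


def refresh(source, destination, previous_max, change_col):
    marked = changed_buckets(source, previous_max, change_col)

    # Step 2: retrieve the marked buckets in parallel into a staging dict.
    with ThreadPoolExecutor() as pool:
        staged = dict(zip(marked, pool.map(lambda k: list(source[k]), marked)))

    # Step 3: replace only the marked buckets; everything else stays as it was.
    for key, rows in staged.items():
        destination[key] = rows
        previous_max[key] = max((r[change_col] for r in rows), default=None)
    return marked


# Hypothetical state: one bucket changed at the source since the last refresh.
source = {
    "2024-10-01": [{"id": 1, "modified": 5}],
    "2024-10-08": [{"id": 2, "modified": 9}],
}
destination = {
    "2024-09-24": [{"id": 0, "modified": 1}],  # history outside the checked range
    "2024-10-01": [{"id": 1, "modified": 3}],
    "2024-10-08": [{"id": 2, "modified": 9}],
}
previous_max = {"2024-10-01": 3, "2024-10-08": 9}

marked = refresh(source, destination, previous_max, "modified")
print(marked)                     # only the changed bucket is processed
print(destination["2024-09-24"])  # historical bucket is untouched
```

Only the 2024-10-01 bucket is replaced. The 2024-10-08 bucket is skipped because its maximum is unchanged, and the 2024-09-24 bucket is never examined, which is how long-term history survives in the destination.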

Benefits of Incremental Refresh

  • Efficiency: Only the data that has changed since the last refresh is processed, saving time and resources.
  • Performance: Faster dataflows, because less data is processed and changed buckets are retrieved in parallel.
  • Scalability: Handle large datasets more effectively by processing data in smaller, manageable chunks.

Get Started Today!

We invite you to try out the public preview of Incremental Refresh in Dataflow Gen2. Follow the steps outlined above to set up incremental refresh and experience the benefits firsthand. Your feedback is invaluable to us as we continue to improve and enhance this feature.

Stay tuned for more updates and enhancements as we move towards general availability. Happy dataflowing!

Resources

Docs: https://aka.ms/DFgen2-IncrementalRefresh-DOCS  
