Microsoft Fabric Updates Blog

Announcing Public Preview: Incremental Refresh in Dataflow Gen2

Incremental Refresh in Dataflow Gen2 is now in public preview! This powerful feature is designed to optimize your data processing by ensuring that only the data that has changed since the last refresh is updated. This means faster dataflows and more efficient resource usage.

[Screenshot: a drop-down menu highlighting the Incremental refresh option.]

Key Features of Incremental Refresh

Incremental refresh in Dataflow Gen2 is a bit different from incremental refresh in Power BI dataflows. Because Dataflow Gen2 supports data destinations, you are no longer required to set up a retention period of dates when configuring incremental refresh.

[Screenshot: the configuration dialog where users set up incremental refresh.]
  1. Easy Setup: Right-click the query and select Incremental refresh to get started.
  2. Customizable Settings: Configure the necessary settings for incremental refresh (a conceptual sketch of these settings follows this list).
    • Choose a Date/DateTime/DateTimeZone column to filter by.
    • Specify the data extraction period: the window of data that is checked for updates on every refresh.
    • Define the bucket size: a larger bucket holds more data per request, but reduces the parallelism of the refresh.
    • Provide a column that can be checked for new or updated records in each bucket.
  3. Advanced Configuration: If needed, configure advanced settings to allow publishing even if the query does not fully fold.
  4. Data Destination Setup: Optionally, set up a data destination before the first incremental refresh so the data lands where you plan to consume it.
  5. Publish and Automate: Publish your Dataflow Gen2 and let it refresh incrementally on a schedule or from a pipeline, based on your settings.
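To make these settings concrete, here is a minimal, hypothetical Python sketch that models the choices made in the configuration dialog. The class and field names are illustrative assumptions, not part of any Dataflow Gen2 API; the real configuration is done entirely in the UI.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical model of the incremental refresh settings; the real
# configuration lives in the Dataflow Gen2 UI, not in code.
@dataclass
class IncrementalRefreshSettings:
    filter_column: str            # Date/DateTime/DateTimeZone column to filter by
    extraction_period: timedelta  # window checked for updates on every refresh
    bucket_size: timedelta        # bigger buckets = more data each, less parallelism
    change_detection_column: str  # column checked for new or updated records
    require_full_folding: bool = True  # advanced: allow publish if query does not fully fold

settings = IncrementalRefreshSettings(
    filter_column="OrderDate",
    extraction_period=timedelta(days=7),
    bucket_size=timedelta(days=1),
    change_detection_column="ModifiedAt",
)
```

Note how the extraction period and bucket size together determine how many buckets each refresh evaluates.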

How Incremental Refresh Works

Incremental refresh divides the data into buckets based on the Date, DateTime, or DateTimeZone column. For example, a seven-day data extraction period with a one-day bucket size produces seven buckets, each of which can be evaluated and refreshed independently.

Here’s a high-level overview of the process:

  1. Evaluate Changes: The dataflow compares the maximum value of the change detection column in each bucket with the value from the previous refresh. If the value has changed, the bucket is marked for processing.
  2. Retrieve Data: The dataflow retrieves the data for the changed buckets in parallel, loading it into the staging area.
  3. Replace Data: The dataflow replaces the data in the destination with the new data, touching only the updated buckets. Historical data and any data outside the range of buckets marked for processing is left unchanged, so you can retain long-term history in your destination.
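For readers who want to see the pattern end to end, below is a simplified, hypothetical Python sketch of this evaluate/retrieve/replace loop, reusing the IncrementalRefreshSettings sketch from earlier. The `source` and `destination` objects and their methods (`max_value`, `extract`, `replace_bucket`) are assumptions made purely for illustration; the actual engine is internal to Dataflow Gen2.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone

def make_buckets(start: datetime, end: datetime, size: timedelta):
    """Split the extraction period [start, end) into fixed-size buckets."""
    buckets = []
    while start < end:
        buckets.append((start, min(start + size, end)))
        start += size
    return buckets

def incremental_refresh(source, destination, watermarks: dict, settings):
    """One refresh pass: evaluate changes, retrieve changed buckets, replace them."""
    # In practice, bucket boundaries would be aligned to fixed calendar
    # boundaries so the watermark keys stay stable across refreshes.
    now = datetime.now(timezone.utc)
    buckets = make_buckets(now - settings.extraction_period, now, settings.bucket_size)

    # 1. Evaluate Changes: compare each bucket's max change-detection value
    #    with the value recorded on the previous refresh.
    changed = []
    for bucket in buckets:
        current_max = source.max_value(settings.change_detection_column, bucket)
        if current_max != watermarks.get(bucket):
            changed.append((bucket, current_max))

    # 2. Retrieve Data: pull the changed buckets in parallel into staging.
    with ThreadPoolExecutor() as pool:
        staged = list(pool.map(lambda item: (item[0], source.extract(item[0])), changed))

    # 3. Replace Data: overwrite only the affected buckets in the destination;
    #    data outside these buckets is left untouched.
    for bucket, rows in staged:
        destination.replace_bucket(bucket, rows)
    for bucket, new_max in changed:
        watermarks[bucket] = new_max
```

The sketch also shows the trade-off from the setup steps: smaller buckets mean more extract calls that can run in parallel, while larger buckets move more data per call.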

Benefits of Incremental Refresh

  • Efficiency: Only the data that has changed since the last refresh is processed, saving time and resources.
  • Performance: Faster dataflows, since less data is processed and changed buckets are retrieved in parallel.
  • Scalability: Handle large datasets more effectively by processing data in smaller, manageable chunks.

Get Started Today!

We invite you to try out the public preview of Incremental Refresh in Dataflow Gen2. Follow the steps outlined above to set up incremental refresh and experience the benefits firsthand. Your feedback is invaluable to us as we continue to improve and enhance this feature.

Stay tuned for more updates and enhancements as we move towards general availability. Happy dataflowing!

Resources

Docs: https://aka.ms/DFgen2-IncrementalRefresh-DOCS  
