Microsoft Fabric Updates Blog

Announcing Public Preview: Incremental Refresh in Dataflow Gen2

Data Factory

Incremental Refresh in Dataflow Gen2 is now in public preview! This powerful feature is designed to optimize your data processing by ensuring that only the data that has changed since the last refresh is updated. This means faster dataflows and more efficient resource usage.

A drop down menu that highlights the incremental refresh option.

Key Features of Incremental Refresh

Incremental refresh in Dataflow Gen2 works a little differently from incremental refresh in Power BI dataflows. Because Dataflow Gen2 supports data destinations, you no longer need to set up a period of dates to retain when you configure incremental refresh.

A configuration menu that lets users set up incremental refresh based on the settings they provide.

Easy Setup: Right-click the query and select Incremental refresh to get started.
Customizable Settings: Configure the necessary settings for incremental refresh.
- Choose a Date/DateTime/DateTimeZone column to filter by.
- Specify the data extraction period to check for updates on every refresh.
- Define the bucket size. A bigger bucket holds more data, but you give up some parallelism.
- Provide a column that we can check for new or updated records in the bucket.
Advanced Configuration: If needed, configure advanced settings to allow publishing even if the query does not fully fold.
Data Destination Setup: Optionally, set up a data destination before the first incremental refresh so the data lands where you want to consume it.
Publish and Automate: Publish your Dataflow Gen2 and let it automatically refresh incrementally with a pipeline or a schedule based on your settings.
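
To make the bucket size trade-off concrete, here is a small Python sketch that counts how many buckets cover an extraction period. The function name and the mapping of bucket sizes to days are assumptions made for illustration, not settings exposed by Dataflow Gen2.

```python
import math

# Hypothetical bucket sizes expressed in days (illustrative only).
BUCKET_DAYS = {"day": 1, "week": 7}


def bucket_count(period_days: int, bucket_size: str) -> int:
    """Return how many buckets are needed to cover the extraction period."""
    return math.ceil(period_days / BUCKET_DAYS[bucket_size])


# A 28-day extraction period split two ways.
print(bucket_count(28, "day"))   # 28 small buckets, more work that can run in parallel
print(bucket_count(28, "week"))  # 4 larger buckets, fewer units to run in parallel
```

Smaller buckets give more units that can be processed side by side, while larger buckets mean fewer units that each hold more rows.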

How Incremental Refresh Works

Incremental refresh divides data into buckets based on the Date, DateTime or DateTimeZone column.
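
As a rough illustration of what dividing data into buckets can look like, the Python sketch below groups rows by a truncated date value. The helper names, the row layout, and the "day" and "month" sizes are assumptions for this example; the service's internal implementation is not described in this post.

```python
from collections import defaultdict
from datetime import datetime


def bucket_key(ts: datetime, bucket_size: str) -> str:
    """Truncate a timestamp to the start of its bucket (hypothetical sizes)."""
    if bucket_size == "day":
        return ts.strftime("%Y-%m-%d")
    if bucket_size == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unsupported bucket size: {bucket_size}")


def split_into_buckets(rows, date_column, bucket_size):
    """Group rows into buckets keyed by the truncated date column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_key(row[date_column], bucket_size)].append(row)
    return dict(buckets)


rows = [
    {"id": 1, "order_date": datetime(2024, 8, 1, 9, 30)},
    {"id": 2, "order_date": datetime(2024, 8, 1, 17, 5)},
    {"id": 3, "order_date": datetime(2024, 8, 2, 8, 0)},
]
daily = split_into_buckets(rows, "order_date", "day")
print(sorted(daily))  # one key per calendar day that has rows
```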

Here’s a high-level overview of the process:

Evaluate Changes: For each bucket, the dataflow compares the maximum value in the change detection column with the value from the previous refresh. If the value has changed, the bucket is marked for processing.
Retrieve Data: The dataflow retrieves data for the changed buckets in parallel, loading it into the staging area.
Replace Data: The dataflow replaces the data in the destination with the new data, ensuring only the updated buckets are affected. Any historical data, or data outside the range of buckets marked for processing, is left untouched. This way, you can retain long-term history in your destination.
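
The three steps above can be sketched in Python as a simplified simulation. Everything here is an assumption made for illustration: the function name, the dictionaries standing in for the source, the destination, and the remembered state, and the sequential retrieval (the post says the service retrieves buckets in parallel).

```python
def incremental_refresh(source, destination, state, change_column):
    """Simulate one refresh. All arguments are plain dicts keyed by bucket.

    source:      bucket -> rows currently in the source (extraction period only)
    destination: bucket -> rows already loaded
    state:       bucket -> max change-column value seen at the previous refresh
    """
    # 1. Evaluate changes: compare each bucket's current max with the stored one.
    changed = []
    for bucket, rows in source.items():
        current_max = max((row[change_column] for row in rows), default=None)
        if state.get(bucket) != current_max:
            changed.append(bucket)
            state[bucket] = current_max

    # 2. Retrieve data: stage only the changed buckets (sequential in this sketch).
    staged = {bucket: list(source[bucket]) for bucket in changed}

    # 3. Replace data: overwrite changed buckets, leave every other bucket alone.
    for bucket, rows in staged.items():
        destination[bucket] = rows
    return changed


source = {
    "2024-08-01": [{"id": 1, "modified": "2024-08-01T10:00"}],
    "2024-08-02": [{"id": 2, "modified": "2024-08-02T09:00"}],
}
destination = {"2023-01-01": [{"id": 0, "modified": "2023-01-01T00:00"}]}
state = {}

first = incremental_refresh(source, destination, state, "modified")

# A new record arrives in one bucket before the next refresh.
source["2024-08-02"].append({"id": 3, "modified": "2024-08-02T15:00"})
second = incremental_refresh(source, destination, state, "modified")
print(first, second)
```

In this sketch the first run processes both source buckets, the second run processes only the bucket that received a new record, and the 2023 bucket in the destination is never touched because it lies outside the buckets being checked.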

Benefits of Incremental Refresh

Efficiency: Only the data that has changed since the last refresh is processed, saving time and resources.
Performance: Dataflows run faster because less data is processed and changed buckets are retrieved in parallel.
Scalability: Handle large datasets more effectively by processing data in smaller, manageable chunks.

Get Started Today!

We invite you to try out the public preview of Incremental Refresh in Dataflow Gen2. Follow the steps outlined above to set up incremental refresh and experience the benefits firsthand. Your feedback is invaluable to us as we continue to improve and enhance this feature.

Stay tuned for more updates and enhancements as we move towards general availability. Happy dataflowing!
