Microsoft Fabric Updates Blog

Announcing Public Preview: Incremental Refresh in Dataflow Gen2

Incremental Refresh in Dataflow Gen2 is now in public preview! This powerful feature is designed to optimize your data processing by ensuring that only the data that has changed since the last refresh is updated. This means faster dataflows and more efficient resource usage.

[Figure: A drop-down menu that highlights the Incremental refresh option.]

Key Features of Incremental Refresh

Incremental refresh in Dataflow Gen2 works a bit differently from incremental refresh in Power BI dataflows. Because Dataflow Gen2 supports data destinations, you do not need to set up a retention period of dates when configuring incremental refresh.

[Figure: A configuration menu where users set up incremental refresh.]

  1. Easy Setup: Right-click the query and select Incremental refresh to get started.
  2. Customizable Settings: Configure the necessary settings for incremental refresh.
    • Choose a Date/DateTime/DateTimeZone column to filter by.
    • Specify the data extraction period that each refresh checks for updates.
    • Define the bucket size. A bigger bucket holds more data, but you give up some parallelism.
    • Provide a column that we can check for new or updated records in the bucket.
  3. Advanced Configuration: If needed, configure advanced settings to allow publishing even if the query does not fully fold.
  4. Data Destination Setup: Optionally, set up a data destination before the first incremental refresh to ensure the data lands where you want to consume it.
  5. Publish and Automate: Publish your Dataflow Gen2 and let it automatically refresh incrementally with a pipeline or a schedule based on your settings.
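
To make the bucket-size trade-off concrete, here is a minimal Python sketch. It is not a Dataflow Gen2 API: the `IncrementalRefreshSettings` class, its field names, and the example column names are hypothetical and only mirror the settings listed above, with periods simplified to whole days.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class IncrementalRefreshSettings:
    """Hypothetical container mirroring the settings listed above."""
    filter_column: str            # Date/DateTime/DateTimeZone column to filter by
    extract_period_days: int      # how far back each refresh checks for updates
    bucket_size_days: int         # width of a single bucket
    change_detection_column: str  # column checked for new or updated records

    def bucket_count(self) -> int:
        """Number of buckets needed to cover the extraction period."""
        return ceil(self.extract_period_days / self.bucket_size_days)


daily = IncrementalRefreshSettings("OrderDate", 28, 1, "ModifiedAt")
weekly = IncrementalRefreshSettings("OrderDate", 28, 7, "ModifiedAt")

print(daily.bucket_count())   # many small buckets: more units that can run in parallel
print(weekly.bucket_count())  # few large buckets: more data per bucket, less parallelism
```

With the same 28-day extraction period, the daily setting yields many more buckets than the weekly one, which is the parallelism you trade away when you choose a larger bucket.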

How Incremental Refresh Works

Incremental refresh divides data into buckets based on the Date, DateTime or DateTimeZone column.
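
As a rough illustration of bucketing, the Python sketch below groups a few rows by day. The one-day bucket size, the `bucket_key` helper, and the sample rows are assumptions made for the example, not details of the service.

```python
from collections import defaultdict
from datetime import date, datetime


def bucket_key(ts: datetime) -> date:
    """Map a timestamp to its bucket; one bucket per calendar day is assumed here."""
    return ts.date()


# Hypothetical sample rows with a DateTime column named OrderDate.
rows = [
    {"id": 1, "OrderDate": datetime(2024, 9, 24, 8, 30)},
    {"id": 2, "OrderDate": datetime(2024, 9, 24, 17, 5)},
    {"id": 3, "OrderDate": datetime(2024, 9, 25, 9, 0)},
]

# Collect the row ids that fall into each bucket.
buckets = defaultdict(list)
for row in rows:
    buckets[bucket_key(row["OrderDate"])].append(row["id"])

bucket_ids = dict(buckets)
print(bucket_ids)
```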

Here’s a high-level overview of the process:

  1. Evaluate Changes: For each bucket, the dataflow compares the maximum value in the change detection column with the maximum recorded at the previous refresh. If the value has changed, the bucket is marked for processing.
  2. Retrieve Data: The dataflow retrieves data for the changed buckets in parallel, loading it into the staging area.
  3. Replace Data: The dataflow replaces the data in the destination with the new data, ensuring only the updated buckets are affected. Historical data and any data outside the range of buckets marked for processing is left untouched. This way, you can retain long-term history in your destination.
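
The three steps above can be sketched in Python as a toy simulation. Everything here is an assumption made for illustration: the in-memory `source` and `destination` dictionaries, the `modified` change detection column, and the `refresh` function are hypothetical and do not reflect how the service is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical source: bucket (one day) -> rows carrying a change detection value.
source = {
    date(2024, 9, 24): [{"id": 1, "modified": 10}, {"id": 2, "modified": 12}],
    date(2024, 9, 25): [{"id": 3, "modified": 20}],
}
# Destination as left by the previous refresh.
destination = {
    date(2024, 9, 23): [{"id": 0, "modified": 5}],   # history outside the checked range
    date(2024, 9, 24): [{"id": 1, "modified": 10}, {"id": 2, "modified": 12}],
    date(2024, 9, 25): [{"id": 3, "modified": 15}],  # stale copy of row 3
}
# Maximum change detection value per bucket, remembered from the previous refresh.
last_seen_max = {date(2024, 9, 24): 12, date(2024, 9, 25): 15}


def max_change_value(bucket):
    """Current maximum of the change detection column for one source bucket."""
    return max((row["modified"] for row in source[bucket]), default=None)


def refresh():
    # 1. Evaluate changes: mark buckets whose maximum differs from last time.
    changed = [b for b in source if max_change_value(b) != last_seen_max.get(b)]
    # 2. Retrieve data: load the changed buckets in parallel into a staging dict.
    with ThreadPoolExecutor() as pool:
        staged = dict(zip(changed, pool.map(lambda b: list(source[b]), changed)))
    # 3. Replace data: overwrite only the changed buckets in the destination.
    for bucket, rows in staged.items():
        destination[bucket] = rows
        last_seen_max[bucket] = max_change_value(bucket)
    return changed


changed_buckets = refresh()
print(changed_buckets)
print(destination[date(2024, 9, 23)])  # historical bucket is left as it was
```

In this sketch, a bucket is reprocessed only when its maximum change detection value moves, so an edit that leaves the maximum unchanged would go unnoticed. That is why a column such as a last-modified timestamp is a natural fit for change detection.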

Benefits of Incremental Refresh

  • Efficiency: Only the data that has changed since the last refresh is processed, saving time and resources.
  • Performance: Faster dataflows, because less data is processed and changed buckets are retrieved in parallel.
  • Scalability: Handle large datasets more effectively by processing data in smaller, manageable chunks.

Get Started Today!

We invite you to try out the public preview of Incremental Refresh in Dataflow Gen2. Follow the steps outlined above to set up incremental refresh and experience the benefits firsthand. Your feedback is invaluable to us as we continue to improve and enhance this feature.

Stay tuned for more updates and enhancements as we move towards general availability. Happy dataflowing!

Resources

Docs: https://aka.ms/DFgen2-IncrementalRefresh-DOCS  
