Microsoft Fabric Updates Blog

Boost performance and save costs with Fast Copy in Dataflows Gen2

In this blog, you’ll learn how the Fast Copy feature helps to enhance the performance and cost-efficiency of your Dataflows Gen2.

In March, we announced the Public Preview of Fast Copy in Dataflows Gen2 within Microsoft Fabric. This feature lets you ingest large amounts of data efficiently by using the same backend as the Copy activity in data pipelines, which shortens data processing time and improves cost efficiency. You can also find detailed instructions on enabling Fast Copy.

Let’s explore a real-world example to showcase the benefits of enabling Fast Copy in Dataflows Gen2 within Microsoft Fabric. We used Dataflow Gen2 to load four CSV files, totaling 6 GB, into a Lakehouse table. By comparing the performance and cost before and after enabling Fast Copy, we’ll demonstrate the improvements you can achieve.

Case 1: using native Dataflow Gen2 engine without Fast Copy

Configurations

To set up this scenario, create a Dataflow Gen2 by following these steps:

  • Upload 6 GB of CSV files from ADLS Gen2 into Dataflow Gen2.
  • Configure Power Query to combine the CSV files.
  • Set Lakehouse as the data output destination.

Performance result and cost

Dataflow Gen2 refresh using the native Dataflow Gen2 engine without Fast Copy

The Dataflow Gen2 refresh took about 30 minutes and consumed 28,816 CU seconds.

Metric | Compute consumption
Dataflow Gen2 Refresh CU seconds | 28,816 CU seconds

Total run cost at $0.18/CU hour = (28,816) / (60*60) CU-hours * ($0.18/CU hour) ~= $1.44
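The arithmetic above can be reproduced with a short Python sketch. The function name `cu_seconds_to_cost` is illustrative and not part of any Fabric API, and the $0.18 per CU hour rate is simply the figure quoted in this post; your capacity's actual rate may differ.

```python
def cu_seconds_to_cost(cu_seconds: float, rate_per_cu_hour: float = 0.18) -> float:
    """Convert consumed CU seconds into a dollar cost.

    rate_per_cu_hour defaults to the $0.18/CU hour quoted in this post
    (an assumption; check the rate for your own capacity and region).
    """
    cu_hours = cu_seconds / (60 * 60)  # 3,600 seconds per hour
    return cu_hours * rate_per_cu_hour


# Case 1: refresh without Fast Copy consumed 28,816 CU seconds.
case1_cost = cu_seconds_to_cost(28_816)
print(f"Case 1 cost: ${case1_cost:.2f}")
```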

Case 2: using Dataflow Gen2 powered by Fast Copy

Configurations

To set up this scenario, create a dataflow with the same steps as in the previous test case. The only difference is that you enable the Fast Copy feature, as shown below:

Enable Fast Copy in Dataflow Gen2

Performance result and cost

Dataflow Gen2 refresh using Dataflow Gen2 powered by Fast Copy

The Dataflow Gen2 refresh took about 4 minutes and consumed 3,696 CU seconds for Dataflow Gen2 Refresh and 5,448 CU seconds for Data movement.

Metric | Compute consumption
Dataflow Gen2 Refresh CU seconds | 3,696 CU seconds
Data movement CU seconds | 5,448 CU seconds

Total run cost at $0.18/CU hour = (3,696 + 5,448) / (60*60) CU-hours * ($0.18/CU hour) ~= $0.46
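With Fast Copy, the run is billed across two meters, so the total is the sum of both. The sketch below breaks the cost down per meter, again assuming the $0.18 per CU hour rate quoted in this post; the dictionary and variable names are illustrative only.

```python
RATE_PER_CU_HOUR = 0.18  # rate quoted in this post; an assumption for your capacity

# CU seconds reported for the Fast Copy run, per meter.
meters = {
    "Dataflow Gen2 Refresh": 3_696,
    "Data movement": 5_448,
}

total_cu_seconds = sum(meters.values())
total_cost = total_cu_seconds / (60 * 60) * RATE_PER_CU_HOUR

# Print each meter's share of the bill, then the total.
for name, cu_seconds in meters.items():
    meter_cost = cu_seconds / (60 * 60) * RATE_PER_CU_HOUR
    print(f"{name}: {cu_seconds:,} CU seconds = ${meter_cost:.4f}")

print(f"Total: {total_cu_seconds:,} CU seconds = ${total_cost:.2f}")
```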

Summary

Feature | Performance | Cost | Conclusion
Dataflow Gen2 without Fast Copy | Copy Duration: 29:56 | $1.44 |
Dataflow Gen2 powered by Fast Copy | Copy Duration: 3:47 | $0.46 | 8x increase in performance; 3x decrease in cost

With Fast Copy in Dataflow Gen2, you will see significantly reduced data processing times and improved cost efficiency. In the example above, loading four CSV files totaling 6 GB into a Lakehouse table in Microsoft Fabric showed roughly an 8x increase in performance and a 3x decrease in cost.
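The 8x and 3x figures are rounded ratios of the measured durations and costs. A minimal sketch of how they follow from the numbers in the summary, using a hypothetical helper `mmss_to_seconds` to parse the copy durations:

```python
def mmss_to_seconds(text: str) -> int:
    """Parse a 'minutes:seconds' duration such as '29:56' into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


# Copy durations and costs taken from the summary in this post.
baseline_seconds = mmss_to_seconds("29:56")   # without Fast Copy
fast_copy_seconds = mmss_to_seconds("3:47")   # with Fast Copy

speedup = baseline_seconds / fast_copy_seconds
cost_ratio = 1.44 / 0.46

print(f"Speedup: {speedup:.1f}x")
print(f"Cost reduction: {cost_ratio:.1f}x")
```

The unrounded ratios come out slightly below 8 and slightly above 3, which is why the post reports them as approximately 8x and 3x.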
