Microsoft Fabric Updates Blog

Fast copy in Dataflows Gen2

Dataflows help with ingesting and transforming data. With the introduction of dataflow scale-out on the SQL DW compute, we can transform your data at scale. To do this at scale, however, your data needs to be ingested first.

With the introduction of Fast copy, you can ingest terabytes of data with the easy experience of dataflows, but with the scalable backend of Pipeline’s Copy activity.

After enabling this capability, dataflows automatically switch the backend when the data size exceeds 100 MB, without you needing to change anything while authoring the dataflow.

After the dataflow refreshes, you can easily check whether Fast copy was used during the run by looking at the entity status in the Refresh history experience.

With “Require fast copy” set on a query, you can run a kind of debugging session: if Fast copy ends up not being used for that query, the dataflow refresh is cancelled, so you do not have to wait until the refresh times out.

Using the Fast copy indicators in the Query settings’ Applied steps pane, you can easily check whether your query can run with Fast copy.

Data sources currently supported

Fast copy is currently only supported for the following data source connectors:

  • Azure Data Lake Storage Gen2
  • Azure Blob Storage
  • Azure SQL Database
  • Lakehouse
  • PostgreSQL

Note: The on-premises data gateway and the VNet data gateway are not supported yet. For Azure Blob Storage and Azure Data Lake Storage Gen2, only .parquet and .csv files are supported.

The Copy activity supports only a few transformations when connecting to a file source, such as combining files, selecting or removing columns, changing data types, and renaming columns.

Additional transformations can be applied by splitting the ingestion and transformation steps into separate queries so that DW compute can be leveraged after your data has been ingested into OneLake.
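For example, here is a minimal Power Query M sketch of that pattern; the IngestedOrders query name and the CustomerId/Amount columns are hypothetical:

```
// Query "IngestedOrders": ingestion only, with staging enabled, so Fast copy
// can land the raw data in OneLake. (Source definition omitted.)

// Query "OrdersByCustomer": references the staged query; this heavier
// transformation runs on the warehouse compute after ingestion.
let
    Source = IngestedOrders,
    Grouped = Table.Group(
        Source,
        {"CustomerId"},
        {{"TotalAmount", each List.Sum([Amount]), type nullable number}}
    )
in
    Grouped
```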

For Azure SQL Database and PostgreSQL as a source, any transformation that can fold into a native query is supported.
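As an illustration, the following M sketch against a hypothetical Azure SQL Database (server, database, table, and column names are placeholders) uses only steps that fold into a single native query:

```
let
    Source = Sql.Database("contoso.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Both steps below fold into one native statement, roughly:
    // SELECT OrderId, Amount FROM dbo.Orders WHERE Amount > 100
    Filtered = Table.SelectRows(Orders, each [Amount] > 100),
    Selected = Table.SelectColumns(Filtered, {"OrderId", "Amount"})
in
    Selected
```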

When directly loading the query to an output destination, the following is supported:

  • Lakehouse

If you want to use another output destination, stage the query first and then reference it from a second query.

Supported data types per storage location:

| Data type | DataflowStagingLakehouse | Fabric Lakehouse (LH) output |
| --- | --- | --- |
| Action | N | N |
| Any | N | N |
| Binary | N | N |
| DateTimeZone | Y | N |
| Duration | N | N |
| Function | N | N |
| None | N | N |
| Null | N | N |
| Time | Y | Y |
| Type | N | N |
| Structured (List, Record, Table) | N | N |
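For example, because DateTimeZone values are not accepted by the Lakehouse output destination, one workaround is to strip the zone offset in a referencing query before loading. A minimal M sketch, where the StagedEvents query and the EventTimeUtc column are hypothetical:

```
let
    Source = StagedEvents,
    // DateTimeZone.RemoveZone drops the offset, leaving a plain DateTime
    // that the Lakehouse (LH) output destination can store.
    NoZone = Table.TransformColumns(
        Source,
        {{"EventTimeUtc", DateTimeZone.RemoveZone, type datetime}}
    )
in
    NoZone
```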

Prerequisites

  • Fabric capacity
  • Only .csv and .parquet files are supported.
  • At least 1 million rows when using Azure SQL Database as the source

How to use fast copy

Navigate to a workspace backed by Fabric capacity and create a Dataflow Gen2.

Inside the Power Query editor, select the Options button and turn on Fast copy in the Scale tab.

[Screenshot: the Options button in the Power Query ribbon]
[Screenshot: the “Allow use of fast copy connectors” setting on the Scale tab]

Go to Get data, select Azure Data Lake Storage Gen2 as the source, and fill in the details for your container. Then use the Combine files functionality.

[Screenshot: the Combine files option]
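Behind the scenes, Combine files builds helper queries for you. A simplified M sketch of what the combined query does; the storage account URL and container are placeholders:

```
let
    Source = AzureStorage.DataLake("https://contoso.dfs.core.windows.net/landing"),
    // Keep only .csv files, one of the two formats Fast copy supports here.
    CsvFiles = Table.SelectRows(Source, each [Extension] = ".csv"),
    // Parse each file's binary content into a table with promoted headers.
    Parsed = Table.TransformColumns(
        CsvFiles,
        {{"Content", each Table.PromoteHeaders(Csv.Document(_))}}
    ),
    // Append all per-file tables into a single table.
    Combined = Table.Combine(Parsed[Content])
in
    Combined
```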

To ensure Fast copy can be leveraged, apply only the supported transformations listed earlier in this article. If you want to apply other transformations, stage the data first and reference the query, making any additional transformations in the referencing query.

Optionally, you can force Fast copy on the query you want to test. To do so, right-click the query and select “Require fast copy”.

[Screenshot: the “Require fast copy” option in the query’s context menu]

Optionally, set Lakehouse as output destination. For any other destination, stage and reference your query first.

Use the step folding indicators in the Query settings pane to check whether the Fast copy indicators are in place, confirming that your query can run with Fast copy.

[Screenshot: the Fast copy indicators showing that Fast copy can be used]

When the dataflow is ready, publish it. After the refresh completes, you can check in the Refresh history experience whether Fast copy was used.

[Screenshot: Refresh history showing that Fast copy was used]
