Microsoft Fabric Updates Blog

Dataflows Gen2 data destinations and managed settings

We are very excited to announce a lot of new improvements to data destinations in Dataflows Gen2. Here is an overview of the new improvements and how to get started.

After you have cleaned and prepared your data with dataflows gen 2, you want to land your data into a destination. This is possible with the data destination capabilities in dataflows gen 2. With this capability you can pick from different destinations, like Azure SQL, Fabric Lakehouse and many more. Dataflows gen2 will write your data to the destination, and from there you can use your data for further analysis and reporting.

Supported data destinations.

  • Azure SQL databases
  • Azure data explorer (Kusto)
  • Fabric Lakehoue
  • Fabric Warehouse
  • Fabric KQL database

Entry points

Every data query in your Dataflow Gen2 can have a data destination. Functions and list are not supported. You can specify the data destination for every query individually and you can use multiple different destinations within the dataflow.

There are 3 main entry-points to specify the data destination.

  1. Through the top ribbon
Add data destination entry point inside of the Power Query ribbon
Select data destination from ribbon.
  • Through query settings
Add Data destination entry point inside of the applied steps section
Select data destination from query setting pane.
  • Through the diagram view
Add data destination inside of the query node of the diagram view
Select data destination from diagram view.

Connect to the data destination

Connecting to the data destination is similar to connecting to a data source. Connections can be used for both reading and writing your data, given that you have the right permissions on the data source. You need to create a new connection or pick an existing connection and click next.

Connect to data destination dialog when connecting to a lakehouse
Make a connection to the Fabric Lakehouse.

Create a new table or pick an existing table

When loading into your data destination, you have the possibility to either create a new table or pick an existing table.

  • New table

When choosing to create a new table, during the dataflow gen 2 refresh, a new table will be created in your data destination. If the table gets deleted in the future by manually going into the destination, the dataflow will recreate the table on the next dataflow refresh.

       By default, you table name will have the same name as your query name. If you have any invalid characters in your table name that are not supported by the destination, it will automatically be adjusted. For example, many destinations do not support spaces or special characters.

Choose destination target for the data destination feature when using a Lakehouse as a destination
Create a new table in the data destination.

       Next, you must select the destination container. If you chose any of the Fabric data destination, you could use the navigator to select the Fabric artifact you want to load your data into. For Azure destinations, you can either specify the database during connection creation, or select the database from the navigator experience.

  • Existing table

To choose an existing table, use the toggle at the top of the navigator. When choosing an existing table, you need to pick both the Fabric artifact/database and table using the navigator.

When using an existing table, the table will not be recreated in any scenario. If you delete the table manually from the data destination, the dataflow gen 2 will not recreate the table on the next refresh.

previewing a table when using the choose destination target table
Select an existing table from the data destination.

Managed settings for new tables

When loading into a new table, by default the automatic settings are on. Using the automatic settings, dataflows gen 2 manages the mapping for you. This will allow you the following behavior:

  • Update method replace: Data will be replaced at every dataflow refresh. Any data in the destination will be removed. The data in the destination will be replaced with the output data of the dataflow.
  • Managed mapping: Mapping is managed for you. When you need to make changes to your data/query to add an additional column or change a data type, mapping is automatically adjusted for this when you republish your dataflow. You do not have to go into the data destination experience every time you make changes to your dataflow, allowing you for easy schema changes when you republish the dataflow.
  • Drop and recreate table: To allow for these schema changes, on every dataflow refresh, the table will be dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.

NOTE: currently this is only supported for Lakehouse and Azure SQL database as data destination.

choose destination settings dialog and the automatic settings being enabled
Use automatic setting for data destination.

Manual settings

By un-toggle the use automatic setting, you get full control over how to load your data into the data destination. You can make any changes to the column mapping by changing the source type or excluding any column that you do not need in your data destination.

choose destination settings dialog when manually selecting the updated method and scehma options on publish
Use manual settings to control the data destination.
  • Update methods.

Most destinations support both Append and Replace as update methods. Fabric KQL databases and Azure data explorer do not support replace as update method.

Replace: On every dataflow refresh, your data will be dropped from the destination and replaced by the output data of the dataflow.

Append: On every dataflow refresh, the output data from the dataflow will be appended to the existing data in the data destination table.

  • Schema options on publish
    Schema options on publish only apply when the update method in replace. When appending data, changes to the schema are not possible.

Dynamic schema: When choosing dynamic schema, you allow for schema changes in the data destination when you republish the dataflow. Because you are not using managed mapping, you still need to update the column mapping in dataflow destination flow when you make any changes to your query. When the dataflow is refreshed, your table will be dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.

Fixed schema: When choosing fixed schema, schema changes are not possible. When the dataflow gets refreshed, only the rows in the table will be dropped and replaced with the output data from the dataflow. Any relationships or measures on the table will stay intact. If you make any changes to your query in the dataflow, the dataflow publish will fail if it detects that the query schema does not match the data destination schema. Use this setting when you do not plan to change the schema and have relationships or measure added to you destination table.

NOTE: when loading data into the warehouse, only fixed schema is supported:

schema options on publish section that define what can be accomplished during a change of the schema
Dynamic schema is not supported for Fabric Warehouse.

Supported data source types per destination

Supported data types per storage location:DataflowStagingLakehouseAzure DB (SQL) OutputAzure Data Explorer OutputFabric Lakehouse (LH) OutputFabric Warehouse (WH) Output
ActionNoNoNoNoNo
AnyNoNoNoNoNo
BinaryNoNoNoNoNo
CurrencyYesYesYesYesNo
DateTimeZoneYesYesYesNoNo
DurationNoNoYesNoNo
FunctionNoNoNoNoNo
NoneNoNoNoNoNo
NullNoNoNoNoNo
TimeYesYesNoYesYes
TypeNoNoNoNoNo
Structured (List, Record, Table)NoNoNoNoNo
List of supported data source types by destination types.

Advanced topics:

  • Using staging before loading to a destination

To enhance performance of query processing, staging can be used within dataflows gen 2 to leverage Fabric compute to execute your queries.

When staging is enabled on your queries (this is the default behavior), you data will be loading into the staging location, which is an internal Lakehouse only accessible by dataflows itself.

Using staging locations can be either beneficial or the opposite when loading data into the data destination.

       Loading data into the Lakehouse

       When you are loading data into the Lakehouse, it is advised to disable staging on the query to avoid loading twice into a similar destination, once for staging and once for data destination. To improve the dataflow performance, disable staging for any query that has Lakehouse as the data destination.

To disable staging, right-click on the query and disable staging, by clicking on the enable staging button. Your query will turn grey.

Enable staging flag enabled for the query named Query inside of the Diagram view
Enable staging through the context menu.

       Loading data into the Warehouse

       When loading data into the Warehouse, staging is required before the write operation to the data destination. This is to improve performance. Currently, only loading into the same warehouse as the dataflow is supported. Ensure staging is enabled for all queries that load into the warehouse.

When staging is disabled, and you choose warehouse as output destination, you will get a warning to enable staging first before you can configure the data destination:

Add data destination pop-up dialog to confirm that you want to enable the staging for the selected query
Warning to enable staging bore adding Fabric warehouse as a data destination.

If you already have a warehouse as destination and try to disable staging, a warning will be shown where you can either remove warehouse as the destination or dismiss the staging action:

Warning to remove warehouse as a destination before disabeling staging.
Warning to remove warehouse as a destination before disabling staging.

Nullable issues

When you have a nullable column, in some cases, it gets detected by Power Query as non-nullable and when writing to the data destination, the column type will be non-nullable. During refresh, the following error will occur:

E104100 Couldn’t refresh entity because of an issue with the mashup document MashupException.Error: DataFormat.Error: Error in replacing table’s content with new data in a version: #{0}., InnerException: We can’t insert null data into a non-nullable column., Underlying error: We can’t insert null data into a non-nullable column. Details: Reason = DataFormat.Error;Message = We can’t insert null data into a non-nullable column.; Message.Format = we can’t insert null data into a non-nullable column.

To force nullable columns you can try to do the following steps: 

  1. Delete the table from the data destination
  2. Remove the data destination from the dataflow
  3. Go into the dataflow and update the data types by leveraging the following PQ code:

    Table.TransformColumnTypes(#”PREVIOUS STEP”, {{“COLLUMNNAME1”, type nullable text}, {“COLLUMNNAME2”, type nullable Int64.Type}})
  4. Add the data destination

منشورات المدونات ذات الصلة

Dataflows Gen2 data destinations and managed settings

أكتوبر 29, 2024 بواسطة Leo Li

We’re excited to announce several powerful updates to the Virtual Network (VNET) Data Gateway, designed to further enhance performance and improve the overall user experience. These new features allow users to better manage increasing workloads, perform complex data transformations, and simplify log management. Expanded Cluster Size from 5 to 7 One of the key improvements … Continue reading “New Features and Enhancements for Virtual Network Data Gateway”

أكتوبر 28, 2024 بواسطة Estera Kot

We’re thrilled to announce that the Native Execution Engine is now available at no additional cost, unlocking next-level performance and efficiency for your workloads. What’s New?  The Native Execution Engine now supports Fabric Runtime 1.3, which includes Apache Spark 3.5 and Delta Lake 3.2. This upgrade enhances Microsoft Fabric’s Data Engineering and Data Science workflows, … Continue reading “Native Execution Engine available at no additional cost!”