Dataflows Gen2 data destinations and managed settings
We are very excited to announce a lot of new improvements to data destinations in Dataflows Gen2. Here is an overview of the new improvements and how to get started.
After you have cleaned and prepared your data with dataflows gen 2, you want to land your data into a destination. This is possible with the data destination capabilities in dataflows gen 2. With this capability you can pick from different destinations, like Azure SQL, Fabric Lakehouse and many more. Dataflows gen2 will write your data to the destination, and from there you can use your data for further analysis and reporting.
Supported data destinations.
- Azure SQL databases
- Azure data explorer (Kusto)
- Fabric Lakehoue
- Fabric Warehouse
- Fabric KQL database
Entry points
Every data query in your Dataflow Gen2 can have a data destination. Functions and list are not supported. You can specify the data destination for every query individually and you can use multiple different destinations within the dataflow.
There are 3 main entry-points to specify the data destination.
- Through the top ribbon
- Through query settings
- Through the diagram view
Connect to the data destination
Connecting to the data destination is similar to connecting to a data source. Connections can be used for both reading and writing your data, given that you have the right permissions on the data source. You need to create a new connection or pick an existing connection and click next.
Create a new table or pick an existing table
When loading into your data destination, you have the possibility to either create a new table or pick an existing table.
- New table
When choosing to create a new table, during the dataflow gen 2 refresh, a new table will be created in your data destination. If the table gets deleted in the future by manually going into the destination, the dataflow will recreate the table on the next dataflow refresh.
By default, you table name will have the same name as your query name. If you have any invalid characters in your table name that are not supported by the destination, it will automatically be adjusted. For example, many destinations do not support spaces or special characters.
Next, you must select the destination container. If you chose any of the Fabric data destination, you could use the navigator to select the Fabric artifact you want to load your data into. For Azure destinations, you can either specify the database during connection creation, or select the database from the navigator experience.
- Existing table
To choose an existing table, use the toggle at the top of the navigator. When choosing an existing table, you need to pick both the Fabric artifact/database and table using the navigator.
When using an existing table, the table will not be recreated in any scenario. If you delete the table manually from the data destination, the dataflow gen 2 will not recreate the table on the next refresh.
Managed settings for new tables
When loading into a new table, by default the automatic settings are on. Using the automatic settings, dataflows gen 2 manages the mapping for you. This will allow you the following behavior:
- Update method replace: Data will be replaced at every dataflow refresh. Any data in the destination will be removed. The data in the destination will be replaced with the output data of the dataflow.
- Managed mapping: Mapping is managed for you. When you need to make changes to your data/query to add an additional column or change a data type, mapping is automatically adjusted for this when you republish your dataflow. You do not have to go into the data destination experience every time you make changes to your dataflow, allowing you for easy schema changes when you republish the dataflow.
- Drop and recreate table: To allow for these schema changes, on every dataflow refresh, the table will be dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.
NOTE: currently this is only supported for Lakehouse and Azure SQL database as data destination.
Manual settings
By un-toggle the use automatic setting, you get full control over how to load your data into the data destination. You can make any changes to the column mapping by changing the source type or excluding any column that you do not need in your data destination.
- Update methods.
Most destinations support both Append and Replace as update methods. Fabric KQL databases and Azure data explorer do not support replace as update method.
Replace: On every dataflow refresh, your data will be dropped from the destination and replaced by the output data of the dataflow.
Append: On every dataflow refresh, the output data from the dataflow will be appended to the existing data in the data destination table.
- Schema options on publish
Schema options on publish only apply when the update method in replace. When appending data, changes to the schema are not possible.
Dynamic schema: When choosing dynamic schema, you allow for schema changes in the data destination when you republish the dataflow. Because you are not using managed mapping, you still need to update the column mapping in dataflow destination flow when you make any changes to your query. When the dataflow is refreshed, your table will be dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.
Fixed schema: When choosing fixed schema, schema changes are not possible. When the dataflow gets refreshed, only the rows in the table will be dropped and replaced with the output data from the dataflow. Any relationships or measures on the table will stay intact. If you make any changes to your query in the dataflow, the dataflow publish will fail if it detects that the query schema does not match the data destination schema. Use this setting when you do not plan to change the schema and have relationships or measure added to you destination table.
NOTE: when loading data into the warehouse, only fixed schema is supported:
Supported data source types per destination
Supported data types per storage location: | DataflowStagingLakehouse | Azure DB (SQL) Output | Azure Data Explorer Output | Fabric Lakehouse (LH) Output | Fabric Warehouse (WH) Output |
Action | No | No | No | No | No |
Any | No | No | No | No | No |
Binary | No | No | No | No | No |
Currency | Yes | Yes | Yes | Yes | No |
DateTimeZone | Yes | Yes | Yes | No | No |
Duration | No | No | Yes | No | No |
Function | No | No | No | No | No |
None | No | No | No | No | No |
Null | No | No | No | No | No |
Time | Yes | Yes | No | Yes | Yes |
Type | No | No | No | No | No |
Structured (List, Record, Table) | No | No | No | No | No |
Advanced topics:
- Using staging before loading to a destination
To enhance performance of query processing, staging can be used within dataflows gen 2 to leverage Fabric compute to execute your queries.
When staging is enabled on your queries (this is the default behavior), you data will be loading into the staging location, which is an internal Lakehouse only accessible by dataflows itself.
Using staging locations can be either beneficial or the opposite when loading data into the data destination.
Loading data into the Lakehouse
When you are loading data into the Lakehouse, it is advised to disable staging on the query to avoid loading twice into a similar destination, once for staging and once for data destination. To improve the dataflow performance, disable staging for any query that has Lakehouse as the data destination.
To disable staging, right-click on the query and disable staging, by clicking on the enable staging button. Your query will turn grey.
Loading data into the Warehouse
When loading data into the Warehouse, staging is required before the write operation to the data destination. This is to improve performance. Currently, only loading into the same warehouse as the dataflow is supported. Ensure staging is enabled for all queries that load into the warehouse.
When staging is disabled, and you choose warehouse as output destination, you will get a warning to enable staging first before you can configure the data destination:
If you already have a warehouse as destination and try to disable staging, a warning will be shown where you can either remove warehouse as the destination or dismiss the staging action:
Nullable issues
When you have a nullable column, in some cases, it gets detected by Power Query as non-nullable and when writing to the data destination, the column type will be non-nullable. During refresh, the following error will occur:
E104100 Couldn’t refresh entity because of an issue with the mashup document MashupException.Error: DataFormat.Error: Error in replacing table’s content with new data in a version: #{0}., InnerException: We can’t insert null data into a non-nullable column., Underlying error: We can’t insert null data into a non-nullable column. Details: Reason = DataFormat.Error;Message = We can’t insert null data into a non-nullable column.; Message.Format = we can’t insert null data into a non-nullable column.
To force nullable columns you can try to do the following steps:
- Delete the table from the data destination
- Remove the data destination from the dataflow
- Go into the dataflow and update the data types by leveraging the following PQ code:
Table.TransformColumnTypes(#”PREVIOUS STEP”, {{“COLLUMNNAME1”, type nullable text}, {“COLLUMNNAME2”, type nullable Int64.Type}}) - Add the data destination