Microsoft Fabric Updates Blog

Read data from Delta Lake tables with the DeltaLake.Table M function

We’re happy to announce a new function in Power Query’s M language for reading data from Delta Lake tables: the DeltaLake.Table function. This function is now available in Power Query in Power BI Desktop and in Dataflows Gen1 and Gen2 and replaces the need to use community-developed solutions like this one by Gerhard Brueckl. Let’s see some examples of how to use it.

Connecting to Delta Lake tables

DeltaLake.Table can connect to any folder that represents a Delta Lake table (which means, for example, that the folder you connect to should contain a subfolder called _delta_log) in either ADLSgen2 storage, a Fabric Lakehouse or a local file system. If you have a folder on your local PC with the path “C:\DeltaCoffee” that contains a DeltaLake table like so:

You can read the data in this Delta Lake table using the following M code:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

The output in Power Query is a table with the contents of the Delta Lake table:

You can connect to Delta Lake tables in ADLSgen2 or a Fabric Lakehouse in a very similar way, using the AzureStorage.DataLake function to connect to the DFS endpoint of the folder containing the Delta Lake table. Here’s an example of how to connect to a folder in a Fabric Lakehouse using AzureStorage.DataLake (which works because OneLake supports access via existing ADLSgen2 APIs):

let
    Source = AzureStorage.DataLake("https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/DeltaCoffee/", [HierarchicalNavigation=true]),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

Note that you need to set the HierarchicalNavigation option to true in the AzureStorage.DataLake function for this to work.

Nested structures

The DeltaLake.Table function can handle nested structures in Delta Lake tables in a similar way to how Power Query handles nested structures in other data sources. In the example shown above the coffee_profile column contains a nested structure. Here’s what it looks like when viewed inside a Fabric Lakehouse:

As shown in the first screenshot above the coffee_profile column in the table returned by DeltaLake.Table in Power Query contains Record values, which can be expanded by clicking on the icon in the top right-hand corner of the column:

Here’s what the table looks like in Power Query after the coffee_profile column has been expanded:

By default DeltaLake.Table will convert up to 32 levels of nested structures in this way. The number of levels of nesting that is converted can be controlled using the MaxDepth option, for example:

DeltaLake.Table(Source, [MaxDepth=10])

If the number of levels of nesting exceeds the maximum depth then an error will be raised.

Use of statistics

Power Query will use statistics to prune partitions and compute aggregates by default. If you need to stop it using statistics, perhaps because you know they are incorrect, you can do so by setting the UseStatistics option to false like so:

DeltaLake.Table(Source, [UseStatistics=false])

Time travel

Different versions of the Delta Lake table can be accessed using the Value.Versions function on the table returned by DeltaLake.Table, for example:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta)
in
    GetVersions

Returns the following table, with one row for each version of the table and an extra row with a value of null in the Version column for the latest version:

The Data column contains values of type Value, which return the data in the table for each version. In the screenshot above the latest version is version 3; the row with the version number null, the latest version, returns the same data as the row with the version number 3.

If you access a specific version of the data you can get the version number by using the Value.VersionIdentity function. For example, the following M query gets the data for version number 2 of the Delta Lake table in the GetVersionNumberTwo step, which means that when Value.VersionIdentity is used in the final step it returns the value 2:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta),
    GetVersionNumberTwo = GetVersions{2}[Data],
    GetVersionIdentity = Value.VersionIdentity(GetVersionNumberTwo)
in
    GetVersionIdentity

Related blog posts

Read data from Delta Lake tables with the DeltaLake.Table M function

May 16, 2024 by Jianlei Shen

To improve the flexibility for copying data in Fabric Data Factory, we are excited to announce that now you can edit destination table column types when copying data!  Supported scenarios This new feature allows you to edit the data type of the column for a new or auto-created destination table, if your data destination is … Continue reading “Edit the Destination Table Column Type when Copying Data to Lakehouse Table, Data Warehouse and SQL Data Stores “

May 16, 2024 by Dan Liu

Leverage the power of task flows to design and build your data solutions and manage workspace items in Microsoft Fabric. We’re thrilled to announce that the task flows feature is now in public preview and is enabled for all existing Microsoft Fabric users. Fabric is unifying everything needed to deliver end-to-end data and analytics solutions … Continue reading “Announcing the public preview of task flows in Microsoft Fabric”