Microsoft Fabric Updates Blog

Read data from Delta Lake tables with the DeltaLake.Table M function

We’re happy to announce a new function in Power Query’s M language for reading data from Delta Lake tables: the DeltaLake.Table function. This function is now available in Power Query in Power BI Desktop and in Dataflows Gen1 and Gen2 and replaces the need to use community-developed solutions like this one by Gerhard Brueckl. Let’s see some examples of how to use it.

Connecting to Delta Lake tables

DeltaLake.Table can connect to any folder that represents a Delta Lake table (which means, for example, that the folder you connect to should contain a subfolder called _delta_log) in either ADLSgen2 storage, a Fabric Lakehouse or a local file system. If you have a folder on your local PC with the path “C:\DeltaCoffee” that contains a DeltaLake table like so:

You can read the data in this Delta Lake table using the following M code:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

The output in Power Query is a table with the contents of the Delta Lake table:

You can connect to Delta Lake tables in ADLSgen2 or a Fabric Lakehouse in a very similar way, using the AzureStorage.DataLake function to connect to the DFS endpoint of the folder containing the Delta Lake table. Here’s an example of how to connect to a folder in a Fabric Lakehouse using AzureStorage.DataLake (which works because OneLake supports access via existing ADLSgen2 APIs):

let
    Source = AzureStorage.DataLake("https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/DeltaCoffee/", [HierarchicalNavigation=true]),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

Note that you need to set the HierarchicalNavigation option to true in the AzureStorage.DataLake function for this to work.

Nested structures

The DeltaLake.Table function can handle nested structures in Delta Lake tables in a similar way to how Power Query handles nested structures in other data sources. In the example shown above the coffee_profile column contains a nested structure. Here’s what it looks like when viewed inside a Fabric Lakehouse:

As shown in the first screenshot above the coffee_profile column in the table returned by DeltaLake.Table in Power Query contains Record values, which can be expanded by clicking on the icon in the top right-hand corner of the column:

Here’s what the table looks like in Power Query after the coffee_profile column has been expanded:

By default DeltaLake.Table will convert up to 32 levels of nested structures in this way. The number of levels of nesting that is converted can be controlled using the MaxDepth option, for example:

DeltaLake.Table(Source, [MaxDepth=10])

If the number of levels of nesting exceeds the maximum depth then an error will be raised.

Use of statistics

Power Query will use statistics to prune partitions and compute aggregates by default. If you need to stop it using statistics, perhaps because you know they are incorrect, you can do so by setting the UseStatistics option to false like so:

DeltaLake.Table(Source, [UseStatistics=false])

Time travel

Different versions of the Delta Lake table can be accessed using the Value.Versions function on the table returned by DeltaLake.Table, for example:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta)
in
    GetVersions

Returns the following table, with one row for each version of the table and an extra row with a value of null in the Version column for the latest version:

The Data column contains values of type Value, which return the data in the table for each version. In the screenshot above the latest version is version 3; the row with the version number null, the latest version, returns the same data as the row with the version number 3.

If you access a specific version of the data you can get the version number by using the Value.VersionIdentity function. For example, the following M query gets the data for version number 2 of the Delta Lake table in the GetVersionNumberTwo step, which means that when Value.VersionIdentity is used in the final step it returns the value 2:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta),
    GetVersionNumberTwo = GetVersions{2}[Data],
    GetVersionIdentity = Value.VersionIdentity(GetVersionNumberTwo)
in
    GetVersionIdentity

Related blog posts

Read data from Delta Lake tables with the DeltaLake.Table M function

September 11, 2024 by Jianlei Shen

We are excited to announce the release of several powerful new connectors in Fabric Data Factory in Q3 2024! These connectors will significantly enhance your data integration and transformation capabilities, enabling seamless workflows and expanding your data connectivity options. Here’s a detailed overview of the newly released connectors and a sneak peek at what’s coming … Continue reading “Announcing New Data Factory Connectors Released in Q3 2024”

September 10, 2024 by Jianlei Shen

Fabric Lakehouse supports the creation of custom schemas. Schemas allow users to group tables together for better data discovery, access control, and more. This is now a Preview feature in Fabric. Learn more here. We are excited to announce the latest enhancement in Fabric Data Factory that Lakehouse connector in data pipeline now supports schema. … Continue reading “Introducing the New Feature in Lakehouse Connector in Fabric Data Factory: Schema Support for Reading and Writing Data”