Microsoft Fabric Updates Blog

Read data from Delta Lake tables with the DeltaLake.Table M function

We’re happy to announce a new function in Power Query’s M language for reading data from Delta Lake tables: the DeltaLake.Table function. This function is now available in Power Query in Power BI Desktop and in Dataflows Gen1 and Gen2 and replaces the need to use community-developed solutions like this one by Gerhard Brueckl. Let’s see some examples of how to use it.

Connecting to Delta Lake tables

DeltaLake.Table can connect to any folder that represents a Delta Lake table (which means, for example, that the folder you connect to should contain a subfolder called _delta_log) in either ADLSgen2 storage, a Fabric Lakehouse or a local file system. If you have a folder on your local PC with the path “C:\DeltaCoffee” that contains a DeltaLake table like so:

You can read the data in this Delta Lake table using the following M code:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

The output in Power Query is a table with the contents of the Delta Lake table:

You can connect to Delta Lake tables in ADLSgen2 or a Fabric Lakehouse in a very similar way, using the AzureStorage.DataLake function to connect to the DFS endpoint of the folder containing the Delta Lake table. Here’s an example of how to connect to a folder in a Fabric Lakehouse using AzureStorage.DataLake (which works because OneLake supports access via existing ADLSgen2 APIs):

let
    Source = AzureStorage.DataLake("https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/DeltaCoffee/", [HierarchicalNavigation=true]),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

Note that you need to set the HierarchicalNavigation option to true in the AzureStorage.DataLake function for this to work.

Nested structures

The DeltaLake.Table function can handle nested structures in Delta Lake tables in a similar way to how Power Query handles nested structures in other data sources. In the example shown above the coffee_profile column contains a nested structure. Here’s what it looks like when viewed inside a Fabric Lakehouse:

As shown in the first screenshot above the coffee_profile column in the table returned by DeltaLake.Table in Power Query contains Record values, which can be expanded by clicking on the icon in the top right-hand corner of the column:

Here’s what the table looks like in Power Query after the coffee_profile column has been expanded:

By default DeltaLake.Table will convert up to 32 levels of nested structures in this way. The number of levels of nesting that is converted can be controlled using the MaxDepth option, for example:

DeltaLake.Table(Source, [MaxDepth=10])

If the number of levels of nesting exceeds the maximum depth then an error will be raised.

Use of statistics

Power Query will use statistics to prune partitions and compute aggregates by default. If you need to stop it using statistics, perhaps because you know they are incorrect, you can do so by setting the UseStatistics option to false like so:

DeltaLake.Table(Source, [UseStatistics=false])

Time travel

Different versions of the Delta Lake table can be accessed using the Value.Versions function on the table returned by DeltaLake.Table, for example:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta)
in
    GetVersions

Returns the following table, with one row for each version of the table and an extra row with a value of null in the Version column for the latest version:

The Data column contains values of type Value, which return the data in the table for each version. In the screenshot above the latest version is version 3; the row with the version number null, the latest version, returns the same data as the row with the version number 3.

If you access a specific version of the data you can get the version number by using the Value.VersionIdentity function. For example, the following M query gets the data for version number 2 of the Delta Lake table in the GetVersionNumberTwo step, which means that when Value.VersionIdentity is used in the final step it returns the value 2:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta),
    GetVersionNumberTwo = GetVersions{2}[Data],
    GetVersionIdentity = Value.VersionIdentity(GetVersionNumberTwo)
in
    GetVersionIdentity

Related blog posts

Read data from Delta Lake tables with the DeltaLake.Table M function

April 16, 2024 by Ruixin Xu

We are pleased to share a set of key updates regarding the Copilot in Microsoft Fabric experiences. The information in this blog post has also been shared with Fabric tenant administrators. Below are the highlights of the changes. This change is an important milestone to eventually allow Copilot to reach GA within this year. See … Continue reading “Copilot in MS Fabric: Soon available to more users in your organization”

April 15, 2024 by Santhosh Kumar Ravindran

Users orchestrate their data engineering or data science processes using notebooks and in most of the enterprise scenarios pipelines and job schedulers are used as a primary option to schedule and trigger these Spark jobs. We are thrilled to announce a new feature Job Queueing for Notebook Jobs in Microsoft Fabric. This feature aims to … Continue reading “Introducing Job Queueing for Notebook in Microsoft Fabric”