Microsoft Fabric Updates Blog

Organizing your tables with lakehouse schemas and more (Public Preview)

We’re thrilled to introduce a new feature in Fabric: lakehouse schemas. This feature lets you arrange your lakehouse tables into a folder-like structure, improving data discovery and organization. Many users will be familiar with schemas from the Fabric Data Warehouse, and we are bringing aligned capabilities to the Lakehouse.

Schemas created in your lakehouses also appear in the SQL analytics endpoint, semantic models, shortcuts, and anywhere else lakehouse data is referenced, so your data remains consistently organized across different engines.

With lakehouse schemas you can:

  • Organize your tables in a folder-like structure.
  • Reference your tables in Spark code using the namespace ‘workspace.lakehouse.schema.table’.
  • Reference multiple tables at once with a schema shortcut.
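For instance, a fully qualified table reference in a notebook cell could look like the following sketch; “myworkspace” and “mylakehouse” are placeholder names, and “marketing.promotions” is the example schema-qualified table used later in this post:

```sql
-- Placeholder names for illustration: substitute your own workspace,
-- lakehouse, schema, and table names.
SELECT *
FROM myworkspace.mylakehouse.marketing.promotions
LIMIT 10;
```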

How to get started?

Lakehouse schemas are now available in Public Preview. Existing lakehouses without schemas enabled will continue to operate as usual. You can enable schema support when creating a new lakehouse by selecting “Lakehouse schemas (Public Preview)” next to the lakehouse name field. Upon creation, a default schema named “dbo” appears under the “Tables” section; it cannot be renamed or removed.

To create a new schema, click “Tables” and select “New schema”. After entering a schema name, you’ll see it immediately created and listed under “Tables” in alphabetical order.

Organizing your tables with schemas

After creating your schemas, you can populate them with tables. In notebook code, prefix the table name with the schema, for example “marketing.promotions”. In the pipeline Copy tool, you can select a target schema while importing data. Note, however, that Dataflows currently import data only into the default “dbo” schema and don’t allow schema selection.
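As a sketch of what this looks like in a notebook, the Spark SQL below creates a schema and a table inside it, then references the table with its schema prefix. The column names are hypothetical, chosen only for illustration:

```sql
-- Create the schema if it doesn't exist yet, then a table inside it.
CREATE SCHEMA IF NOT EXISTS marketing;

-- Column names here are illustrative assumptions, not from the post.
CREATE TABLE IF NOT EXISTS marketing.promotions (
    promo_id INT,
    promo_name STRING,
    discount_pct DOUBLE
);

-- Reference the table with its schema prefix.
SELECT promo_name, discount_pct FROM marketing.promotions;
```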

Another quick way to organize your tables is using Lakehouse Explorer. Simply drag a table name from your source schema to the target schema name, and the table will be instantly moved to it. Make sure to update all your references to the moved table, as its path has changed.

Referencing tables in Notebooks

As previously noted, referencing a table now requires including the schema name. With our latest update, you gain an even more powerful capability: referencing tables from outside your current workspace, such as “myworkspace.mylakehouse.schema.table”. This enables tasks like joining Spark SQL tables located in separate workspaces.

Take this scenario: Your customer data resides in the “Sales” lakehouse within the “Corporate” workspace, while employee information is housed in the “HRM” lakehouse within the “Internal” workspace. If you need to identify which employees are also customers, you can execute a query that joins the “Employees” and “Customers” tables to obtain your answer.
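A query for this scenario might look like the sketch below. The “dbo” schema and the “Email” join column are assumptions for illustration, since the post doesn’t specify the tables’ schemas or columns:

```sql
-- Hypothetical cross-workspace join; schema and column names are assumptions.
SELECT e.EmployeeName
FROM Internal.HRM.dbo.Employees AS e
INNER JOIN Corporate.Sales.dbo.Customers AS c
    ON e.Email = c.Email;
```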

Remember, if the schema name isn’t specified, the system will default to the “dbo” schema. The same default setting applies to the lakehouse and workspace names.
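In other words, when working within the current lakehouse, the following references resolve to the same table, assuming it lives in the default “dbo” schema:

```sql
-- Equivalent references for a table in the default schema.
SELECT COUNT(*) FROM Customers;
SELECT COUNT(*) FROM dbo.Customers;
```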

Referencing all your data lake tables with five clicks

Yes, it really is just five clicks, not a marketing trick. The schema shortcut enables you to reference a folder that contains all your tables in a data lake, whether stored in ADLS Gen2, AWS S3, or another source supported by shortcuts, and instantly see all of those tables in your lakehouse without copying the data. Click “Tables”, select “Schema shortcut”, select your target location, pick the folder that contains the tables, and click “Finish”.

What is coming up next?

As previously stated, this feature launched in its Public Preview phase. Our team is actively working to address limitations before General Availability.

In upcoming updates, we plan to introduce a tool for migrating existing lakehouses without schema support into schema-enabled ones. We are also committed to implementing further features that enhance data security in schemas.

We are also concentrating on improved metadata specification and utilization. This will let users provide their own metadata within the lakehouse structure, enabling more sophisticated data discovery, automation, and AI-powered scenarios.

More information

You can read more about lakehouse schemas on our documentation page, Lakehouse schemas (Preview)—Microsoft Fabric | Microsoft Learn.

We also encourage you to submit ideas about schemas or lakehouses in general in the Microsoft Fabric Ideas Portal.

Whether you want to share your excitement about the feature or offer a critical view, please tag us in your social posts using #fabriclakehouse.

Related blog posts

Organizing your tables with lakehouse schemas and more (Public Preview)

September 26, 2024 By: Ye Xu

Fast Copy in Dataflow Gen2 is now Generally Available! This powerful feature enables rapid and efficient ingestion of large data volumes, leveraging the same robust backend as the Copy Activity in Data pipelines. With Fast Copy, you can experience significantly shorter data processing times and improved cost efficiency for your Dataflow Gen2. Additionally, it boosts … Continue reading “Announcing the General Availability of Fast Copy in Dataflows Gen2”

September 26, 2024 By: Guy Reginiano

Now you can set up Data Activator alerts directly on your KQL queries in KQL querysets.