Microsoft Fabric Updates Blog

Simplifying Medallion Implementation with Materialized Lake Views in Fabric

We are excited to announce Materialized Lake views (MLV) in Microsoft Fabric. Coming soon in preview, MLV is a new feature that allows you to build declarative data pipelines using SQL, complete with built-in data quality rules and automatic monitoring of data transformations. In essence, an MLV is a persisted, continuously updated view of your data that simplifies how you implement multi-stage Lakehouse processing, commonly referred to as medallion architecture.

Why are Materialized Lake views needed?

Until now, setting up a robust data pipeline for a Lakehouse has involved writing custom Spark jobs or notebooks for each stage, scheduling and orchestrating them, implementing data validation, and monitoring every run through the Monitor Hub. This is complex to set up and operate.

MLVs address a key problem faced by many organizations: the complexity of transforming raw data at scale as it moves from the bronze to the silver layer of a Lakehouse, and of keeping derived data up to date for analytics in the gold layer. By using MLVs, data engineers can define multi-stage transformations in one place and trust the system to keep everything in sync, removing the manual ETL jobs and custom scripts that make medallion setups hard to build and operate.

Materialized Lake views simplify medallion architecture

MLVs streamline data workflows and enable developers to focus on business logic, not on infrastructural or data quality-related issues. Before MLVs, building a multi-stage data pipeline in Fabric (or similar big data platforms) was a manual, time-consuming endeavor.

Users had to create multiple ETL jobs, ensure data quality through custom validation logic, and monitor each stage manually. With MLVs, developers can now achieve in a few SQL statements what used to require an entire assembly of jobs, orchestration scripts and observability tools. Even with these in place, developers used to struggle with questions like “Did my Silver table update today?” or “Why is yesterday’s aggregation incomplete?” and found it hard to troubleshoot “Is my gold layer refresh delayed due to high data volume or due to a bug in my code?”

How do Materialized Lake Views Work?

MLVs address these fundamental needs via declarative pipelines that automatically infer dependencies from the extended SQL definitions, visualize them in a lineage view, and schedule periodic refreshes. Each materialized lake view can have data quality constraints enforced and visualized for every run, showing completion status and conformance to the defined constraints in a single view.

1. Defining a Materialized Lake view (Declarative pipeline)

Through a declarative SQL statement – CREATE MATERIALIZED LAKE VIEW – developers can define transformations such as aggregations, projections (selecting certain columns), or filters, which are the typical transformations at each layer.

For example, if you have raw data in bronze-layer tables (say airline fares, weather data, and flight operations), you can create a materialized lake view silver.flight_revenue_delay to represent the impact of weather-induced delays on flight revenue:
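As a sketch of what such a definition might look like, based on the preview syntax (the table and column names below are illustrative assumptions, not an actual dataset):

```sql
-- Illustrative sketch: join bronze tables and aggregate the revenue
-- impact of weather-induced delays. All source table and column names
-- here are hypothetical.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.flight_revenue_delay
AS
SELECT
    f.flight_id,
    f.departure_date,
    w.weather_condition,
    SUM(fa.fare_amount)  AS total_revenue,
    AVG(f.delay_minutes) AS avg_delay_minutes
FROM bronze.flight_operations f
JOIN bronze.weather_data w
  ON f.origin_airport = w.airport_code
 AND f.departure_date = w.observation_date
JOIN bronze.airline_fares fa
  ON f.flight_id = fa.flight_id
WHERE f.delay_minutes > 0
GROUP BY f.flight_id, f.departure_date, w.weather_condition;
```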

In this statement, silver.flight_revenue_delay becomes a materialized lake view that computes the revenue impact of weather-induced flight delays. The SQL syntax is declarative – you describe what you want (the revenue impact of delays), and Fabric figures out how to produce and maintain it. You can attach a schedule to the Lakehouse to refresh its MLVs periodically.

Once an MLV is created, Fabric computes the initial results and stores them. Under the covers, the MLV is stored in the Lakehouse, making it available to any workload in Fabric just like a regular table, and it can be viewed in the Lakehouse catalog along with its metadata. In our example, silver.flight_revenue_delay could be queried by Power BI using Direct Lake capabilities, or by Spark, just as if it were a normal Delta Lake table containing the transformed data. Fabric then keeps the view up to date on the defined refresh cycle.

Schedule-Materialized Lake view

2. Lineage and management

The system intelligently schedules these refreshes; it knows the dependency between the source table and the MLV. If you have a lineage of materialized views (e.g., Bronze -> Silver -> Gold layers all defined as MLVs), Fabric understands these dependencies as well. It will ensure, for instance, that if the bronze table is updated, the bronze→silver view runs first, then the silver→gold view runs, so that each layer is consistent. This lineage of transformations is tracked internally. It also generates a visual representation of lineage in your Lakehouse making it intuitive to observe how your data flows from one stage to the next.
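Continuing the earlier example, a gold-layer MLV defined on top of the silver one might look like the following sketch (the view name and columns are illustrative); because its FROM clause references the silver MLV, Fabric infers the silver→gold dependency and refreshes the layers in order:

```sql
-- Illustrative sketch: a gold-layer MLV built on the silver MLV above.
-- Fabric infers the dependency from the FROM clause, so
-- silver.flight_revenue_delay is refreshed before this view.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.daily_weather_delay_revenue
AS
SELECT
    departure_date,
    weather_condition,
    SUM(total_revenue) AS revenue_at_risk
FROM silver.flight_revenue_delay
GROUP BY departure_date, weather_condition;
```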

Each MLV refreshes only when its source has new data; if there’s no change, it can skip running entirely (saving time and resources). The goal is to process only the new or changed data instead of reprocessing everything each time. In fact, Fabric’s implementation leverages Delta Lake’s Change Data Feed under the hood, which means it can update just the portions of data that changed rather than recompute the whole view from scratch [1]. (This behavior is evolving – currently, if no changes are detected in the source, a run is skipped, and future enhancements will make updates even more granular.)

3. Built-in Data Quality Constraints

One of the key features of MLVs is the ability to incorporate data quality constraints directly into your pipeline definition. When creating a MLV, you can specify certain checks that the data must meet.

For example, you might add a check that validates that flight distance and seat count are non-negative, or that airport codes do not contain nulls. If incoming data violates these rules, Fabric can handle it according to your configuration – it might exclude those records from the MLV, flag the MLV as having errors, or stop the update.
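A sketch of how such constraints can be declared inline with the view definition, based on the preview syntax (constraint and column names are illustrative; the exact keywords may evolve during preview):

```sql
-- Illustrative sketch of inline data quality constraints.
-- ON MISMATCH DROP excludes violating rows from the view;
-- ON MISMATCH FAIL aborts the refresh instead.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.clean_flights (
    CONSTRAINT non_negative_distance CHECK (flight_distance >= 0) ON MISMATCH DROP,
    CONSTRAINT non_negative_seats    CHECK (seat_count >= 0)      ON MISMATCH DROP,
    CONSTRAINT airport_code_not_null CHECK (origin_airport IS NOT NULL) ON MISMATCH FAIL
)
AS
SELECT flight_id, origin_airport, flight_distance, seat_count
FROM bronze.flight_operations;
```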

Data Quality Constraints

The key point is that data quality constraints are not an afterthought; they are part of the pipeline. This was built because many customers struggled to enforce quality in a separate step. Now it is declarative and integrated – you declare the rule, and the system applies it during each refresh. In our flight example, we could enforce that airport codes cannot be null (a null there would indicate an error in the source data). These constraints improve trust in the data by catching issues early.

Every time an MLV refreshes, Fabric automatically logs the status and quality metrics – whether it succeeded, how long it took, how many rows were processed or dropped, and so on.

You get built-in monitoring of your MLV runs without setting up anything extra. In the Manage materialized lake views section, you can check the status of each MLV, see when it last ran, and view any error messages if something failed. For instance, if the view encountered bad data that violated a quality rule, you’d see an alert or error indicating which data quality constraint was broken. The automatic monitoring means your pipeline is not a black box – you have transparency into its operations. This single-pane view is an immense improvement over monitoring individual jobs without a view of the big picture.

Furthermore, MLVs automatically generate a visual report that shows trends on data quality constraints to easily identify the checks that introduce maximum errors and the associated MLVs for easy troubleshooting.

With these capabilities shipped out of the box, it is easy to combine MLVs with the Shortcut Transformation feature for CSV ingestion to build an end-to-end Medallion architecture.

Looking ahead

MLVs are a first and important step toward simplifying the implementation of multi-stage data transformations, from raw data to business insights. They will be available in preview in the coming weeks, and we plan to add support for PySpark and incremental refresh in upcoming updates.

MLVs bring a powerful paradigm shift to Microsoft Fabric’s data platform. They empower you to set up complex data pipelines with just a few SQL statements and then handle the rest automatically. This means faster development cycles for data engineers, more trustworthy data for analysts, and quicker insights for the business.

Stay tuned for our preview announcement in the coming weeks. We invite you to try out MLVs in your Microsoft Fabric environment and share your feedback with us. Join the community discussion, report issues and suggestions, and contribute to documentation and samples.

June 17, 2025, by Akshay Dixit