Microsoft Fabric Updates Blog

Semantic Link: Data validation using Great Expectations

Great Expectations Open Source (GX OSS) is a popular Python library that provides a framework for describing and validating the acceptable state of data. It helps data engineers and data scientists ensure that their data meets specific quality standards before using it for analysis, machine learning, or other data-driven tasks. With the recent integration of Microsoft Fabric semantic link, GX can now access semantic models, further enabling seamless collaboration between data scientists and business analysts.

Semantic link is a feature in Microsoft Fabric that establishes a connection between semantic models (aka Power BI datasets) and Synapse Data Science. It facilitates data connectivity, enables the propagation of semantic information, and seamlessly integrates with established tools used by data scientists, such as notebooks. Semantic link helps preserve domain knowledge about data semantics in a standardized way that can speed up data analysis and reduce errors.

Ensuring data quality in the semantic model, also known as the gold layer, is crucial for organizations to make informed decisions based on accurate and reliable data. Data scientists play a vital role in this process by validating, cleaning, and transforming raw data into meaningful insights. With the integration of Microsoft Fabric semantic link and Great Expectations, data scientists can now use both tools together to validate the quality of data assets in the semantic model.

Here’s why ensuring data quality in the semantic model is important:

1. Trustworthy insights: high-quality data assets in the semantic model lead to more accurate and reliable insights, enabling organizations to make better-informed decisions. Data scientists can use GX to define and validate data quality standards, ensuring that the data used in the semantic model is consistent, complete, and accurate.

2. Improved collaboration: the integration of semantic link and GX allows data scientists and business analysts to work together seamlessly, sharing a common understanding of data quality standards. This collaboration ensures that both parties can efficiently and effectively use the data in the semantic model, maximizing the potential of their data-driven insights.

3. Reduced errors: by validating data quality in the semantic model, data scientists can identify and address potential issues before they impact downstream processes, such as reporting and analytics. This proactive approach helps reduce errors and minimize the risk of making decisions based on inaccurate or incomplete data.

In this blog post, we will explore the core concepts of GX, including Data Sources, Assets, Expectations, and Checkpoints. We will also discuss the new integration with Microsoft Fabric semantic link, which allows you to access semantic models and leverage the vast library of Expectations provided by GX and its community.

Core GX Concepts: Data Sources and Assets

GX revolves around four core components:

  1. Data Sources: Connect to your data, regardless of its format or location, and organize it for future use.
  2. Data Assets: Collections of records within a Data Source that can be further partitioned into Batches.
  3. Expectations: Verifiable assertions about your data that describe the standards it should conform to.
  4. Checkpoints: Validate a set of Expectations against a specific set of data.

GX provides a vast library of Expectations, which are classes that implement specific validations. These Expectations can be used to ensure that your data meets the required quality standards.

New Integration: Accessing Semantic Models with Semantic Link

The new integration allows GX to access Power BI datasets in Microsoft Fabric using semantic link. This is achieved through new methods in the GX API: add_fabric_powerbi, add_powerbi_table_asset, add_powerbi_measure_asset, and add_powerbi_dax_asset.

In the example below, we first create a GX Data Context and add a GX Data Source for a semantic model. We then add a GX Asset for a Power BI table, a GX Asset for Power BI measures, and a GX Asset for Power BI DAX queries.

[Screenshot: notebook code that creates the Data Context, the Data Source, and the three Assets]
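A minimal sketch of this setup, assuming the GX 0.18-style fluent API (`context.sources`) and the four methods named above. The Data Source and Asset names, the table, the measure, and the DAX query are hypothetical placeholders, and the keyword arguments (`dataset`, `table`, `measure`, `groupby_columns`, `dax_string`) are assumptions that should be checked against the GX version you have installed.

```python
def add_semantic_model_assets(context, dataset_name):
    """Register a semantic model as a GX Data Source and add three Assets.

    `context` is expected to be a GX Data Context (for example, the object
    returned by great_expectations.get_context()). All names used below
    are hypothetical placeholders.
    """
    # Data Source backed by a semantic model (Power BI dataset)
    ds = context.sources.add_fabric_powerbi(
        "Retail Data Source", dataset=dataset_name
    )

    # Asset 1: a whole Power BI table
    ds.add_powerbi_table_asset("Store Asset", table="Store")

    # Asset 2: a measure, grouped by the given columns
    ds.add_powerbi_measure_asset(
        "Total Units Asset",
        measure="TotalUnits",
        groupby_columns=["Time[FiscalYear]", "Time[FiscalMonth]"],
    )

    # Asset 3: the result of an arbitrary DAX query
    ds.add_powerbi_dax_asset(
        "Store Count Asset",
        dax_string='EVALUATE ROW("StoreCount", COUNTROWS(Store))',
    )
    return ds
```

In a Fabric notebook with Great Expectations installed, this could be called as `add_semantic_model_assets(gx.get_context(), "Retail Analysis Sample PBIX")` after `import great_expectations as gx`, where the semantic model name is again only an example.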

We are now ready to define our Expectations, which verify our assertions about the data:

[Screenshot: notebook code that defines the Expectations]
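As a rough illustration, Expectations can be written down as plain dictionaries (an Expectation type plus its arguments) and grouped into an Expectation Suite. The sketch below assumes that `add_or_update_expectation_suite` in GX 0.18 accepts such dictionaries through its `expectations` argument. The suite name, the column `StoreNumber`, and the row-count bounds are hypothetical.

```python
# Expectations described as plain dictionaries: a type and its arguments.
# The column name and the thresholds are hypothetical examples.
STORE_EXPECTATIONS = [
    {
        "expectation_type": "expect_table_row_count_to_be_between",
        "kwargs": {"min_value": 1, "max_value": 1000},
    },
    {
        "expectation_type": "expect_column_values_to_not_be_null",
        "kwargs": {"column": "StoreNumber"},
    },
    {
        "expectation_type": "expect_column_values_to_be_unique",
        "kwargs": {"column": "StoreNumber"},
    },
]


def register_suite(context, suite_name, expectations):
    """Create or update an Expectation Suite from expectation dictionaries."""
    return context.add_or_update_expectation_suite(
        expectation_suite_name=suite_name,
        expectations=expectations,
    )
```

Calling `register_suite(context, "Retail Store Suite", STORE_EXPECTATIONS)` would then store the suite in the Data Context under that (example) name so it can be referenced later.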

Expectations are reusable across Data Assets; thus, we need to specify which Expectations we want to apply to which Asset:

[Screenshot: notebook code that maps Expectations to Assets in a Checkpoint]
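One way to express this mapping is a list of validations, each pairing an Expectation Suite with the Asset it should check. The sketch below is plain Python and only builds that list. The dictionary layout follows the Checkpoint configuration format of GX 0.18 as we understand it, and the Data Source, Asset, and suite names are hypothetical.

```python
def build_validations(datasource_name, suite_by_asset):
    """Pair each Asset with the Expectation Suite that should validate it."""
    return [
        {
            "expectation_suite_name": suite_name,
            "batch_request": {
                "datasource_name": datasource_name,
                "data_asset_name": asset_name,
            },
        }
        for asset_name, suite_name in suite_by_asset.items()
    ]


# Hypothetical names: two Assets, each validated by its own suite
validations = build_validations(
    "Retail Data Source",
    {
        "Store Asset": "Retail Store Suite",
        "Total Units Asset": "Retail Measure Suite",
    },
)
```

Keeping the mapping in one dictionary makes it easy to reuse a suite for several Assets: list the same suite name under more than one Asset key.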

Finally, we run the Checkpoint and inspect the results or use them for further automation steps.
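A hedged sketch of that last step, assuming the GX 0.18 methods `context.add_or_update_checkpoint` and `checkpoint.run()`, and assuming the returned result exposes a boolean `success` attribute. The function name and the summary message are our own illustration; `validations` is a list that pairs Expectation Suites with Assets.

```python
def run_checkpoint(context, checkpoint_name, validations):
    """Create or update a Checkpoint, run it, and summarize the outcome."""
    checkpoint = context.add_or_update_checkpoint(
        name=checkpoint_name, validations=validations
    )
    result = checkpoint.run()

    # success is expected to be True only if every validation passed
    status = "passed" if result.success else "failed"
    return result, f"Checkpoint '{checkpoint_name}' {status}"
```

For automation, a notebook could raise an exception when the summary reports a failure, which would stop a scheduled run before bad data reaches downstream reports.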

You can find the tutorial notebook with additional examples in our GitHub samples repository.


The new integration between Great Expectations and Microsoft Fabric unlocks the potential of semantic models for data validation and quality assurance. By enabling seamless access to Power BI data in the familiar GX environment, data scientists and business analysts can collaborate more effectively, ensuring that their data-driven insights are based on high-quality, reliable data.

Start leveraging the power of semantic models in Great Expectations today and unlock the full potential of your data-driven insights.
