Microsoft Fabric Updates Blog

Semantic Link: Data validation using Great Expectations

Great Expectations Open Source (GX OSS) is a popular Python library that provides a framework for describing and validating the acceptable state of data. It helps data engineers and data scientists ensure that their data meets specific quality standards before using it for analysis, machine learning, or other data-driven tasks. With the recent integration of Microsoft Fabric semantic link, GX can now access semantic models, further enabling seamless collaboration between data scientists and business analysts.

Semantic link is a feature in Microsoft Fabric that establishes a connection between semantic models (aka Power BI datasets) and Synapse Data Science. It facilitates data connectivity, enables the propagation of semantic information, and seamlessly integrates with established tools used by data scientists, such as notebooks. Semantic link helps preserve domain knowledge about data semantics in a standardized way that can speed up data analysis and reduce errors.

Ensuring data quality in the semantic model, also known as the diamond layer, is crucial for organizations to make informed decisions based on accurate and reliable data. Data scientists play a vital role in this process by validating, cleaning, and transforming raw data into meaningful insights. With the integration of Microsoft Fabric semantic link and Great Expectations, data scientists can now leverage the power of both platforms to ensure the highest quality of data assets in the semantic model.

Here’s why ensuring data quality in the semantic model is important:

1. Trustworthy insights: high-quality data assets in the semantic model lead to more accurate and reliable insights, enabling organizations to make better-informed decisions. Data scientists can use GX to define and validate data quality standards, ensuring that the data used in the semantic model is consistent, complete, and accurate.

2. Improved collaboration: the integration of semantic link and GX allows data scientists and business analysts to work together seamlessly, sharing a common understanding of data quality standards. This collaboration ensures that both parties can efficiently and effectively use the data in the semantic model, maximizing the potential of their data-driven insights.

3. Reduced errors: by validating data quality in the semantic model, data scientists can identify and address potential issues before they impact downstream processes, such as reporting and analytics. This proactive approach helps reduce errors and minimize the risk of making decisions based on inaccurate or incomplete data.

In this blog post, we will explore the core concepts of GX, including Data Sources, Assets, Expectations, and Checkpoints. We will also discuss the new integration with Microsoft Fabric semantic link, which allows you to access semantic models and leverage the vast library of Expectations provided by GX and its community.

Core GX Concepts: Data Sources, Assets, Expectations, and Checkpoints

GX revolves around four core components:

  1. Data Sources: Connect to your data, regardless of its format or location, and organize it for future use.
  2. Data Assets: Collections of records within a Data Source that can be further partitioned into Batches.
  3. Expectations: Verifiable assertions about your data that describe the standards it should conform to.
  4. Checkpoints: Validate a set of Expectations against a specific set of data.

GX provides a vast library of Expectations, which are classes that implement specific validations. These Expectations can be used to ensure that your data meets the required quality standards.
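To make the relationship between the four components concrete, here is a toy model in plain Python. It is not how GX is implemented and uses none of its API; the class, the sample rows, and the function names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Expectation:
    """A named, verifiable assertion about a collection of records."""
    name: str
    check: Callable[[List[Row]], bool]


# Data Source: a named container that organizes Data Assets.
# Data Asset: a collection of records (here, a list of dict rows).
data_source: Dict[str, List[Row]] = {
    "stores": [
        {"store": "A", "units": 120},
        {"store": "B", "units": 75},
    ],
}

# Expectations: reusable assertions, defined independently of any one asset.
no_missing_store = Expectation(
    "store is never null",
    lambda rows: all(r["store"] is not None for r in rows),
)
has_two_rows = Expectation("row count equals 2", lambda rows: len(rows) == 2)


def run_checkpoint(source, plan):
    """Checkpoint: validate the chosen Expectations against the chosen assets."""
    return {
        (asset, exp.name): exp.check(source[asset])
        for asset, expectations in plan.items()
        for exp in expectations
    }


results = run_checkpoint(data_source, {"stores": [no_missing_store, has_two_rows]})
print(results)
```

The point of the sketch is the separation of concerns: Expectations know nothing about where data lives, and the Checkpoint is the only place where the two are paired.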

New Integration: Accessing Semantic Models with Semantic Link

The new integration allows GX to access Power BI datasets in Microsoft Fabric using semantic link. It does so through new methods in the GX API, such as add_fabric_powerbi, add_powerbi_table_asset, add_powerbi_measure_asset, and add_powerbi_dax_asset.

The workflow starts by creating a GX Data Context and adding a GX Data Source for a semantic model. We then add a GX Asset for a Power BI table, a GX Asset for Power BI measures, and a GX Asset for Power BI DAX queries.
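A minimal sketch of these steps follows. The four method names come from the list above; everything else is an assumption to verify against the GX version installed in your Fabric environment: that the data source factory lives on context.sources, the keyword names (dataset, table, measure, groupby_columns, dax_string), and the semantic model, table, and measure names, which are hypothetical. The function takes the Data Context as a parameter so the sketch can be read and tested outside Fabric.

```python
def register_retail_assets(context):
    """Add a semantic-model Data Source plus table, measure, and DAX assets.

    `context` is expected to be a GX Data Context, for example the object
    returned by `great_expectations.get_context()` in a Fabric notebook.
    """
    # ASSUMPTION: data source factories hang off `context.sources`.
    # The data source and semantic model names are hypothetical.
    ds = context.sources.add_fabric_powerbi(
        "Retail Analysis Data Source",
        dataset="Retail Analysis Sample",
    )

    # Asset backed by a whole Power BI table (hypothetical table name).
    ds.add_powerbi_table_asset("Store Asset", table="Store")

    # Asset backed by a measure, grouped by columns (hypothetical names).
    ds.add_powerbi_measure_asset(
        "Total Units Asset",
        measure="Total Units",
        groupby_columns=["Time[FiscalYear]", "Time[FiscalMonth]"],
    )

    # Asset backed by an arbitrary DAX query.
    ds.add_powerbi_dax_asset(
        "Store Count Asset",
        dax_string="""EVALUATE ROW("Stores", COUNTROWS('Store'))""",
    )
    return ds
```

In a notebook this would be called as register_retail_assets(gx.get_context()), after importing great_expectations as gx.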

We are now ready to define our Expectations, which verify our assertions about the data.
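As a rough illustration, the suites below are written as plain dictionaries in the shape GX 0.x uses when it serializes an Expectation Suite: an expectation_type plus the kwargs it is evaluated with. The three Expectation names are part of the GX core library; the suite names, column names, and thresholds are hypothetical.

```python
import json

# Suite for a table asset: bound the row count and require a key column.
store_suite = {
    "expectation_suite_name": "Retail Store Suite",  # hypothetical name
    "expectations": [
        {
            "expectation_type": "expect_table_row_count_to_be_between",
            "kwargs": {"min_value": 1, "max_value": 1000},
        },
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "Store"},  # hypothetical column
        },
    ],
}

# Suite for a measure asset: the measure should never be negative.
measure_suite = {
    "expectation_suite_name": "Retail Measure Suite",  # hypothetical name
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {"column": "Total Units", "min_value": 0},
        },
    ],
}

print(json.dumps(store_suite, indent=2))
```

Because a suite is just a list of named assertions with arguments, it says nothing yet about which data it will be checked against.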

Expectations are reusable across Data Assets; thus, we need to specify which Expectations we want to apply to which Asset.
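One way to picture this pairing is as a Checkpoint configuration that lists one validation per Asset. The helper below is hypothetical; the field names (validations, expectation_suite_name, batch_request, datasource_name, data_asset_name) follow GX 0.18-style Checkpoint configuration as we understand it, so treat them as an assumption and check the documentation for your version. All Asset and suite names are made up.

```python
def build_checkpoint_config(name, datasource_name, suite_by_asset):
    """Pair each Data Asset with the Expectation Suite that should validate it."""
    return {
        "name": name,
        "validations": [
            {
                "expectation_suite_name": suite_name,
                "batch_request": {
                    "datasource_name": datasource_name,
                    "data_asset_name": asset_name,
                },
            }
            for asset_name, suite_name in suite_by_asset.items()
        ],
    }


checkpoint_config = build_checkpoint_config(
    "Retail Analysis Checkpoint",
    "Retail Analysis Data Source",
    {
        "Store Asset": "Retail Store Suite",
        "Total Units Asset": "Retail Measure Suite",
    },
)
print(checkpoint_config["validations"])
```

The same suite name can appear under several Assets, which is what makes Expectations reusable.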

Finally, we run the Checkpoint and inspect the results or use them for further automation steps.
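For the automation side, a small helper can pull the failed Expectations out of a Checkpoint result. This sketch assumes the result has been converted to a plain dictionary shaped like a GX 0.x Checkpoint result (run_results, then validation_result, then results); that shape is an assumption, and the sample dictionary is hand-written, not real output.

```python
def failed_expectations(result):
    """Return (expectation_type, kwargs) for every Expectation that failed."""
    failures = []
    for run in result["run_results"].values():
        for outcome in run["validation_result"]["results"]:
            if not outcome["success"]:
                config = outcome["expectation_config"]
                failures.append((config["expectation_type"], config["kwargs"]))
    return failures


# Hand-written stand-in for a serialized Checkpoint result (not real output).
sample_result = {
    "success": False,
    "run_results": {
        "store-asset-run": {
            "validation_result": {
                "success": False,
                "results": [
                    {
                        "success": True,
                        "expectation_config": {
                            "expectation_type": "expect_table_row_count_to_be_between",
                            "kwargs": {"min_value": 1, "max_value": 1000},
                        },
                    },
                    {
                        "success": False,
                        "expectation_config": {
                            "expectation_type": "expect_column_values_to_not_be_null",
                            "kwargs": {"column": "Store"},
                        },
                    },
                ],
            }
        }
    },
}

failures = failed_expectations(sample_result)
for expectation_type, kwargs in failures:
    print(f"FAILED: {expectation_type} {kwargs}")
```

A scheduled notebook could, for instance, raise an exception or send a notification whenever the returned list is not empty.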

You can find the tutorial notebook with additional examples in our GitHub samples repository.

Conclusion

The new integration between Great Expectations and Microsoft Fabric unlocks the potential of semantic models for data validation and quality assurance. By enabling seamless access to Power BI data in the familiar GX environment, data scientists and business analysts can collaborate more effectively, ensuring that their data-driven insights are based on high-quality, reliable data.

Start leveraging the power of semantic models in Great Expectations today and unlock the full potential of your data-driven insights.
