Semantic Link: Data validation using Great Expectations
Great Expectations Open Source (GX OSS) is a popular Python library that provides a framework for describing and validating the acceptable state of data. It helps data engineers and data scientists ensure that their data meets specific quality standards before using it for analysis, machine learning, or other data-driven tasks. With the recent integration of Microsoft Fabric semantic link, GX can now access semantic models, further enabling seamless collaboration between data scientists and business analysts.
Semantic link is a feature in Microsoft Fabric that establishes a connection between semantic models (also known as Power BI datasets) and Synapse Data Science. It facilitates data connectivity, enables the propagation of semantic information, and integrates with established tools used by data scientists, such as notebooks. Semantic link helps preserve domain knowledge about data semantics in a standardized way that can speed up data analysis and reduce errors.
Ensuring data quality in the semantic model, also known as the diamond layer, is crucial for organizations to make informed decisions based on accurate and reliable data. Data scientists play a vital role in this process by validating, cleaning, and transforming raw data into meaningful insights. With the integration of Microsoft Fabric semantic link and Great Expectations, data scientists can now use the two together to validate the data assets in the semantic model and keep their quality high.
Here’s why ensuring data quality in the semantic model is important:
1. Trustworthy insights: high-quality data assets in the semantic model lead to more accurate and reliable insights, enabling organizations to make better-informed decisions. Data scientists can use GX to define and validate data quality standards, ensuring that the data used in the semantic model is consistent, complete, and accurate.
2. Improved collaboration: the integration of semantic link and GX allows data scientists and business analysts to work together, sharing a common understanding of data quality standards. Because both groups rely on the same validated data in the semantic model, they can use it efficiently and get more out of their data-driven insights.
3. Reduced errors: by validating data quality in the semantic model, data scientists can identify and address potential issues before they impact downstream processes, such as reporting and analytics. This proactive approach helps reduce errors and minimize the risk of making decisions based on inaccurate or incomplete data.
In this blog post, we will explore the core concepts of GX, including Data Sources, Data Assets, Expectations, and Checkpoints. We will also discuss the new integration with Microsoft Fabric semantic link, which allows you to access semantic models and leverage the vast library of Expectations provided by GX and its community.
Core GX Concepts: Data Sources, Assets, Expectations, and Checkpoints
GX revolves around four core components:
- Data Sources: Connect to your data, regardless of its format or location, and organize it for future use.
- Data Assets: Collections of records within a Data Source that can be further partitioned into Batches.
- Expectations: Verifiable assertions about your data that describe the standards it should conform to.
- Checkpoints: Validate a set of Expectations against a specific set of data.
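As a rough mental model of how the four components relate, here is a plain-Python sketch. The classes (DataSource, DataAsset, Expectation, Checkpoint), their methods, and the toy sales rows are hypothetical stand-ins written for illustration, not the GX API. Partitioning an Asset into Batches by the value of one column is just one simple scheme assumed here.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical, simplified stand-ins for the four GX concepts; NOT the GX API.
Row = dict


@dataclass
class DataAsset:
    name: str
    rows: list[Row]

    def batches(self, key: str) -> dict[object, list[Row]]:
        """Partition the asset's records into Batches by the value of `key`."""
        out: dict[object, list[Row]] = {}
        for row in self.rows:
            out.setdefault(row[key], []).append(row)
        return out


@dataclass
class DataSource:
    name: str
    assets: dict[str, DataAsset] = field(default_factory=dict)

    def add_asset(self, name: str, rows: list[Row]) -> DataAsset:
        asset = DataAsset(name, rows)
        self.assets[name] = asset
        return asset


@dataclass
class Expectation:
    """A named, verifiable assertion over a batch of rows."""
    description: str
    check: Callable[[list[Row]], bool]


@dataclass
class Checkpoint:
    """Validates a set of Expectations against a specific batch of data."""
    expectations: list[Expectation]

    def run(self, batch: list[Row]) -> dict[str, bool]:
        return {e.description: e.check(batch) for e in self.expectations}


# Toy usage: one source, one asset split into yearly batches.
source = DataSource("toy source")
sales = source.add_asset("sales", [
    {"year": 2023, "units": 10},
    {"year": 2023, "units": 7},
    {"year": 2024, "units": -1},
])
checkpoint = Checkpoint([
    Expectation("units are non-negative", lambda b: all(r["units"] >= 0 for r in b)),
])
for year, batch in sales.batches("year").items():
    print(year, checkpoint.run(batch))
# 2023 {'units are non-negative': True}
# 2024 {'units are non-negative': False}
```

The point of the separation is that the Checkpoint knows nothing about where the rows came from: the same Expectations can be run against any Batch of any Asset.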
GX provides a vast library of Expectations, which are classes that implement specific validations. These Expectations can be used to ensure that your data meets the required quality standards.
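The idea that each Expectation is a class implementing one specific validation can be sketched as follows. The subclass names echo two real GX Expectations, expect_column_values_to_not_be_null and expect_column_values_to_be_between, but the base class, the ValidationResult type, and the method names are hypothetical simplifications, not how GX implements them.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Hypothetical simplification of "one class per validation"; NOT GX's classes.


@dataclass
class ValidationResult:
    success: bool
    unexpected_count: int


class Expectation(ABC):
    def __init__(self, column: str):
        self.column = column

    @abstractmethod
    def is_expected(self, value) -> bool:
        """Return True if a single value satisfies this Expectation."""

    def validate(self, rows: list[dict]) -> ValidationResult:
        # Shared logic: count the rows whose value fails the specific check.
        unexpected = [r for r in rows if not self.is_expected(r.get(self.column))]
        return ValidationResult(success=not unexpected, unexpected_count=len(unexpected))


class ExpectColumnValuesToNotBeNull(Expectation):
    def is_expected(self, value) -> bool:
        return value is not None


class ExpectColumnValuesToBeBetween(Expectation):
    def __init__(self, column: str, min_value, max_value):
        super().__init__(column)
        self.min_value, self.max_value = min_value, max_value

    def is_expected(self, value) -> bool:
        # In this sketch nulls are skipped; the not-null check handles them.
        return value is None or self.min_value <= value <= self.max_value


rows = [{"units": 5}, {"units": None}, {"units": 120}]
print(ExpectColumnValuesToNotBeNull("units").validate(rows))
# ValidationResult(success=False, unexpected_count=1)
print(ExpectColumnValuesToBeBetween("units", 0, 100).validate(rows))
# ValidationResult(success=False, unexpected_count=1)
```

Keeping each check in its own class means a new validation only needs a new subclass; the shared validate method handles iterating over rows and counting unexpected values.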
New Integration: Accessing Semantic Models with Semantic Link
The new integration allows GX to access semantic models (Power BI datasets) in Microsoft Fabric through semantic link. This is achieved through new methods in the GX API, such as add_fabric_powerbi, which adds a Data Source for a semantic model, and add_powerbi_table_asset, add_powerbi_measure_asset, and add_powerbi_dax_asset, which add Assets for a Power BI table, Power BI measures, and DAX queries.
To use the integration, we first create a GX Data Context and add a GX Data Source for a semantic model. We then add a GX Asset for a Power BI table, a GX Asset for Power BI measures, and a GX Asset for Power BI DAX queries.
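The real calls need a Fabric notebook with GX installed and access to a semantic model, so as a self-contained illustration here is a plain-Python sketch of what the three Asset types represent. Everything in it (ToySemanticModel, ToyFabricDataSource, its methods, the sample data) is hypothetical. The methods are only loosely modeled on add_powerbi_table_asset, add_powerbi_measure_asset, and add_powerbi_dax_asset, a Python callable stands in for a DAX query, the Data Context is omitted, and we assume a measure Asset returns one row per combination of group-by columns.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical in-memory stand-in for a semantic model: tables of rows plus
# measures defined as aggregation functions. NOT the GX or semantic link API.


@dataclass
class ToySemanticModel:
    tables: dict[str, list[dict]]
    measures: dict[str, Callable[[list[dict]], float]]


@dataclass
class ToyFabricDataSource:
    """Loosely mirrors a Data Source created with add_fabric_powerbi."""
    model: ToySemanticModel
    assets: dict[str, Callable[[], list[dict]]] = field(default_factory=dict)

    def add_table_asset(self, name: str, table: str) -> None:
        # Like a table Asset: every row of one table, loaded lazily.
        self.assets[name] = lambda: list(self.model.tables[table])

    def add_measure_asset(self, name: str, measure: str, table: str, groupby: list[str]) -> None:
        # Like a measure Asset: the measure evaluated once per group.
        def load() -> list[dict]:
            groups: dict[tuple, list[dict]] = {}
            for row in self.model.tables[table]:
                groups.setdefault(tuple(row[c] for c in groupby), []).append(row)
            fn = self.model.measures[measure]
            return [{**dict(zip(groupby, key)), measure: fn(rows)} for key, rows in groups.items()]
        self.assets[name] = load

    def add_query_asset(self, name: str, query: Callable[[ToySemanticModel], list[dict]]) -> None:
        # Stands in for a DAX Asset: the result of an arbitrary query.
        self.assets[name] = lambda: query(self.model)


model = ToySemanticModel(
    tables={"Sales": [
        {"Region": "East", "Units": 3},
        {"Region": "East", "Units": 4},
        {"Region": "West", "Units": 5},
    ]},
    measures={"Total Units": lambda rows: sum(r["Units"] for r in rows)},
)
source = ToyFabricDataSource(model)
source.add_table_asset("sales_table", table="Sales")
source.add_measure_asset("units_by_region", "Total Units", table="Sales", groupby=["Region"])
source.add_query_asset("big_orders", lambda m: [r for r in m.tables["Sales"] if r["Units"] > 3])
print(source.assets["units_by_region"]())
# [{'Region': 'East', 'Total Units': 7}, {'Region': 'West', 'Total Units': 5}]
```

The three Asset types differ only in how their rows are produced, which is why the same Expectations can later be applied to any of them.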
We are now ready to define our Expectations, which verify our assertions about the data.
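Expectations are declarative: each one is an Expectation type plus its parameters, which is what makes them reusable. The sketch below illustrates that idea with a small registry mapping three real GX Expectation type names to hypothetical plain-Python checks, grouped into two hypothetical suites. The column names StoreNumber and Total Units and the sample rows are made up for illustration.

```python
# Hypothetical sketch: Expectations declared as data (type name + kwargs),
# grouped into named suites and interpreted by a registry. NOT the GX API.


def not_null(rows, column):
    return all(r.get(column) is not None for r in rows)


def between(rows, column, min_value, max_value):
    # Nulls are ignored here; they are the not-null check's job.
    return all(min_value <= r[column] <= max_value for r in rows if r.get(column) is not None)


def unique(rows, column):
    values = [r[column] for r in rows]
    return len(values) == len(set(values))


REGISTRY = {
    "expect_column_values_to_not_be_null": not_null,
    "expect_column_values_to_be_between": between,
    "expect_column_values_to_be_unique": unique,
}

suites = {
    "store_suite": [
        ("expect_column_values_to_not_be_null", {"column": "StoreNumber"}),
        ("expect_column_values_to_be_unique", {"column": "StoreNumber"}),
    ],
    "units_suite": [
        ("expect_column_values_to_be_between",
         {"column": "Total Units", "min_value": 0, "max_value": 10_000}),
    ],
}


def validate(rows, suite_name):
    """Run every Expectation in a suite and report (type, passed) pairs."""
    return [(etype, REGISTRY[etype](rows, **kwargs)) for etype, kwargs in suites[suite_name]]


stores = [{"StoreNumber": 1}, {"StoreNumber": 2}, {"StoreNumber": 2}]
print(validate(stores, "store_suite"))
# [('expect_column_values_to_not_be_null', True), ('expect_column_values_to_be_unique', False)]
```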
Because Expectations are reusable across Data Assets, we need to specify which Expectations to apply to which Asset when we configure the Checkpoint.
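The mapping step can be pictured as a list of (Asset, suite) pairs that a Checkpoint walks through. In this hypothetical sketch the same suite is reused for two toy Assets; the names validations and run_validations, the dictionary layout, and the data are illustrative assumptions, not GX's Checkpoint configuration format.

```python
# Hypothetical sketch of pairing Data Assets with reusable Expectation suites.
# NOT the GX Checkpoint configuration format.


def non_negative(rows, column):
    return all(r[column] >= 0 for r in rows)


suites = {
    "non_negative_units": [(non_negative, {"column": "Units"})],
}

assets = {  # toy stand-ins for two Data Assets
    "sales_2023": [{"Units": 4}, {"Units": 9}],
    "sales_2024": [{"Units": 2}, {"Units": -3}],
}

# The same suite is applied to both assets; the mapping is explicit.
validations = [
    {"asset": "sales_2023", "suite": "non_negative_units"},
    {"asset": "sales_2024", "suite": "non_negative_units"},
]


def run_validations(validations):
    """Return {asset name: True if every Expectation in its suite passed}."""
    results = {}
    for v in validations:
        rows = assets[v["asset"]]
        results[v["asset"]] = all(check(rows, **kw) for check, kw in suites[v["suite"]])
    return results


print(run_validations(validations))
# {'sales_2023': True, 'sales_2024': False}
```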
Finally, we run the Checkpoint and inspect the results or use them for further automation steps.
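Inspecting results and feeding them into automation might look like the sketch below: a hypothetical CheckpointResult exposes an overall success flag and the list of failed Expectations, and a gate function raises an exception so a pipeline can stop before invalid data reaches downstream reporting. The class names, fields, Asset names, and gate function are assumptions for illustration; the real GX result object has its own, richer structure.

```python
from dataclasses import dataclass

# Hypothetical result shape: overall success plus per-Expectation outcomes.
# NOT the GX result object.


@dataclass
class ExpectationOutcome:
    asset: str
    expectation: str
    success: bool


@dataclass
class CheckpointResult:
    outcomes: list[ExpectationOutcome]

    @property
    def success(self) -> bool:
        return all(o.success for o in self.outcomes)

    def failures(self) -> list[ExpectationOutcome]:
        return [o for o in self.outcomes if not o.success]


class DataQualityError(Exception):
    """Raised to halt downstream steps when validation fails."""


def gate(result: CheckpointResult) -> None:
    # Automation hook: stop the pipeline if any Expectation failed.
    if not result.success:
        names = ", ".join(f"{o.asset}:{o.expectation}" for o in result.failures())
        raise DataQualityError(f"failed expectations: {names}")


result = CheckpointResult([
    ExpectationOutcome("Store Asset", "expect_column_values_to_not_be_null", True),
    ExpectationOutcome("Total Units Asset", "expect_column_values_to_be_between", False),
])
print(result.success)  # False
try:
    gate(result)
except DataQualityError as err:
    print(err)  # failed expectations: Total Units Asset:expect_column_values_to_be_between
```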
You can find the tutorial notebook with additional examples in our GitHub samples repository.
The new integration between Great Expectations and Microsoft Fabric unlocks the potential of semantic models for data validation and quality assurance. Because Power BI data is now accessible from the familiar GX environment, data scientists and business analysts can collaborate more effectively and base their data-driven insights on high-quality, reliable data.
Start leveraging the power of semantic models in Great Expectations today and unlock the full potential of your data-driven insights.