Microsoft Fabric Updates Blog

SQL Telemetry & Intelligence – How we built a Petabyte-scale Data Platform with Fabric

Over the last three years, the SQL Telemetry & Intelligence (T&I) Engineering team has built a 10+ Petabyte Data Lake on Fabric, processing real-time data from globally distributed SQL Server Engines and Control Plane/Data Plane Services.

A ~5 Petabyte Silver zone (compressed Parquet) – Table size (Terabytes) over time – as observed by ADLS

This article reflects on the modern Data Engineering best practices and foundations that have served us well so far:

SQL T&I – Data Platform Design Pillars on Fabric

Architecture

In 2022, when deciding where to invest our codebase to express business logic, we performed an honest comparison of the various Fabric Engine APIs against our requirements:

Comparing Fabric Engine APIs to Modern Data Architecture pillars (this view is not authoritative or exhaustive)

Today, our Lakehouse Architecture follows the Lambda pattern with a heavy bias towards Real-time:

Bronze – reflects the source-system for backfills.

Silver – exploded schema to maximize columnar compression.

Gold – Kimball STAR schema with SCD2 (Dimension) tables, synchronous pre-commit quality checks in a Write-Audit-Publish pattern, with post-commit latency SLAs, Anomaly Detection checks for Natural/Business Keys.
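The Write-Audit-Publish flow in our Gold zone can be sketched as follows. This is a minimal pure-Python illustration of the pattern only; the function, check names, and row shapes are hypothetical, not our production code:

```python
# Minimal Write-Audit-Publish (WAP) sketch: new rows land in a staging
# area, synchronous quality checks run pre-commit, and the data is only
# published if every check passes. All names and checks here are
# illustrative assumptions.

def write_audit_publish(new_rows, published, checks):
    """Stage new_rows, run pre-commit audits, publish only on success."""
    staged = list(new_rows)                       # 1. WRITE to staging
    failures = [name for name, check in checks.items() if not check(staged)]
    if failures:                                  # 2. AUDIT pre-commit
        raise ValueError(f"quality checks failed: {failures}")
    published.extend(staged)                      # 3. PUBLISH (commit)
    return published

checks = {
    "no_null_keys": lambda rows: all(r.get("key") is not None for r in rows),
    "non_empty": lambda rows: len(rows) > 0,
}

gold_table = []
write_audit_publish([{"key": 1}, {"key": 2}], gold_table, checks)
```

The important property is that a failed audit leaves the published table untouched, which is what makes the post-commit latency SLAs meaningful.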

SQL T&I – Medallion Lakehouse Architecture implemented on Fabric

1. OpenTelemetry – Our First Party (1P) services, as well as Customer-facing (3P) services are instrumented with OpenTelemetry.

2. Event Hub – Real-time Control plane events from Azure Resource Notifications.

3. Azure Data Explorer – Immutable time-series database.

4. Event Hub to Delta Lake Mirroring – Similar to Confluent’s Tableflow, we wrote a highly-performant C#/Rust service hosted on AKS with KEDA with 27+ GB/minute throughput. Refer to Data Ingestion: Introducing delta-dotnet for Delta Lake YouTube seminar.

5. Azure Data Explorer – APPEND-only to Delta Lake – written directly into ADLS using Continuous Export.

6. Open Mirroring – APPEND-only to Delta Lake – The OTel Collector is written in Go and does not have a mature Delta Lake SDK. Using Open Mirroring, we flush APPEND-only Parquet files into the Landing Zone with massive concurrency.

7. Spark Streaming – We apply schema-on-read by supplying versioned sample payloads to transform tables. We also use Spark Streaming to combine many small files from regional tables (maxFilesPerTrigger) into optimally sized Parquet files.
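As a rough illustration of the compaction step, here is a pure-Python sketch of how bounding each trigger with maxFilesPerTrigger turns many small files into fewer, larger output files. The file sizes and trigger bound are illustrative assumptions:

```python
# Sketch: group pending small files into micro-batches the way
# Structured Streaming bounds each trigger with maxFilesPerTrigger;
# each batch is then rewritten as one larger, optimally sized file.

def plan_micro_batches(file_sizes_mb, max_files_per_trigger):
    """Partition the pending file list into bounded micro-batches."""
    batches = []
    for i in range(0, len(file_sizes_mb), max_files_per_trigger):
        chunk = file_sizes_mb[i:i + max_files_per_trigger]
        batches.append({"input_files": len(chunk), "output_mb": sum(chunk)})
    return batches

# 10 tiny 8 MB files, at most 4 files per trigger -> 3 larger files.
plan = plan_micro_batches([8] * 10, max_files_per_trigger=4)
```

In the real job, this grouping is done by the Spark engine itself; we simply set `.option("maxFilesPerTrigger", ...)` on the readStream and let each micro-batch write out one well-sized file.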

The key benefits of using Spark Streaming across the board are:

8. Kimball SCD2 tables with Transaction grain

Our strict Kimball Dimensional Model offers these guarantees:

  • Unique Primary Keys.
  • Valid Foreign Key references.
  • Idempotent – replaying previously seen transactions is guaranteed to be a no-op.
  • Reliable SCD2 columns based on source time (is_row_effective, start_date, end_date etc.).

Bringing relational-database style Primary Key enforcements into a large STAR schema was tricky – we designed our tables taking inspiration from Spark’s Storage Partition Join, learning from Hyperspace’s B-Tree indices and taking heavy advantage of BROADCAST joins.
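A toy sketch of the unique-primary-key guarantee follows. In production this runs on Spark, where the existing key set is BROADCAST to executors; the pure-Python version below only shows the contract, and the column names are illustrative:

```python
# Sketch of primary-key enforcement: accept only rows whose key is
# unseen, so replaying previously seen transactions becomes a no-op.
# In Spark, `existing_keys` would be a small broadcasted key set.

def enforce_primary_keys(existing_keys, incoming_rows, key_col="pk"):
    """Split incoming rows into accepted (new keys) and rejected (duplicates)."""
    seen = set(existing_keys)
    accepted, rejected = [], []
    for row in incoming_rows:
        if row[key_col] in seen:
            rejected.append(row)   # duplicate: safe to drop on replay
        else:
            seen.add(row[key_col])
            accepted.append(row)
    return accepted, rejected

accepted, rejected = enforce_primary_keys(
    existing_keys=[1, 2],
    incoming_rows=[{"pk": 2, "v": "replay"}, {"pk": 3, "v": "new"}],
)
```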

9. Kimball Periodic Snapshot tables for DirectLake

A single DirectLake Semantic Model that describes a full-360 view of SQL Server’s business using:

  • Periodic Snapshot FACT tables at the daily/weekly/monthly grain.
  • Removing high-cardinality dimensions to make pre-aggregations more effective.
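As a rough illustration of the periodic-snapshot idea, here is a pure-Python sketch of rolling transaction-grain facts up to the daily grain. The field names are hypothetical; the real job runs on Spark over the transaction-grain fact tables:

```python
# Sketch: aggregate transaction-grain rows to a (day, server) periodic
# snapshot, dropping high-cardinality per-event detail so the
# pre-aggregation stays small. All field names are illustrative.
from collections import defaultdict

def daily_snapshot(transactions):
    """Sum bytes per (day, server); per-event columns are dropped."""
    snap = defaultdict(int)
    for t in transactions:
        snap[(t["day"], t["server_id"])] += t["bytes"]
    return dict(snap)

snapshot = daily_snapshot([
    {"day": "2025-11-01", "server_id": "s1", "bytes": 100},
    {"day": "2025-11-01", "server_id": "s1", "bytes": 50},
    {"day": "2025-11-02", "server_id": "s1", "bytes": 10},
])
```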

10. Eventhouse Acceleration

While most of our silver tables are stored as Delta Lake on OneLake, a second copy of popular tables is easily made available in Eventhouse using Query Acceleration Policy, where Azure Managed Grafana powers many of our live site KPI dashboards on top of the KQL endpoint.

11. Interactive Exploration with SSMS and Notebooks

Self-explanatory.

12. DirectLake Semantic Model

A mixture of Tabular Editor 3, Power BI Desktop, DAX Studio, and Fabric Web Editor are used to model relationships and measures – all contributions must go through Pull Requests via TMDL.


Problems & Solutions

Operating a hyperscale data platform inevitably brings complex scalability challenges. In this section, we discuss specific Data Engineering problems and solutions.

VSCode Devcontainer for local development

Spark has a reputation of being difficult to run locally.

We invested in designing a VSCode Devcontainer that contains all of our development dependencies, including pinned versions of JDK, Spark Engine, and VSCode Extensions. This enables every new developer to have a consistent development environment:

VSCode Devcontainer – 5 minutes to Debugger

Once the code works locally, we rapidly build the whl/jar, upload and test it in Fabric Spark via the VS Code Extension to invoke a Spark Job Definition, all without leaving the IDE:

VSCode Devcontainer – In the Debugger after 5 minutes using Spark Java Debug Wire Protocol

Fabric CI/CD at scale

Each member of the team must be able to spin up a private fork of our entire Fabric Production stack, on demand, with zero prerequisites.

Our observations:

For regions or restricted clouds where Fabric is unavailable today, we currently maintain 40+ Production Synapse Workspaces using the tremendously helpful synapse-workspace-deployment project.

Using inspiration from it, we authored a fabric-workspace-deployment app, that wraps the fabric-cli, fabric-cicd, and API models in a GitOps manifest to achieve 100% automation:

A consistent, GitOps-driven source of truth to configure every API used in Production in Fabric
100% version control of every Fabric API in git
We even version control the Workspace PNG icon!

Within 10 minutes, the app spins up isolated capacities, workspaces, Autoscale billing etc. to give a new Engineer an exact functional replica of Production:

Hundreds of consistently deployed Fabric Workspaces and Capacities across T&I

The application code is available for inspiration, but the API contract is not fit for external consumption yet, as we are still making many breaking changes as we move fast to standardize T&I’s Fabric consumption.


Rapid test coverage

The key design characteristic that keeps our Data Processing code robust and near-regression-proof is forcing all changes to data to flow through a simple interface that operates purely on the DataFrame API:

Encapsulating transformations in enclosed functions with transform chaining to make Spark Streaming trivial to adopt and test
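A minimal sketch of this pattern follows. With PySpark these would be `df.transform(...)` calls; here, plain functions over lists of dicts show the same shape, and the function and field names are illustrative assumptions:

```python
# Sketch: every change to data is a pure function over a "DataFrame",
# composed via transform chaining. Pure functions make each step
# trivially unit-testable in isolation.

def with_region(rows):
    """Derive a region column from an endpoint column."""
    return [{**r, "region": r["endpoint"].split(".")[0]} for r in rows]

def drop_test_traffic(rows):
    """Filter out synthetic test rows."""
    return [r for r in rows if not r.get("is_test", False)]

def chain(*transforms):
    """Compose transforms left-to-right, mirroring df.transform chaining."""
    def run(rows):
        for t in transforms:
            rows = t(rows)
        return rows
    return run

pipeline = chain(with_region, drop_test_traffic)
result = pipeline([
    {"endpoint": "eastus.contoso.net", "is_test": False},
    {"endpoint": "westus.contoso.net", "is_test": True},
])
```

Because each transform is pure, the same pipeline runs unchanged in unit tests, Spark Streaming jobs, and backfills.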

Testing is an entire discipline in itself, so we will not go into detail, but we will mention two golden tips that yield huge dividends for confidently hosting production workloads on Fabric, and specifically on Spark:

  • Parallelized testing significantly speeds up test execution, while sbt forking eliminates noisy-neighbor tests.
  • To dramatically speed up single-node Spark execution speeds on smaller datasets, set spark.sql.shuffle.partitions = 1.

By making these two small changes, we were able to reduce our test runtime by 67%:

Reducing test execution time by 67% by two fairly simple Spark Devcontainer architectural changes

Kimball STAR Schema

After referring to The Data Warehouse Toolkit, 3rd Edition, we modelled all relevant data modelling concepts using Object-Oriented Programming principles:

Kimball’s teachings, captured as Object Oriented Programming principles (traits and base classes)

An additional benefit of consistent interfaces is that we are able to automatically generate DBML (Database Markup Language), which allows consistent generation of ERDs:

Auto-generated OpenTelemetry schema ERDs using DBML (Database Markup Language)
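A sketch of what such DBML generation can look like is below. The metadata shape is a simplified assumption; our real generation walks the trait and base-class hierarchy:

```python
# Sketch: render one table's metadata as DBML text, so ERDs can be
# generated consistently from the same interfaces that define tables.
# The metadata shape (name, column list, primary key) is illustrative.

def to_dbml(table_name, columns, primary_key):
    """Render one table definition as a DBML Table block."""
    lines = [f"Table {table_name} {{"]
    for name, dtype in columns:
        suffix = " [pk]" if name == primary_key else ""
        lines.append(f"  {name} {dtype}{suffix}")
    lines.append("}")
    return "\n".join(lines)

dbml = to_dbml(
    "dim_server",
    [("server_key", "bigint"), ("region", "varchar")],
    primary_key="server_key",
)
```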

By applying the following best practices, we’ve been able to significantly reduce load runtimes:

  • Liquid Clustering on Kimball key columns to reduce integrity enforcement runtime.
  • Instead of using JOINs to check for orphaned keys, use WHERE EXISTS.
An example of an integrity check for SCD2 columns that significantly speeds up by clustering Natural Keys
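To make the orphaned-key tip concrete, here is a sketch using a NOT EXISTS anti-join shape rather than a JOIN; the table and column names are illustrative, and the pure-Python helper mirrors the semantics:

```python
# Sketch of an orphaned-foreign-key check. The SQL shape avoids a JOIN
# in favor of an (NOT) EXISTS anti-join; table and column names are
# illustrative assumptions.

ORPHAN_CHECK_SQL = """
SELECT COUNT(*) AS orphaned
FROM fact_events f
WHERE NOT EXISTS (
  SELECT 1 FROM dim_server d WHERE d.server_key = f.server_key
)
"""

def orphaned_keys(fact_keys, dim_keys):
    """Pure-Python equivalent: fact keys with no matching dimension row."""
    dims = set(dim_keys)
    return [k for k in fact_keys if k not in dims]

orphans = orphaned_keys(fact_keys=[1, 2, 3], dim_keys=[1, 3])
```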

OpenTelemetry with Custom Spark Metrics

Given our deep investments in OpenTelemetry, we took advantage of Spark Plugins to instrument our Spark Cluster with Custom Metrics. The use case is the ability to instantly gain deep visibility into a problematic Spark Job (e.g. ERROR 137) and understand the memory profile with pinpoint precision:

Visualizing exactly what occurs across a Spark Cluster moments before an ERROR 137
Pictured: a heap dump of a JOIN explosion causing ERROR 137 on a 32 GB Executor – about 32 GB of keys on the heap!

For more information, refer to How to deeply instrument a Spark Cluster with OpenTelemetry (feat. real time Power BI report):

Using Spark Plugins to expose arbitrary OS-level Metrics as an OpenTelemetry Metric

Data Quality and Anomaly Detection

We found Deequ to have the richest API coverage and the highest performance, including Anomaly Detection, flexible Metric Stores (including Delta tables), and simple API constructs that can be extended to fit a particular codebase’s needs without requiring upstream changes.

Furthermore, the recent addition of Data Quality Definition Language (DQDL) support in Deequ has been phenomenal. DQDL allows us to define a terse set of constraints about a table in a few lines of declarative code:

A Data Quality Definition Language (DQDL) rule

This simple, easy to follow API allows you to declare and document data quality expectations about the table right alongside the table definition.
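To give a flavor of declarative rules, here is a pure-Python sketch of DQDL-style checks (the real checks run through Deequ on Spark; the rule names and columns here are illustrative assumptions):

```python
# Sketch of declarative, DQDL-style data quality rules evaluated in
# plain Python. In production, equivalent constraints run via Deequ.
# Rule names and column names are illustrative.

RULES = [
    ("IsComplete", "server_id"),   # no NULLs allowed
    ("IsUnique", "event_id"),      # primary-key style uniqueness
]

def evaluate(rows, rules):
    """Return pass/fail per declared rule."""
    results = {}
    for rule, col in rules:
        values = [r.get(col) for r in rows]
        if rule == "IsComplete":
            results[(rule, col)] = all(v is not None for v in values)
        elif rule == "IsUnique":
            results[(rule, col)] = len(values) == len(set(values))
    return results

report = evaluate(
    [{"server_id": "s1", "event_id": 1}, {"server_id": "s2", "event_id": 1}],
    RULES,
)
```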


Data Platform SLAs

Our Data Platform is built on top of operational data arriving from every major Azure region on Earth. Several issues can occur at any time, so it’s important to offer end users a single pane of glass with regards to:

  • What the expected SLAs are for a particular dataset.
  • If SLAs are breached, what the justifications and outliers are.
  • When the expected mitigation is coming.
  • A top-level trend of the entire Data Platform.

SLAs are defined in a simple YAML file checked into git, with a simple YAML -> native object -> DataFrame converter:

YAML to define SLAs per GOLD table by dimensions (e.g. Azure regions)

SLA failures are evaluated as a Spark DataFrame, and visible by dimension (in this case, region):

Standardizing SLA evaluation through Spark SQL/DataFrame allows us to have robust test coverage with edge-cases captured for complex, maintainable business logic that operates on arbitrary sizes of data.
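A minimal sketch of that evaluation follows, assuming the YAML has already been loaded into a dict. All table names, regions, and thresholds here are illustrative assumptions:

```python
# Sketch: grade observed latencies against per-table, per-region SLA
# limits declared in (YAML-loaded) config. In production the resulting
# rows become a Spark DataFrame, visible by dimension.

sla_config = {
    "fact_events": {"eastus": 30, "westeurope": 45},  # minutes
}

def evaluate_slas(config, observed):
    """One row per (table, region) with breach status."""
    rows = []
    for (table, region), latency in sorted(observed.items()):
        limit = config.get(table, {}).get(region)
        rows.append({
            "table": table, "region": region,
            "latency_min": latency, "sla_min": limit,
            "breached": limit is not None and latency > limit,
        })
    return rows

report = evaluate_slas(sla_config, {
    ("fact_events", "eastus"): 50,       # over the 30-minute limit
    ("fact_events", "westeurope"): 10,   # healthy
})
```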

When an SLA is breached:

  • A health check job fails.
  • An incident is fired via Fabric Activator.
  • An on-call engineer investigates the alert, pushes an updated YAML file into OneLake via a PR. The stakeholders are kept up to date via a Delta table visualized in Power BI.
A Power BI report backed by SLA calculations combined with metadata from the YAML per data source per region

Fearless, idempotent backfills

Despite having all the unit test coverage in the world, it’s always possible to merge a harmless-looking change – such as using an INNER rather than a LEFT JOIN – that causes unnatural drops in business metrics, such as counts of unique natural keys.

Backfilling the lost data to undo the damage is tricky if the source tables are extremely large. To solve this:

  • Capturing all transformation logic in self-contained functions allows us to chain them together to backfill a particular table.
  • Having Primary Keys enforced in the GOLD zone allows us to replay transactions from SILVER without fearing the repercussions of duplicates.
A backfill job configuration
Enforcement of deterministic primary keys allows us to ensure replaying transactions during backfill does not insert duplicates or have side effects

This is idempotency in action, saving the Lakehouse from what would have been an extremely expensive full reprocess with downstream ripples that would have reset many Streaming Checkpoints:

Backfilling the damage from a PR containing a non-defensive INNER JOIN that impacted several FACT tables
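A toy sketch of why the replay is safe: every transaction carries a deterministic key, so replaying an overlapping window from SILVER cannot insert duplicates into GOLD. The names and row shapes are illustrative:

```python
# Sketch of an idempotent backfill: upsert-by-key means re-applying an
# already-seen transaction is a no-op, so overlapping replay windows
# are harmless. Field names are illustrative assumptions.

def apply_transactions(table_rows, transactions, key="txn_id"):
    """Merge transactions into the table keyed by a deterministic key."""
    state = {row[key]: row for row in table_rows}
    for txn in transactions:
        state[txn[key]] = txn
    return sorted(state.values(), key=lambda r: r[key])

day1 = [{"txn_id": 1, "amount": 10}, {"txn_id": 2, "amount": 20}]
gold = apply_transactions([], day1)
# The backfill replays all of day1 plus the transaction a buggy JOIN dropped:
gold = apply_transactions(gold, day1 + [{"txn_id": 3, "amount": 30}])
```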

Autoscale Billing for Spark enables large, unpredictable workloads

On an average day, we require about 8,000 cores at peak – as jobs are queued, they grab the cores, run, and exit. At other times, for example on November 23rd below, very large Spark Pools needed to be allocated for backfills over 2 days:

Spark peak vCore usage over time

Before Fabric Spark Autoscale Billing, even multiple F2048s would not have been sufficient for this workload profile. With Autoscale Billing, we simply allocate a small SKU against the workspace and use Autoscale Billing in true PAYG fashion.


Incremental View Maintenance to avoid fully reprocessing Petabytes of data during aggregations

The pre-aggregated periodic-snapshot generation in our STAR schema is not incremental:

The red box represents the only part of the architecture that is not 100% incremental

While it is technically possible to take any SQL query, study the AST (Abstract Syntax Tree) and rewrite it such that it’s computed incrementally (linkedin/coral on GitHub), this differential rewrite is a stateful operation, as it must keep track of the previous commit that was processed per table:

Taking a SQL AST and rewriting it to be incrementally evaluated using Linkedin Coral

Simply rewriting the query to be incremental isn’t guaranteed to be performant. Depending on the query, the columns involved, clustering, table statistics, and table/file layout, it’s possible for the original query plan that processes the full dataset to be more efficient than the incremental one.
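To show the stateful idea in miniature, here is a pure-Python sketch of incremental view maintenance: only the rows from commits newer than the last processed commit are folded into the previously materialized state. The shapes are illustrative assumptions:

```python
# Toy incremental view maintenance: instead of recomputing a
# sum-per-key aggregate over the whole table, merge only the new
# commit's delta rows into the previous materialized view.

def refresh_incremental(state, delta_rows):
    """Fold a commit's delta into the materialized view; cost scales
    with the delta, not with the full table."""
    view = dict(state)
    for r in delta_rows:
        view[r["key"]] = view.get(r["key"], 0) + r["value"]
    return view

v0 = refresh_incremental({}, [{"key": "a", "value": 1}, {"key": "b", "value": 2}])
v1 = refresh_incremental(v0, [{"key": "a", "value": 3}])  # next commit's delta
```

The statefulness the text describes is exactly the `state` argument: the refresher must remember which commit it last processed per table so it knows what counts as a delta.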

It’s a fascinating optimization problem that can sometimes benefit from Machine Learning over a historical heuristics feedback loop (i.e. evaluating cost over time to learn from the past). Fabric Materialized Lake View Optimal Refresh is precisely trying to solve this for all Fabric users and significantly improve COGS efficiency at scale.

For more information about this interesting problem space, refer to this academic paper: DBSP: Automatic Incremental View Maintenance for Rich Query Languages.


What’s Next?

In our experience, the best part about Fabric is how cohesively it brings together different, highly-specialized computes – such as Spark and the Analysis Services Engine – by unifying state on OneLake. Direct Lake and Spark go hand-in-hand.

Applying the best practices about Parquet that years of Data Engineering have ingrained in us feels extremely intuitive – there is very little new to learn.

Our Semantic Model is delightful – we’re able to enjoy incredible performance and squeeze fantastic scale (a single model with 100s of tables) out of the Analysis Services Engine via Direct Lake. The trick is to first apply Spark against the massive transaction grain fact tables and perform pre-aggregation using Kimball’s periodic snapshot modelling patterns.

With the core Production Medallion Architecture laid out end-to-end, we next want to polish our Data Platform further to enable a delightful self-serve model using dbt on Fabric, preferably with the fabric-spark adapter to keep local development and PR tests as consistent as possible.

We also aim to apply consistent AI/ML across our data estate, starting with rich, stateful Anomaly Detection on our well-established KPIs and Metrics.


If you’d like a deeper dive on any specific topic discussed in this post, please leave a comment!
