Microsoft Fabric Updates Blog

Boost performance effortlessly with Automated Table Statistics in Microsoft Fabric

We’re thrilled to introduce Automated Table Statistics in Microsoft Fabric Data Engineering — a major upgrade that helps you get blazing-fast query performance with zero manual effort.

Whether you’re running complex joins, large aggregations, or heavy filtering workloads, Fabric’s new automated statistics will help Spark make smarter decisions, saving you time, compute, and money.

What are Automated Table Statistics?

In simple terms, table statistics are summary metrics about your data that Spark uses to optimize queries. They include:

  • Total row counts.
  • Null counts per column.
  • Minimum and maximum values per column.
  • Distinct value counts per column.
  • Average and maximum column lengths.
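As a rough illustration of the metrics above (plain Python, not Fabric's actual implementation), here's what computing them for a single string column might look like:

```python
from typing import Optional

def column_stats(values: list[Optional[str]]) -> dict:
    """Compute the summary metrics listed above for one column."""
    non_null = [v for v in values if v is not None]
    lengths = [len(v) for v in non_null]
    return {
        "row_count": len(values),                    # total row count
        "null_count": len(values) - len(non_null),   # nulls in this column
        "min": min(non_null) if non_null else None,  # minimum value
        "max": max(non_null) if non_null else None,  # maximum value
        "distinct_count": len(set(non_null)),        # distinct value count
        "avg_len": sum(lengths) / len(lengths) if lengths else 0,
        "max_len": max(lengths) if lengths else 0,
    }

stats = column_stats(["spark", "delta", None, "spark"])
print(stats)
```

In practice Fabric gathers these metrics at write time across the tracked columns, so the optimizer can consult them without scanning the data again.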

Until now, generating these statistics required running manual commands (ANALYZE TABLE) or setting up custom pipelines. With this release, Fabric collects them automatically when you create a new Delta table — no setup or tuning required.

Why does it matter?

Spark’s Cost-Based Optimizer (CBO) relies on good statistics to:

  • Choose the best join strategy (e.g., broadcast vs. shuffle joins).
  • Prune partitions effectively.
  • Reduce unnecessary data shuffles.
  • Optimize aggregations.

Without accurate stats, Spark can only guess, often leading to slow or expensive query plans. With automated stats, Spark becomes much smarter — and in our internal benchmarks, we’ve seen up to ~45% performance gains on complex workloads.
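To see why size estimates matter, here's a toy sketch (not Spark's actual planner) of how a cost-based optimizer might pick a join strategy from table-size statistics; the 10 MB cutoff mirrors the default value of Spark's `spark.sql.autoBroadcastJoinThreshold`:

```python
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024  # Spark's default autoBroadcastJoinThreshold

def choose_join_strategy(left_bytes: int, right_bytes: int) -> str:
    """Pick a broadcast-hash join when one side's *estimated* size is small,
    otherwise fall back to a shuffle-based join."""
    if min(left_bytes, right_bytes) <= BROADCAST_THRESHOLD_BYTES:
        return "broadcast-hash join"    # ship the small side to every executor
    return "sort-merge (shuffle) join"  # repartition both sides by the join key

print(choose_join_strategy(50 * 1024**3, 2 * 1024**2))   # small right side
print(choose_join_strategy(50 * 1024**3, 40 * 1024**3))  # both sides large
```

Without statistics, the optimizer has no reliable size estimate to feed into a decision like this, so it may shuffle terabytes when broadcasting a few megabytes would have sufficed.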


How does it work?

  • Enabled by default on all new Delta tables.
  • Tracks the first 32 columns (including nested columns).
  • Stores stats separately in Parquet format to avoid bloating data files.
  • Fully integrated with Fabric Spark 3.5 and later.

You can also fine-tune behavior with configurations:

Enable/disable stats collection:

spark.conf.set("spark.microsoft.delta.stats.collect.extended", "true")

Enable/disable optimizer injection:

spark.conf.set("spark.microsoft.delta.stats.injection.enabled", "true")

How to Check or Recompute Statistics

Want to see what’s under the hood? Fabric lets you inspect collected stats and recompute them if needed:

Check current stats (Scala):

println(spark.read.table("tableName").queryExecution.optimizedPlan.stats)

Recompute after schema changes:

StatisticsStore.recomputeStatisticsWithCompaction(spark, "tableName")

For full control, you can also use the familiar ANALYZE TABLE command.
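For example, with the standard Spark SQL syntax (the table name `sales` here is just a placeholder), you can recompute table- and column-level statistics yourself:

```python
# Recompute table-level stats (row count, size in bytes):
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS")

# Recompute per-column stats as well:
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS FOR ALL COLUMNS")
```

These commands assume an active Spark session in a Fabric notebook; they scan the table, so expect them to take longer on large datasets.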


Important limitations to know

As with any advanced feature, it’s important to understand the current boundaries:

  • Stats are collected only at write time.
  • Updates from other engines won’t be included.
  • Only the first 32 columns are tracked.
  • Recomputing stats may require rewriting the table.
  • No fallback: in rare cases, inaccurate stats may negatively impact performance.

We’re actively working on expanding support — including improvements for existing tables and better recompute workflows.

Learn More

To learn more about automated table statistics, please visit the Configure and manage Automated Table Statistics in Fabric Spark documentation.


April 6, 2026 by Arshad Ali
