Microsoft Fabric Updates Blog

Boost performance effortlessly with Automated Table Statistics in Microsoft Fabric

We’re thrilled to introduce Automated Table Statistics in Microsoft Fabric Data Engineering — a major upgrade that helps you get blazing-fast query performance with zero manual effort.

Whether you’re running complex joins, large aggregations, or heavy filtering workloads, Fabric’s new automated statistics will help Spark make smarter decisions, saving you time, compute, and money.

What are Automated Table Statistics?

In simple terms, table statistics are summary metrics about your data that Spark uses to optimize queries. These include:

  • Total row counts.
  • Null counts per column.
  • Minimum and maximum values per column.
  • Distinct value counts per column.
  • Average and maximum column lengths.
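To make the list above concrete, here is a minimal plain-Python sketch (purely illustrative, not Fabric's implementation) that computes those same per-column metrics over a toy dataset:

```python
# Illustrative sketch: the kinds of per-column summary metrics that
# table statistics capture. Sample data and helper are hypothetical.
rows = [
    {"city": "Oslo", "pop": 709000},
    {"city": "Bergen", "pop": 291000},
    {"city": None, "pop": 291000},
]

def column_stats(rows, col):
    values = [r[col] for r in rows]
    non_null = [v for v in values if v is not None]
    stats = {
        "row_count": len(values),                   # total row count
        "null_count": len(values) - len(non_null),  # nulls per column
        "min": min(non_null) if non_null else None, # minimum value
        "max": max(non_null) if non_null else None, # maximum value
        "distinct_count": len(set(non_null)),       # distinct values
    }
    # Length statistics only make sense for string-like columns.
    if non_null and isinstance(non_null[0], str):
        lengths = [len(v) for v in non_null]
        stats["avg_len"] = sum(lengths) / len(lengths)
        stats["max_len"] = max(lengths)
    return stats

print(column_stats(rows, "city"))
print(column_stats(rows, "pop"))
```

Fabric gathers the real equivalents of these metrics at write time, so the optimizer can consult them without scanning the data again.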

Until now, generating these statistics required running manual commands such as ANALYZE TABLE or setting up custom pipelines. With this release, Fabric collects them automatically when you create a new Delta table — no setup or tuning required.

Why does it matter?

Spark’s Cost-Based Optimizer (CBO) relies on good statistics to:

  • Choose the best join strategy (e.g., broadcast vs. shuffle joins).
  • Prune partitions effectively.
  • Reduce unnecessary data shuffles.
  • Optimize aggregations.
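The first bullet can be sketched in a few lines. This is a toy decision rule, not Spark's actual optimizer: given size estimates derived from statistics, a small enough table is broadcast to every executor instead of shuffling both sides of the join.

```python
# Toy cost-based join-strategy choice (illustration only). The 10 MB
# threshold mirrors Spark's default spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD = 10 * 1024 * 1024

def choose_join_strategy(left_bytes, right_bytes, threshold=BROADCAST_THRESHOLD):
    # Broadcast if the smaller side's estimated size fits under the threshold;
    # otherwise fall back to a shuffle join.
    if min(left_bytes, right_bytes) <= threshold:
        return "broadcast"
    return "shuffle"

print(choose_join_strategy(50 * 1024**3, 2 * 1024**2))   # small right side
print(choose_join_strategy(50 * 1024**3, 40 * 1024**3))  # both sides large
```

The point is that this decision is only as good as the size estimates feeding it — which is exactly what accurate table statistics provide.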

Without accurate stats, Spark can only guess, often leading to slow or expensive query plans. With automated stats, Spark becomes much smarter — and in our internal benchmarks, we’ve seen up to ~45% performance gains on complex workloads.


How does it work?

  • Enabled by default on all new Delta tables.
  • Tracks the first 32 columns (including nested columns).
  • Stores stats separately in Parquet format to avoid bloating data files.
  • Fully integrated with Fabric Spark 3.5 and later.

You can also fine-tune behavior with configurations:

Enable/disable stats collection:

spark.conf.set("spark.microsoft.delta.stats.collect.extended", "true")

Enable/disable optimizer injection:

spark.conf.set("spark.microsoft.delta.stats.injection.enabled", "true")

How to Check or Recompute Statistics

Want to see what’s under the hood? Fabric lets you inspect collected stats and recompute them if needed:

Check current stats (Scala):

println(spark.read.table("tableName").queryExecution.optimizedPlan.stats)

Recompute after schema changes:

StatisticsStore.recomputeStatisticsWithCompaction(spark, "tableName")

For full control, you can also use the familiar ANALYZE TABLE command.
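For example, a small hypothetical helper (names and table are placeholders) can build the standard ANALYZE TABLE statement; on a Fabric Spark session you would pass the result to spark.sql(...):

```python
# Hypothetical helper that builds the standard Spark SQL ANALYZE TABLE
# statement. Table and column names below are placeholders.
def analyze_table_sql(table, columns=None):
    if columns:  # compute per-column stats for the listed columns
        cols = ", ".join(columns)
        return f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS {cols}"
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS"  # table-level only

print(analyze_table_sql("sales"))
print(analyze_table_sql("sales", ["region", "amount"]))
```

Running the column-level variant is what populates the per-column metrics (nulls, min/max, distinct counts) that the optimizer uses.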


Important limitations to know

As with any advanced feature, it’s important to understand the current boundaries:

  • Stats are collected only at write time.
  • Updates from other engines won’t be included.
  • Only the first 32 columns are tracked.
  • Recomputing stats may require rewriting the table.
  • No fallback: in rare cases, stats may negatively impact performance.

We’re actively working on expanding support — including improvements for existing tables and better recompute workflows.

Learn More

To learn more about automated table statistics, please visit the Configure and manage Automated Table Statistics in Fabric Spark documentation.

Published June 12, 2025 by RK Iyer