Microsoft Fabric Updates Blog

Announcing: Automatic Data Compaction for Fabric Warehouse

We are excited to announce automatic data compaction for Data Warehouses!

One of our goals with the Data Warehouse is to automate as much as possible, so that building and using a warehouse is easier and cheaper. That way you spend your time adding data and gaining insights from it instead of on tasks like maintenance. You should also expect great performance, which is where Data Compaction comes in!

Why is Data Compaction important?

To understand what Data Compaction is and how it helps, we first need to look at how Data Warehouse tables are physically stored in OneLake.

When you create a table, it is physically stored as one or more Parquet files. Parquet files are immutable, which means they cannot be changed after they are created. When you run DML (Data Manipulation Language) statements such as inserts and updates, each transaction creates new Parquet files. Over time, a table can accumulate thousands of small files. Data Compaction rewrites many smaller files into a few larger ones, which improves the performance of reading the table, since reading a few large files involves less per-file overhead than reading thousands of small ones.
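To make the small-files problem concrete, here is a minimal Python sketch, not Fabric's implementation. A hypothetical `DataFile` is frozen to mimic an immutable Parquet file, every simulated transaction writes a new one, and `compact` rewrites them into fewer files of at most `target_rows` rows. The names and the 4,000-row target are assumptions chosen for illustration; the post does not say how Fabric sizes compacted files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: like a Parquet file, it cannot change once written
class DataFile:
    """Hypothetical stand-in for one Parquet file."""
    rows: tuple


def run_inserts(batches):
    """Each simulated DML transaction writes a brand-new immutable file."""
    files = []
    for batch in batches:
        files.append(DataFile(rows=tuple(batch)))
    return files


def compact(files, target_rows):
    """Rewrite many small files into fewer files of up to target_rows rows each."""
    all_rows = [row for f in files for row in f.rows]
    return [
        DataFile(rows=tuple(all_rows[i:i + target_rows]))
        for i in range(0, len(all_rows), target_rows)
    ]


# 1,000 small transactions of 10 rows each (illustrative data)
batches = [[(txn, r) for r in range(10)] for txn in range(1000)]
files = run_inserts(batches)
compacted = compact(files, target_rows=4000)
print(len(files), len(compacted))  # 1000 3
```

Sizing by row count keeps the sketch short; a real engine may well target a byte size instead.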

Another reason for Data Compaction is to remove deleted rows from the files. When you delete a row, it isn't physically removed from the Parquet file. Instead, we use a Delta Lake feature called deletion vectors, which are read along with the table and tell the engine which rows to ignore. Deletion vectors make deletes faster because the existing Parquet files do not need to be rewritten. However, if a Parquet file contains many deleted rows, reading it takes more resources, since those rows are still stored in the file and the engine has to work out which ones to ignore.
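The deletion-vector read path can be modelled in a few lines. In this simplified sketch (not Delta Lake's or Fabric's code), `delete_where` only records deleted row positions, and `scan` still walks every stored row before filtering, so a heavily deleted file costs as much to read as an intact one. A Python set stands in for the compressed bitmap that Delta Lake actually stores; `DataFile`, `delete_where`, and `scan` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical Parquet file plus its deletion vector."""
    rows: list                                  # treated as write-once: never modified
    deleted: set = field(default_factory=set)  # positions of rows marked as deleted


def delete_where(files, predicate):
    """Mark matching rows as deleted without rewriting any file's rows."""
    for f in files:
        for pos, row in enumerate(f.rows):
            if pos not in f.deleted and predicate(row):
                f.deleted.add(pos)


def scan(files):
    """Read every stored row, then skip positions flagged in the deletion vector."""
    scanned, result = 0, []
    for f in files:
        for pos, row in enumerate(f.rows):
            scanned += 1
            if pos not in f.deleted:
                result.append(row)
    return result, scanned


files = [DataFile(rows=list(range(0, 100))), DataFile(rows=list(range(100, 200)))]
delete_where(files, lambda v: v % 10 != 0)  # delete 90% of the rows
rows, scanned = scan(files)
print(len(rows), scanned)  # 20 200
```

Only 20 rows are returned, yet all 200 are still read, which is the overhead compaction removes.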

How does Data Compaction happen?

As you run queries in your Data Warehouse, the engine generates system tasks to review tables that could potentially benefit from data compaction. Behind the scenes, we then evaluate those tables to see whether they would indeed benefit from being compacted.
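The two-stage flow described above, flagging candidate tables while queries run and then evaluating them, might look roughly like the sketch below. The post does not say which signals Fabric checks, so the thresholds (`SMALL_FILE_ROWS`, `MAX_SMALL_FILES`, `MAX_DELETED_RATIO`) and all function names are hypothetical, picked only to show the shape of such a check.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for one table file."""
    row_count: int
    deleted_count: int


# Hypothetical thresholds, chosen for illustration only.
SMALL_FILE_ROWS = 100_000
MAX_SMALL_FILES = 10
MAX_DELETED_RATIO = 0.10

review_queue = deque()  # stage 1: tables flagged while queries run


def flag_for_review(table):
    """Called as queries touch a table; this is only a hint, not a decision."""
    if table not in review_queue:
        review_queue.append(table)


def needs_compaction(files):
    """Stage 2: inspect file stats and return a reason string, or None."""
    small = sum(1 for f in files if f.row_count < SMALL_FILE_ROWS)
    if small > MAX_SMALL_FILES:
        return "too many small files"
    total = sum(f.row_count for f in files)
    deleted = sum(f.deleted_count for f in files)
    if total and deleted / total > MAX_DELETED_RATIO:
        return "too many deleted rows"
    return None


def evaluate(stats_by_table):
    """Drain the queue and record which flagged tables would really benefit."""
    decisions = {}
    while review_queue:
        table = review_queue.popleft()
        decisions[table] = needs_compaction(stats_by_table[table])
    return decisions


stats = {
    "sales": [FileStats(500, 0)] * 50,                  # many tiny files
    "customers": [FileStats(1_000_000, 0)] * 4,         # healthy
    "orders": [FileStats(1_000_000, 200_000)] * 4,      # 20% deleted rows
}
for t in ("sales", "customers", "orders", "sales"):
    flag_for_review(t)
print(evaluate(stats))
# {'sales': 'too many small files', 'customers': None, 'orders': 'too many deleted rows'}
```

Flagging a table is cheap and can happen repeatedly; only the evaluation step decides whether a rewrite is worth its cost.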

The compaction itself is actually very simple! It basically rewrites either the whole table or portions of it to create a new Parquet file or files that have no deleted rows and/or more rows per file.
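A rewrite of "portions of the table" can be sketched as follows, assuming (hypothetically) that a file qualifies when it is small or carries deleted rows. Qualifying files have their live rows gathered and repacked into new files with empty deletion vectors, while healthy files are left untouched. `TARGET_ROWS`, `SMALL_ROWS`, and the function names are illustrative assumptions, not Fabric settings.

```python
from dataclasses import dataclass, field

TARGET_ROWS = 1_000  # hypothetical target size of a rewritten file, in rows
SMALL_ROWS = 250     # hypothetical cutoff below which a file counts as "small"


@dataclass
class DataFile:
    """Hypothetical Parquet file plus its deletion vector."""
    rows: list
    deleted: set = field(default_factory=set)  # positions of deleted rows

    @property
    def live_rows(self):
        return [r for i, r in enumerate(self.rows) if i not in self.deleted]


def qualifies(f):
    """Assumed rule: rewrite a file if it is small or has any deleted rows."""
    return len(f.rows) < SMALL_ROWS or bool(f.deleted)


def compact(files):
    """Rewrite only qualifying files; all other files are kept as-is."""
    keep = [f for f in files if not qualifies(f)]
    live = [r for f in files if qualifies(f) for r in f.live_rows]
    rewritten = [
        DataFile(rows=live[i:i + TARGET_ROWS])  # new file, empty deletion vector
        for i in range(0, len(live), TARGET_ROWS)
    ]
    return keep + rewritten


big = DataFile(rows=list(range(1000)))  # healthy file, left untouched
smalls = [DataFile(rows=list(range(1000 + 100 * k, 1100 + 100 * k))) for k in range(5)]
churned = DataFile(rows=list(range(2000, 2400)), deleted=set(range(0, 400, 2)))
files = [big] + smalls + [churned]
result = compact(files)
print(len(files), len(result))  # 7 2
```

Here the five 100-row files and the half-deleted 400-row file collapse into one 700-row file with no deleted rows, while the healthy 1,000-row file is not rewritten at all.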

Conclusion

Data Compaction is one of the ways we help your Data Warehouse deliver great performance, and best of all, it requires no additional work from you! That leaves you more time to use your Data Warehouse to gain more value and insights.

Look forward to more announcements about automated performance enhancements!
