
Adaptive Target File Size Management in Fabric Spark

Set It and Forget It Target File Size Optimization

What if you could enable a single setting and never worry about file size tuning again? Or if your tables automatically adjusted their optimal file sizes as they grew from megabytes to terabytes, without any manual intervention?

Today’s data teams face a familiar challenge: choosing the right file size for their Delta tables. Too small, and query performance suffers from excessive metadata overhead. Too large, and you lose parallelism and waste I/O on selective queries. Worse yet, the optimal file size for a 10GB table is completely different from the optimal size for a 10TB table, and most tables don’t stay the same size forever.

We’re introducing two powerful file size management features in Microsoft Fabric Spark: user defined Target File Size and Adaptive Target File Size. These features eliminate the guesswork and ongoing maintenance of file size optimization, letting you focus on your data instead of spending time tuning Spark settings.

The File Size Dilemma: One Size Doesn’t Fit All

Traditional Delta table management requires you to make file size decisions upfront—and live with the consequences:

The Multi-Configuration Problem: Different operations (OPTIMIZE, Optimized Writes, and Auto Compaction) use different file size settings, leading to inconsistent layouts and suboptimal performance when the five different Spark configurations aren’t strategically set.

# Multiple scattered session configurations without table-level support
spark.conf.set('spark.databricks.delta.optimize.maxFileSize', '1g')
spark.conf.set('spark.databricks.delta.optimize.minFileSize', '1g')
spark.conf.set('spark.databricks.delta.autoCompact.maxFileSize', '128m')
spark.conf.set('spark.databricks.delta.autoCompact.minFileSize', '64m')
spark.conf.set('spark.databricks.delta.optimizeWrite.binSize', '128m')

The Growth Problem: A table that starts with 128MB target files might need 1GB files after growing 1000x in size. Manual reconfiguration becomes a recurring operational burden, or more likely, data teams don’t have the time or knowledge to manage these configurations over time, resulting in suboptimal performance.

The Context Problem: Different tables in the same lakehouse have wildly different optimal file sizes based on their size, access patterns, and query types. Managing individual configurations across hundreds or thousands of tables becomes unwieldy and unrealistic.

The Expertise Problem: Determining optimal file sizes requires deep understanding of the Spark engine, cluster configurations, and data distribution—knowledge that shouldn’t be required for effective data management.

The Hidden Performance Costs: Suboptimal file sizes create cascading performance problems that extend far beyond simple query speed:

  • Poor data skipping: Oversized files contain too much diverse data, reducing Delta’s ability to skip irrelevant files during query planning. If all data for a smaller table is in a single 1GB file, no files can be skipped.
  • Write amplification on updates: Files that are too large result in MERGE, UPDATE, and DELETE operations that touch a larger scope of data. If all data is in one large file and a single row is updated, the entire large file will be rewritten.
  • Parallelism bottlenecks: Wrong-sized files either waste parallelism (too many small files) or limit it (too few large files), reducing both read and write throughput.

To address these challenges, we’re introducing two complementary capabilities designed to make file size optimization automatic and intelligent.

Introducing Adaptive Target File Size for Fabric Spark: Simplicity Meets Performance

Our intelligent data layout features transform file size management from a complex ongoing concern into a simple, one-time configuration that automatically adapts to your data’s changing needs.

Adaptive Target File Size: Table-Level Intelligence

What it does: Adaptive Target File Size uses Delta table telemetry to estimate the ideal target file size based on heuristics like the size of the table—and automatically updates the target as conditions change, ensuring optimal performance without manual intervention. Once enabled, all data layout operations—OPTIMIZE, Optimized Writes, and Auto Compaction—align to a unified adaptive file size strategy, centrally managed through a single session configuration.

SET spark.microsoft.delta.targetFileSize.adaptive.enabled = true
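
The same setting can also be applied from PySpark in a notebook session; a minimal equivalent of the SQL statement above:

# Session-level PySpark equivalent of the SQL SET above
spark.conf.set('spark.microsoft.delta.targetFileSize.adaptive.enabled', 'true')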

The power of consistency: When all operations use the same target, your table maintains a predictable file distribution. Query optimizers can make better decisions, performance becomes more consistent, and troubleshooting becomes simpler.

Table-specific optimization: Different tables can have different targets based on their specific needs, without the complexity of managing multiple settings per table:
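
As an illustrative sketch (the table names below are hypothetical, and the actual targets are chosen by the runtime), the single session setting above is enough for each table to receive its own target:

# With the adaptive session setting enabled, each table is evaluated independently:
# a small dimension table is steered toward smaller target files ...
spark.sql('OPTIMIZE dim_customers')
# ... while a multi-terabyte fact table is steered toward larger ones.
spark.sql('OPTIMIZE fact_sales')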

Cascading benefits that unlock performance multipliers: Beyond eliminating the need to manage multiple session configurations, right-sized files deliver performance benefits that compound across your entire data platform:

  • Enhanced data skipping: Properly sized files support optimal data clustering and skipping, allowing Delta’s file skipping protocol to eliminate more irrelevant files during query execution. A small table with 128MB files instead of 1GB files has 8x as many files that data skipping can potentially eliminate.
  • Reduced update costs: MERGE and UPDATE operations only rewrite the specific files they touch. Smaller, right-sized files mean each operation can touch fewer files and therefore rewrite less data. With Deletion Vectors enabled, this becomes even more critical, as updates to large files will trigger expensive cleanup operations when compaction is run.
  • Optimized parallelism: Right-sized files enable Spark to achieve ideal task parallelism. Too many small files overwhelm the scheduler; too few large files underutilize your cluster. Optimal sizing maximizes both read and write throughput.

In a customer-provided benchmark, enabling Adaptive Target File Size resulted in a 30% reduction in ELT cycle time. Compaction jobs saw the biggest gain, a stunning ~2.8x reduction in execution time, but other phases of the data lifecycle benefited as well.

Adaptive target file size resulted in 2.8x faster compaction in a customer benchmark.

In the TPC-DS 1TB Power Test benchmark, when compaction was performed before the query phase, enabling Adaptive Target File Size resulted in the compaction phase running nearly 1.6x faster and the query phase running 1.2x faster.

TPC-DS queries were 1.2x faster when optimize was run before with adaptive target file size enabled.

Tables automatically re-evaluate and update the adaptive target file size as they change over time. For example:

  • Week 1: 1GB table → 128MB target files (optimal for small tables)
  • Month 6: 500GB table → 256MB target files (adapts to medium size)
  • Year 2: 4TB table → 512MB target files (scales with large table)
  • Year 5: 10TB table → 1GB target files

The following chart illustrates the relationship between table size and the optimal parquet file size. For tables below 10 GB, the Fabric Spark Runtime evaluates the target file size to be 128 MB. As the table size grows, the target file size scales linearly, reaching up to 1 GB for tables that exceed 10 TB.

As the size of the table increases so does the optimal parquet file size.
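
To see where one of your own tables sits on this spectrum, you can compare its average file size against these targets. A minimal sketch using standard Delta Lake SQL (the table name is hypothetical):

# Inspect the current layout of a Delta table: file count and average file size
detail = spark.sql('DESCRIBE DETAIL sales_data').collect()[0]
avg_file_mb = detail['sizeInBytes'] / detail['numFiles'] / (1024 * 1024)
print(f"{detail['numFiles']} files, ~{avg_file_mb:.0f} MB per file on average")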

How it works: The system evaluates and applies optimal file sizes during:

  • Table creation operations that write data (CREATE TABLE AS SELECT, save as new table, etc.)
  • Start of OPTIMIZE operations when the session config is enabled
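
For example, with the session configuration enabled, a table created via CREATE TABLE AS SELECT picks up its adaptive target at creation time and is re-evaluated on later OPTIMIZE runs (a hedged sketch; table names are hypothetical):

# Adaptive evaluation happens when the new table is written ...
spark.sql('CREATE TABLE sales_history AS SELECT * FROM raw_sales')
# ... and again at the start of subsequent OPTIMIZE operations.
spark.sql('OPTIMIZE sales_history')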

Hands-off optimization: Once enabled, and combined with a table compaction strategy like Auto Compaction, adaptive target file sizing handles the complexity of file size management automatically. Your tables maintain optimal data layouts as they evolve, without requiring data engineering expertise or ongoing maintenance.
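
A hedged sketch of that combination (the Auto Compaction flag below follows the Delta Lake naming already used earlier in this post; check the documentation linked at the end of this post for the exact settings in your runtime):

# Adaptive targets decide how big files should be ...
spark.conf.set('spark.microsoft.delta.targetFileSize.adaptive.enabled', 'true')
# ... and Auto Compaction keeps merging small files toward that target.
spark.conf.set('spark.databricks.delta.autoCompact.enabled', 'true')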

For users who want granular control, or who want to override the target file size, a single unified table property is now available.

User Defined Target File Size: One Setting to Rule Them All

What it does: Provides a single user facing Delta table property to control target file sizes across all data layout operations: OPTIMIZE, Optimized Writes, and Auto Compaction.

Why it matters: No more juggling separate configurations. Set your target once at the table level, and every file-size-related operation respects that target, creating consistent, predictable table layouts.

ALTER TABLE sales_data SET TBLPROPERTIES (
 'delta.targetFileSize' = '128m' -- Controls everything!
 )

The user defined Target File Size takes precedence over Adaptive Target File Size, allowing both to be used together:

-- Session has adaptive sizing enabled
SET spark.microsoft.delta.targetFileSize.adaptive.enabled = true

-- But this specific table needs a user defined target size
ALTER TABLE compliance_reports SET TBLPROPERTIES (
'delta.targetFileSize' = '64MB' -- Takes precedence over adaptive sizing
)

-- This table continues using adaptive sizing 
OPTIMIZE user_analytics -- Uses automatically evaluated target size

Start Using Adaptive Target File Size on Existing Tables

Simply enable the Adaptive Target File Size session configuration, and the first `OPTIMIZE` operation on an existing table will evaluate and set the baseline adaptive target file size before compaction proceeds.
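
In practice this is just two steps (a minimal sketch; the table name is hypothetical):

# Enable adaptive sizing for the session ...
spark.conf.set('spark.microsoft.delta.targetFileSize.adaptive.enabled', 'true')
# ... then the first OPTIMIZE evaluates and records the baseline adaptive
# target for this existing table before compacting it.
spark.sql('OPTIMIZE existing_sales_data')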

What This Means for Your Team

Data Engineers: Focus on building data pipelines instead of tuning file size configurations. The Fabric Spark Runtime will automatically optimize your target file size over time.

Platform Teams: Reduce the operational burden of lakehouse management. Fewer configurations to maintain, more predictable performance characteristics.

Analytics Teams: Faster and more predictable query performance as tables grow over time.

Cost Management: Eliminate over-provisioning for worst-case file layouts. Tables automatically find the sweet spot between performance and resource usage.

Ready to Simplify Your File Size Strategy?

Adaptive Target File Size is available now in Fabric Spark Runtime 1.3. Enable it at the session level for immediate benefits or use the user defined target file size property to simplify existing table configurations.

Check out our comprehensive Adaptive target file size documentation for detailed configuration options, migration strategies, and best practices for different table types and access patterns.
