Microsoft Fabric Updates Blog

Optimizing Spark Compute for Medallion Architectures in Microsoft Fabric

Guidance to Maximizing Productivity and Efficiency for your Data Engineering Workloads

Data engineering teams often grapple with the complexities of planning and configuring compute resources for their data platforms. This is especially true when working with large-scale, complex datasets and demanding downstream SLAs. A one-size-fits-all approach is rarely effective, as different data layers and datasets have unique requirements that necessitate tailored compute configurations.

To address these challenges, organizations often adopt a layered data architecture pattern. This pattern emphasizes modularity and scalability, allowing for better data management and governance. One popular approach is the Medallion architecture.

What is a Medallion Architecture?

Medallion architecture is a popular data engineering approach that emphasizes modularity. It organizes the data platform into three distinct layers: Bronze, Silver, and Gold. This layered structure aims to improve data quality and accessibility as data moves from left to right across the layers.

Enterprise data platforms often employ similar layered structures to ingest data from various on-premises and cloud sources, build entities, and ultimately serve these data assets through reports or applications for broader organizational insights. This common approach aligns well with the modular philosophy of the Medallion pattern.

Tailoring Compute for Medallion Layers

Its crucial to make sure that the compute configurations are custom fit for these different layers for achieving better performance and efficiency in your data engineering workloads.

Bronze Layer: The Bronze layer serves as the raw data landing zone, where unstructured data is ingested “as-is” from various sources (on-premises or cloud). This layer typically handles full or delta loads and acts as an archive of the original data. To minimize delays in subsequent processing stages, it’s essential to maintain a write-optimized configuration for the Bronze layer so the data ingestion jobs complete within your sla requirements.

Spark Configurations for Bronze Layer:

spark.sql("ALTER TABLE $tableName SET TBLPROPERTIES ('delta.checkpointInterval' = '25')"

delta.checkpointInterval: Checkpointing helps Delta Lake maintain data consistency during writes. Setting a higher interval reduces the frequency of checkpoints, allowing for faster data ingestion.

spark.databricks.delta.collect.stats: This collects statistics on Delta tables for faster future queries. However, for the bronze layer with frequent writes with potentially changing data and scenarios where there are no join, filter or aggregate queries on these tables, disabling stats would make the jobs more optimized.

Configurations for latest Fabric Runtime (1.3) spark.conf.set("spark.sql.parquet.vorder.enabled","false")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled","false")
spark.conf.set("spark.databricks.delta.collect.stats","false")

spark.sql.parquet.vorder.enabled and spark.databricks.delta.optimizeWrite.enabled configurations are primarily beneficial for sorting and data analysis (more relevant to Gold layer which is which is read-heavy). Disabling them in the Bronze layer avoids unnecessary overhead during writes (bypassing steps to reorganize data), which translates to faster data landing.

Silver Layer: In the Silver layer, data is cleaned, enriched, and subjected to data quality checks. While not directly consumable for business reporting, this layer is vital for applying business requirements and standardizing data into a common format.

The Silver layer serves as a valuable resource for data engineers, data scientists, and analysts who create data entities and business aggregates for the Gold layer. However, in the silver layer, where frequent data writes occur alongside occasional sorting, vorder can introduce some overhead due to additional steps involved to reorganize data during writes.

Disabling vorder prioritizes write performance for the silver layer. This is crucial because data transformation and cleaning in the silver layer often involve numerous write operations.

Spark Configurations for Silver Layer:

Configurations for latest Fabric Runtime (1.3) spark.conf.set("spark.sql.parquet.vorder.enabled","false")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled","false") 
                                  or 
spark.conf.set("spark.databricks.delta.optimizeWrite.binSize","157m")

Disabling delta.optimizeWrite helps eliminate pre-shuffle step, bypassing the effort to reorganize data during write leading to quicker write operations. Alternatively, you can fine-tune the optimization for Silver layers that are used for querying and data exploration, by enabling the optimizeWrite property and setting the binSize value between 150-250 MB can enhance parallelism for Spark jobs that create aggregates in the Gold layer.

Gold Layer: The Gold layer houses aggregated and curated data entities designed for business consumption. These entities are primarily accessed by business users and analysts to derive key insights. To ensure a low-latency experience for concurrent queries from multiple users, the Gold layer should be read-optimized.

Spark Configurations for Gold Layer:

Configurations for latest Fabric Runtime (1.3) spark.conf.set("spark.sql.parquet.vorder.enabled","true")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled","true")
spark.conf.set(spark.conf.set("spark.databricks.delta.optimizeWrite.binSize","1gb")

Gold layer tables are configured with vorder and optimizeWrite enabled, and a binSize of 1GB to minimize the number of files and optimize for read performance. Since the Gold layer is usually well partitioned and heavily queried, vorder can dramatically improve query execution times, especially for complex queries involving sorting and with optimizeWrite enabled Delta Lake automatically reorganizes data during writes, making it more efficient for queries to scan and filter data in the Gold layer to provide faster reads for Power BI reports accessed by business users.

You can configure all the above-mentioned properties using the environment item and attach them to your notebook or spark job definitions.

You can also use magic commands (%%configure) for notebook jobs in your Fabric workspace.

We hope these tips help you tailor your Spark compute configurations to meet the needs of your Medallion architecture and achieve optimal performance.

To learn more about how to create, configure and use an environment, please refer to our documentation Create, configure, and use an environment in Fabric – Microsoft Fabric | Microsoft Learn

To learn more about how to configure your sessions using magic commands, please refer to our documentation Develop, execute, and manage notebooks – Microsoft Fabric | Microsoft Learn

Related blog posts

Optimizing Spark Compute for Medallion Architectures in Microsoft Fabric

November 4, 2025 by Misha Desai

We’re introducing a set of new enhancements for Data Agent creators — designed to make it easier to debug, improve, and express your agent’s logic. Whether you’re tuning example queries, refining instructions, or validating performance, these updates make it faster to iterate and deliver high-quality experiences to your users. New Debugging Tools View referenced example … Continue reading “Creator Improvements in the Data Agent”

November 3, 2025 by Arshad Ali

Additional authors – Madhu Bhowal, Ashit Gosalia, Aniket Adnaik, Kevin Cheung, Sarah Battersby, Michael Park Esri is recognized as the global market leader in geographic information system (GIS) technology, location intelligence, and mapping, primarily through its flagship software, ArcGIS. Esri empowers businesses, governments, and communities to tackle the world’s most pressing challenges through spatial analysis. … Continue reading “ArcGIS GeoAnalytics for Microsoft Fabric Spark (Generally Available)”