Microsoft Fabric Updates Blog

Optimizing Spark Compute for Medallion Architectures in Microsoft Fabric

Guidance to Maximizing Productivity and Efficiency for your Data Engineering Workloads

Data engineering teams often grapple with the complexities of planning and configuring compute resources for their data platforms. This is especially true when working with large-scale, complex datasets and demanding downstream SLAs. A one-size-fits-all approach is rarely effective, as different data layers and datasets have unique requirements that necessitate tailored compute configurations.

To address these challenges, organizations often adopt a layered data architecture pattern. This pattern emphasizes modularity and scalability, allowing for better data management and governance. One popular approach is the Medallion architecture.

What is a Medallion Architecture?

Medallion architecture is a popular data engineering approach that emphasizes modularity. It organizes the data platform into three distinct layers: Bronze, Silver, and Gold. This layered structure aims to improve data quality and accessibility as data moves from left to right across the layers.

Enterprise data platforms often employ similar layered structures to ingest data from various on-premises and cloud sources, build entities, and ultimately serve these data assets through reports or applications for broader organizational insights. This common approach aligns well with the modular philosophy of the Medallion pattern.

Tailoring Compute for Medallion Layers

Its crucial to make sure that the compute configurations are custom fit for these different layers for achieving better performance and efficiency in your data engineering workloads.

Bronze Layer: The Bronze layer serves as the raw data landing zone, where unstructured data is ingested “as-is” from various sources (on-premises or cloud). This layer typically handles full or delta loads and acts as an archive of the original data. To minimize delays in subsequent processing stages, it’s essential to maintain a write-optimized configuration for the Bronze layer so the data ingestion jobs complete within your sla requirements.

Spark Configurations for Bronze Layer:

spark.sql("ALTER TABLE $tableName SET TBLPROPERTIES ('delta.checkpointInterval' = '25')"

delta.checkpointInterval: Checkpointing helps Delta Lake maintain data consistency during writes. Setting a higher interval reduces the frequency of checkpoints, allowing for faster data ingestion.

spark.databricks.delta.collect.stats: This collects statistics on Delta tables for faster future queries. However, for the bronze layer with frequent writes with potentially changing data and scenarios where there are no join, filter or aggregate queries on these tables, disabling stats would make the jobs more optimized.

Configurations for latest Fabric Runtime (1.3) spark.conf.set("spark.sql.parquet.vorder.enabled","false")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled","false")
spark.conf.set("spark.databricks.delta.collect.stats","false")

spark.sql.parquet.vorder.enabled and spark.databricks.delta.optimizeWrite.enabled configurations are primarily beneficial for sorting and data analysis (more relevant to Gold layer which is which is read-heavy). Disabling them in the Bronze layer avoids unnecessary overhead during writes (bypassing steps to reorganize data), which translates to faster data landing.

Silver Layer: In the Silver layer, data is cleaned, enriched, and subjected to data quality checks. While not directly consumable for business reporting, this layer is vital for applying business requirements and standardizing data into a common format.

The Silver layer serves as a valuable resource for data engineers, data scientists, and analysts who create data entities and business aggregates for the Gold layer. However, in the silver layer, where frequent data writes occur alongside occasional sorting, vorder can introduce some overhead due to additional steps involved to reorganize data during writes.

Disabling vorder prioritizes write performance for the silver layer. This is crucial because data transformation and cleaning in the silver layer often involve numerous write operations.

Spark Configurations for Silver Layer:

Configurations for latest Fabric Runtime (1.3) spark.conf.set("spark.sql.parquet.vorder.enabled","false")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled","false") 
                                  or 
spark.conf.set("spark.databricks.delta.optimizeWrite.binSize","157m")

Disabling delta.optimizeWrite helps eliminate pre-shuffle step, bypassing the effort to reorganize data during write leading to quicker write operations. Alternatively, you can fine-tune the optimization for Silver layers that are used for querying and data exploration, by enabling the optimizeWrite property and setting the binSize value between 150-250 MB can enhance parallelism for Spark jobs that create aggregates in the Gold layer.

Gold Layer: The Gold layer houses aggregated and curated data entities designed for business consumption. These entities are primarily accessed by business users and analysts to derive key insights. To ensure a low-latency experience for concurrent queries from multiple users, the Gold layer should be read-optimized.

Spark Configurations for Gold Layer:

Configurations for latest Fabric Runtime (1.3) spark.conf.set("spark.sql.parquet.vorder.enabled","true")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled","true")
spark.conf.set(spark.conf.set("spark.databricks.delta.optimizeWrite.binSize","1gb")

Gold layer tables are configured with vorder and optimizeWrite enabled, and a binSize of 1GB to minimize the number of files and optimize for read performance. Since the Gold layer is usually well partitioned and heavily queried, vorder can dramatically improve query execution times, especially for complex queries involving sorting and with optimizeWrite enabled Delta Lake automatically reorganizes data during writes, making it more efficient for queries to scan and filter data in the Gold layer to provide faster reads for Power BI reports accessed by business users.

You can configure all the above-mentioned properties using the environment item and attach them to your notebook or spark job definitions.

You can also use magic commands (%%configure) for notebook jobs in your Fabric workspace.

We hope these tips help you tailor your Spark compute configurations to meet the needs of your Medallion architecture and achieve optimal performance.

To learn more about how to create, configure and use an environment, please refer to our documentation Create, configure, and use an environment in Fabric – Microsoft Fabric | Microsoft Learn

To learn more about how to configure your sessions using magic commands, please refer to our documentation Develop, execute, and manage notebooks – Microsoft Fabric | Microsoft Learn

Related blog posts

Optimizing Spark Compute for Medallion Architectures in Microsoft Fabric

April 6, 2026 by Arshad Ali

ADO.NET is a widely adopted data access technology in the .NET ecosystem that enables applications to connect to and work with data from databases and big data platforms. The Microsoft ADO.NET Driver for Fabric Data Engineering lets you connect, query, and manage Spark workloads in Microsoft Fabric with the reliability and simplicity of standard ADO.NET … Continue reading “Microsoft ADO.NET Driver for Microsoft Fabric Data Engineering (Preview)”

March 27, 2026 by Avinanda Chattapadday

The enterprise-grade JDBC driver enables secure, flexible, and performant connectivity to Spark SQL workloads running in Microsoft Fabric, using Fabric’s Livy APIs as the execution layer.