Microsoft Fabric Updates Blog

Under the hood: an introduction to the Native Execution Engine for Microsoft Fabric

Introduction

In today’s data landscape, as organizations scale their analytical workloads, the demand for faster, more cost-efficient computation continues to rise. Apache Spark has long been the backbone of large-scale data processing with its in-memory processing and powerful APIs, but today’s workloads demand even better performance.

Microsoft Fabric addresses this challenge with the Native Execution Engine—a vectorized, C++-powered execution layer that accelerates Spark jobs with no code changes, shorter runtimes, and no additional compute cost. This blog post will take you behind the scenes to give an overview of how the engine works and how it delivers performance gains while preserving the familiar Spark developer experience users already know and love.

Why Spark needed a new execution approach

Across industries, we see similar patterns where data volumes continue to grow, refresh cycles tighten, and analytical pipelines inherited from prior years begin to strain under the weight of modern expectations. As organizations push toward more frequent insights with hourly metrics instead of daily and near‑real‑time scoring instead of batch windows, Spark workloads that once felt comfortably performant now take longer, cost more, and require increasing operational oversight.

Developers often find themselves tuning the same transformations repeatedly or scaling clusters just to maintain existing SLAs. These mounting pressures highlight a technical reality: to keep pace with evolving data demands, teams need an execution model that unlocks more performance from the same code and infrastructure, while avoiding costly rewrites and preventing uncontrolled compute costs.

Spark’s traditional execution stack runs on the Java Virtual Machine (JVM), which delivers portability and an excellent developer experience. However, several inherent limitations remain:

  • Garbage collection overhead introduces unpredictable pauses during memory cleanup.
  • Row‑based processing is suboptimal for columnar formats such as Parquet and Delta Lake.
  • Minimal use of SIMD (vectorization) instructions leaves modern CPUs underutilized.

As data volumes grow and real‑time expectations intensify, these bottlenecks become more pronounced, and even highly optimized Spark runtimes eventually reach their practical limits. Addressing these challenges requires an execution model that fully leverages modern hardware while preserving everything users love about Spark.

The Native Execution Engine

Fabric’s Native Execution Engine introduces a new execution path to Spark by offloading compute‑intensive operations to native C++ while Spark still handles all user interfacing, planning, scheduling, distribution, and fault tolerance. The Native Execution Engine is powered by two open‑source technologies:

  • Velox, a vectorized C++ execution engine open-sourced by Meta that provides optimized processing kernels over columnar data
  • Apache Gluten, an Apache incubating project that bridges Spark and Velox by transforming Spark’s execution plan into a native plan executable by Velox

Together, these components enable the Native Execution Engine to preserve Spark’s scalability while delivering significant speedups through vectorized, columnar native execution, allowing users to continue running existing Spark notebooks and applications simply by enabling native execution with a single configuration.
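The configuration key that flips this switch, `spark.native.enabled`, appears later in this post; a minimal helper for toggling it from a notebook might look like the sketch below (the helper name is invented for illustration, and `spark` is the SparkSession that Fabric notebooks provide automatically):

```python
# Sketch: toggle the Native Execution Engine for the current session.
# `spark` is the SparkSession a Fabric notebook binds automatically.
def set_native_execution(spark, enabled: bool):
    # Spark configs are strings, so normalize the bool to "true"/"false".
    spark.conf.set("spark.native.enabled", str(enabled).lower())

# In a notebook cell:
# set_native_execution(spark, True)   # run supported operators natively
# set_native_execution(spark, False)  # revert to JVM-based execution
```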

It optimizes performance by using the native capabilities of underlying data sources and minimizing the overhead typically associated with data movement and serialization in traditional Spark environments. The operators and data types the engine supports, and the workloads where it helps most, are covered in the section below.

The diagram illustrates a high-level overview of how Gluten and Velox integrate with Spark.

Figure: Integration of Gluten and Velox with Spark

How the Native Execution Engine accelerates workloads

The native engine accelerates Spark workloads by executing supported operators directly in a high‑performance, columnar, C++ engine rather than relying on the traditional JVM row‑based execution path. While the engine supports a broad range of operators and data types, it delivers the greatest benefit in its optimal use cases:

  • Parquet and Delta workflows — These are processed natively and efficiently, avoiding unnecessary conversions.
  • Complex transformations and aggregations — Columnar processing and vectorized execution shine when queries include heavy grouping, filtering, projections, or expression evaluation.
  • Queries that avoid fallbacks — Staying within supported operators maximizes native throughput and prevents the overhead of switching between Spark and native execution.
  • Compute‑intensive workloads — The engine is designed for CPU‑heavy analytical queries rather than simple scans or I/O‑bound workloads.

For a detailed list of supported operators and functions, see Apache Gluten documentation.

Benchmarking TPC‑DS at scale factor 1000 using Delta format shows up to six times faster performance compared to open-source Spark. On a fixed‑size Fabric cluster, this level of improvement would translate to about 83 percent compute-cost savings.
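The cost figure follows from simple arithmetic: on a fixed-size cluster, compute consumption scales with runtime, so a six-fold speedup leaves only one sixth of the original compute bill.

```python
# On a fixed-size cluster, compute cost scales with runtime,
# so an N-fold speedup saves a (1 - 1/N) fraction of compute.
speedup = 6.0
savings = 1 - 1 / speedup
assert round(savings * 100) == 83  # matches the ~83 percent figure
```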

The clip shows a notebook with a code cell defining a helper function `time_query_native_vs_spark` and running the same SQL query twice against the NYC Yellow Taxi trip dataset: once with the Native Execution Engine disabled and once with it enabled, toggled via the `spark.native.enabled` configuration. The query computes per-vendor aggregates such as average trip duration and distance; the native run completes about 2.4 times faster than the non-native run.

Figure: Comparing query performance with and without the Native Execution Engine
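The body of the `time_query_native_vs_spark` helper isn't reproduced in the post; a minimal sketch of the underlying timing pattern (the helper name comes from the clip, but this implementation is an assumption) could be:

```python
import time

def time_with_engine(run_query, set_native, enabled):
    """Time one run of `run_query` with native execution toggled.

    run_query  -- zero-argument callable that executes the query
    set_native -- callable that flips the engine flag, e.g.
                  lambda on: spark.conf.set("spark.native.enabled",
                                            str(on).lower())
    """
    set_native(enabled)
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start

# In a notebook, with QUERY holding the SQL text (hypothetical names):
# t_native = time_with_engine(lambda: spark.sql(QUERY).collect(), set_flag, True)
# t_jvm    = time_with_engine(lambda: spark.sql(QUERY).collect(), set_flag, False)
# print(f"speedup: {t_jvm / t_native:.1f}x")
```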

Row vs. columnar execution and SIMD

Traditional Spark executes over row‑oriented data, where values from different columns of a single row sit together in memory. While simple and flexible, this layout is inefficient for analytical workloads that repeatedly apply the same operation across millions of values in one column. Accessing values requires frequent jumps in memory, causing cache misses and limiting parallelism.

The native engine adopts a columnar execution model, where each column’s values are stored contiguously. This dramatically improves memory locality and enables the engine to operate on vectors of data at a time instead of row‑by‑row. Columnar layout becomes even more powerful when paired with SIMD (Single Instruction, Multiple Data) instructions: the CPU can apply one operation—such as addition, comparison, or hashing—to 8, 16, or even 32 values simultaneously depending on the hardware.
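The layout difference can be sketched with plain Python, using the standard `array` module's contiguous typed buffers as a stand-in for real columnar batches (the column names and values are invented for illustration; as a hardware reference point, a 256-bit AVX2 register processes eight 32-bit values per instruction):

```python
from array import array

# Row layout: values from different columns of one record sit together,
# so summing one column strides across entire records.
rows = [(1, 10.0, 0.5), (2, 12.0, 0.7), (3, 9.0, 0.2)]
row_total = sum(fare for _, fare, _ in rows)

# Columnar layout: each column is one contiguous typed buffer, so the
# same aggregation sweeps a single buffer end to end -- the
# cache-friendly access pattern that SIMD hardware accelerates.
vendor = array("q", [1, 2, 3])
fare   = array("d", [10.0, 12.0, 9.0])
tip    = array("d", [0.5, 0.7, 0.2])
col_total = sum(fare)

assert row_total == col_total == 31.0  # same answer, different access pattern
```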

Velox executes computation over columnar batches using highly optimized vectorized operators. This boosts CPU cache efficiency, reduces branching, and increases throughput for filters, projections, aggregations, and expression evaluation. C++ execution also eliminates JVM JIT warm‑up and unnecessary serialization, and gives the engine fine‑grained control over memory allocation and layout.

The image illustrates how a query involving operations such as addition, multiplication, and selection executes over row-oriented versus columnar data, showing how the columnar layout enables batch execution of the same computation.

Figure: Scalar row-based execution vs. SIMD columnar execution

Integration with Spark optimizer

A key advantage of the Native Execution Engine is that it integrates after Spark’s logical and physical optimization phases, preserving all existing Fabric Spark query optimization rules and enhancements. This means all existing optimizations in Fabric Spark runtime—adaptive query execution, cost‑based rewrites, column pruning, predicate pushdown, and more—remain fully in effect. The native engine simply provides a faster execution path for supported operators, while requiring no changes to existing code.

The execution flow remains familiar:

  1. Spark builds the logical and optimized physical plan as usual.
  2. Gluten intercepts the physical plan, identifies which operators are natively supported, and replaces those nodes with their native equivalents while leaving unsupported nodes for Spark.
  3. Velox executes the native operators using vectorized, C++ kernels.
  4. Fallback handling ensures unsupported operators run on Spark, with efficient columnar‑to‑row and row‑to‑columnar conversions where needed.
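The conversions in step 4 are essentially transposes between the two layouts; a toy sketch (real engines exchange Arrow-style columnar batches, not Python lists, and the column names here are invented):

```python
def columnar_to_rows(batch):
    """Turn a {column_name: values} batch into a list of row tuples."""
    return list(zip(*batch.values()))

def rows_to_columnar(names, rows):
    """Turn row tuples back into a {column_name: values} batch."""
    return {name: list(vals) for name, vals in zip(names, zip(*rows))}

batch = {"vendor": [1, 2], "fare": [10.0, 12.0]}
rows = columnar_to_rows(batch)
assert rows == [(1, 10.0), (2, 12.0)]
assert rows_to_columnar(["vendor", "fare"], rows) == batch
```

Every crossing of the JVM/native boundary pays a cost like this transpose, which is why frequent fallbacks erode the native speedup.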

This design preserves the Spark developer experience while unlocking significantly faster execution under the hood—no code changes required.

Real-time fallback visibility in notebooks

The Native Execution Engine in Fabric Spark offers performance advantages for many workloads, but not all Spark operations are currently supported. When a job silently falls back from native execution to JVM-based Spark because of an unsupported feature, performance can degrade due to the columnar-to-row and row-to-columnar conversions at the JVM/native boundary. To make these incompatibilities easier to troubleshoot, the Apache Spark advisor has been upgraded to report real-time, in-context alerts during notebook cell execution, and in the Spark Advisor view, whenever operators fall back to JVM-based execution.

These alerts help users understand unexpected performance changes when the Native Execution Engine is enabled and quickly identify the workloads best suited for native acceleration. If the advisor shows frequent fallbacks, consider small refactors that keep data columnar longer, or replace edge-case expressions with supported equivalents, then re-run to confirm the improvement.

The clip displays a code snippet in an interactive Spark session indicating that part of the query fell back, highlighting the affected operator and the fallback reason.

Figure: Spark Advisor fallback alerts for the Native Execution Engine

The video clip demonstrates a Spark query executed with the Native Execution Engine. A fallback warning is raised due to `.show()`, which internally calls `collectLimit` and `toprettystring`, operations not yet supported natively. Using `.collect()` instead avoids the unsupported path and runs the query without any fallbacks.

Ready to enable native execution?

The Native Execution Engine represents a leap forward in Spark performance on Microsoft Fabric. By blending the scalability and richness of Spark with the power of C++ vectorized execution, Fabric provides an enhanced Spark runtime at no additional cost.

You can enable it globally using the acceleration option in the environment settings, and toggle it on or off at any point in Spark application code by following the documented instructions.

To experience the impact for yourself, use the Apache Spark run series to compare results before and after enabling native execution. Use tools like the Apache Spark advisor, DataFrame `explain`, and the Spark History Server to investigate fallbacks and validate execution behavior as you tune your workloads.

Now is the time to enable it and measure the gains. Try it in your environment, observe the performance improvements, and start optimizing your Spark pipelines with native execution today.
