Microsoft Fabric Updates Blog

How does Fabric make Spark Notebooks Instant?

Discover how Microsoft Fabric’s Forecasting Service reduces Spark startup latency and cloud costs through proactive, AI- and ML-driven resource provisioning.

Context & Relevance

Waiting minutes for a Spark cluster to become available can throttle analytics velocity, delay insights, and drive up cloud spend. In a world where data teams expect near-instant execution and seamless burst capacity, that latency ultimately limits innovation.

Within Microsoft Fabric, a unified platform that supports integrated data engineering, analytics, and AI workloads, reducing startup latency while optimizing cost is mission critical. To address this challenge and enable virtuous scaling at cloud-optimized cost, we built Fabric Forecasting Service: a machine-learning backed, optimization-driven system for proactively managing starter pools so that compute is available just in time, and idle waste is minimized.

In this blog we explain the technical architecture, algorithms, implementation details, and observed outcomes of Forecasting Service, which is designed to serve scalable data science workloads in production at Microsoft scale.

If you use the default Starter Pool, a Spark session usually starts in a few seconds. That’s not luck. Behind the scenes, Fabric keeps a small fleet of Spark clusters already running and continuously right-sizes that fleet so most requests land on a warm cluster. When traffic spikes, we refill the starter pool quickly. If the starter pool is briefly drained or your workspace needs special networking or environments, we fall back to on-demand start.

Why it Matters

  • For Data Engineers: Faster cluster spin-up and consistent execution times.
  • For Cloud Operators: Lower operational cost through predictive pooling.
  • For Product Teams: Improved SLA compliance and system resilience.

By integrating ML-driven provisioning into Fabric’s compute layer, Forecasting Service redefines how large-scale data platforms manage elasticity and performance at scale.

What you’ll notice as a user

  • Fast starts by default- With the Starter Pool and no extra libraries, notebooks typically start in a few seconds because the cluster and session already exist.
  • When it takes longer- Adding custom libraries or Spark properties requires a short personalization step. If a starter pool is momentarily fully used, we create a new cluster.
  • Private Link or Managed VNet- These workspaces don’t use Starter Pools (they run in dedicated networks), so starts are on-demand.

Typical cold-start ranges in these cases are ~2–5 minutes (plus time to install libraries if any).

For example:

  • Finance analysts experience inconsistent latency during market-hour data refreshes.
  • Product telemetry pipelines face SLA breaches due to cluster warm-up lag.

Traditional “static pooling” keeps clusters pre-warmed but wastes massive compute when demand dips. Forecasting Service closes this gap by balancing performance and cost dynamically.


Solution Overview: What is Forecasting Service? 

Forecasting Service is Microsoft Fabric’s proactive resource provisioning engine, built directly into the big data infrastructure platform.

It uses a hybrid ML + optimization pipeline to predict demand patterns and auto-tune starter pools, maintaining optimal starter pool size based on real-time workloads.

Think of this as inventory management for clusters:

1. Keep a starter pool of ready-to-use clusters/sessions. When you start a notebook and grab one, we immediately request another to re-hydrate the starter pool. That’s how we preserve the instant start.

2. Continuously right-size the starter pool. We forecast near-term demand from recent telemetry and then compute the target starter pool size that balances experience (no wait) against cost (idle time). The decision is a small, fast linear program that explicitly trades wait time vs idle time, so it’s explainable and easy to tune.

3. Act fast, recover fast. A pool worker applies the latest recommended target: if usage rises, we scale up; when a starter pool instance is consumed, we re-hydrate without delay. The worker talks to our existing services that create clusters and sessions.
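Steps 1 and 3 can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of the pool-and-re-hydrate behavior, not Fabric's implementation; all names are made up:

```python
from collections import deque

class StarterPool:
    """Inventory of warm clusters: hand one out, immediately order a replacement."""

    def __init__(self, target_size, create_cluster):
        self.target = target_size
        self.create = create_cluster                  # provisions one warm cluster
        self.ready = deque(create_cluster() for _ in range(target_size))

    def acquire(self):
        """Return a warm cluster (pool hit) or cold-start one (pool miss)."""
        if self.ready:
            cluster, status = self.ready.popleft(), "hit"
        else:
            cluster, status = self.create(), "miss"
        self.rehydrate()                              # refill right away
        return cluster, status

    def set_target(self, new_target):
        """Called when the forecaster/optimizer publishes a new target size."""
        self.target = new_target
        self.rehydrate()
        while len(self.ready) > self.target:          # scale down: shed idle cost
            self.ready.pop()

    def rehydrate(self):
        while len(self.ready) < self.target:
            self.ready.append(self.create())
```

The interesting decision is step 2: choosing `target` well, which is the job of the forecasting and optimization loop.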

Pool hit: you get a running starter pool instance.

Pool miss: we create one; you see a short cold start.

Architecture Overview- What runs behind the scenes

  • Starter Pool + Re-hydration- We maintain a target number of ready clusters/sessions. Each time one is used, we immediately submit a create request to top the starter pool back up. The algorithm explicitly minimizes both customer wait and cluster idle time.
  • Predict, then optimize- A lightweight time‑series forecaster predicts the short‑term request rate. We use a hybrid (SSA+) approach centered on Singular Spectrum Analysis (SSA) with deep‑model enhancements and a cost‑aware loss; the predicted demand feeds a Sample Average Approximation (SAA) linear program that picks the target starter pool size. The end‑to‑end loop runs frequently and refreshes the resource recommendation.
  • Production architecture- Recommendations are stored centrally and read by a Pool Worker that calls our Big Data Infra Platform Services (which orchestrate jobs/sessions and provision and stitch VMs) to create/delete starter pool instances. Telemetry flows into the predictor; a simple hyper-parameter tuning loop runs less frequently to keep the cost-experience trade-off healthy.

For more detail on the design, model comparisons, and ML algorithm choices, please refer to the Intelligent Pooling paper published at VLDB 2024.
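To make the optimization step concrete, here is an illustrative sketch of the SAA idea (not Fabric's code; the cost weights are made up). Because the expected idle-plus-wait cost is piecewise-linear and convex in the pool size, an optimum always lands on one of the sampled demand values, so the sketch evaluates those candidates directly instead of invoking an LP solver:

```python
def saa_pool_size(demand_samples, c_idle=1.0, c_wait=3.0):
    """Pick the target starter pool size minimizing expected idle + wait cost
    over sampled demand scenarios (Sample Average Approximation)."""
    K = len(demand_samples)

    def cost(n):
        idle = sum(max(n - d, 0) for d in demand_samples)   # warm clusters nobody used
        wait = sum(max(d - n, 0) for d in demand_samples)   # requests that cold-start
        return (c_idle * idle + c_wait * wait) / K

    return min(demand_samples, key=cost)

# e.g. forecasted per-interval session requests, with waiting weighted 3x idling:
samples = [3, 5, 7, 9, 11]
target = saa_pool_size(samples)   # -> 9: tolerate some idle to avoid most waits
```

Raising `c_wait` relative to `c_idle` pushes the target toward the upper quantiles of forecasted demand, which is exactly the cost-versus-experience dial described above.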

Key Innovations

  • Hybrid AI/ML Forecasting (SSA+)- Combines time-series forecasting (Singular Spectrum Analysis) with a shallow neural network to predict demand spikes with high accuracy and low latency.
  • Optimization Engine (SAA Optimizer)- Uses linear programming to minimize total idle (cost) and wait (latency) time, delivering Pareto-efficient balance between performance and COGS.
  • Self-Adaptive Hyperparameter Tuning- Continuously adjusts sensitivity thresholds to maintain SLA under shifting workload conditions.
  • Seamless Integration with Fabric Services- Tightly integrated with Big Data Infrastructure Platform Services for automatic starter pool creation, rehydration, and telemetry monitoring.
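To give a feel for the SSA core of the forecaster, here is a toy decomposition in NumPy. The window, rank, and synthetic demand series are illustrative only; Fabric's SSA+ layers a shallow neural network and a cost-aware loss on top of this idea:

```python
import numpy as np

def ssa_smooth(series, window, rank):
    """Toy SSA: extract the leading components of a 1-D series."""
    N, L = len(series), window
    K = N - L + 1
    # 1. Embed: build the L x K trajectory (Hankel) matrix from sliding windows.
    X = np.column_stack([series[i:i + L] for i in range(K)])
    # 2. Decompose: singular value decomposition of the trajectory matrix.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # 3. Reconstruct from the top `rank` components, then diagonal-average
    #    back to a 1-D series (element (i, j) of X maps to series index i + j).
    Xr = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    out, counts = np.zeros(N), np.zeros(N)
    for j in range(K):
        out[j:j + L] += Xr[:, j]
        counts[j:j + L] += 1
    return out / counts

# Synthetic request rate with a daily cycle (24 samples per "day") plus noise.
t = np.arange(200)
demand = 50 + 10 * np.sin(2 * np.pi * t / 24) + np.random.default_rng(0).normal(0, 3, 200)
trend = ssa_smooth(demand, window=48, rank=4)   # smooth estimate of the demand pattern
```

The smoothed components capture the recurring daily and weekly shapes in request traffic, which is what makes short-horizon demand forecasts tractable.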

Components

  • ML Predictor- Fetches time-series data from Azure Data Explorer and predicts resource request rate.
  • SAA Optimizer- Computes target starter pool size using linear programming.
  • Forecasting Worker- Runs inference pipelines and persists recommendations to Azure Cosmos DB.
  • Pool Worker- Executes cluster creation/deletion via Big Data Infrastructure Platform and maintains starter pool equilibrium.
  • Telemetry Dashboard- Tracks pool hit rate, COGS, and latency metrics in real-time.
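The loop tying these components together reduces to a small reconciliation step. A hypothetical sketch (the real Pool Worker reads recommendations from Azure Cosmos DB and calls the Big Data Infrastructure Platform):

```python
def reconcile_once(target, live, create_instance, delete_idle_instance):
    """Align the live starter pool with the latest recommended target size."""
    if live < target:
        for _ in range(target - live):
            create_instance()          # top the pool back up toward the target
    else:
        for _ in range(live - target):
            delete_idle_instance()     # shed idle clusters to cut COGS
```

Run on a short cadence, this keeps the pool tracking the Forecasting Worker's latest recommendation without any long-lived coupling between the two services.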

Results at Fabric Scale

Targeting a high pool-hit rate, this approach has shown a reduction in idle cluster time versus static pre-provisioning, keeping experiences snappy while cutting COGS. It has been deployed across all Fabric regions since November 2023.


Conclusion

Fabric Forecasting Service brings infrastructure intelligence to the heart of the analytics platform. Through forecasting, optimization and feedback-driven automation, Fabric unlocks near-instant compute availability while driving down cost.

The underlying principle: treat compute capacity as a first-class elastic resource, one that learns and adapts automatically, rather than remaining a manual dial. This architecture empowers scalable data science and data engineering teams to iterate faster, reduce waste, and deliver business impact more reliably.

References

  • Intelligent Pooling (VLDB 2024)- design details, model comparisons, and ML algorithm choices.

Post Authors

Kunal Parekh, Senior Product Manager, Azure Data, Microsoft

Yiwen Zhu, Principal Researcher, Azure Data, Microsoft Research

Subru Krishnan, Principal Architect, Azure Data, Microsoft Spain

Aditya Lakra, Software Engineer, Azure Data, Microsoft

Harsha Nagulapalli, Principal Engineering Manager, Azure Data, Microsoft

Sumeet Khushalani, Principal Engineering Manager, Azure Data, Microsoft

Arijit Tarafdar, Principal Group Engineering Manager, Azure Data, Microsoft
