Microsoft Fabric Updates Blog

Introducing High Concurrency Mode for Notebooks in Pipelines for Fabric Spark

We’re excited to introduce high concurrency mode for notebooks in pipelines, bringing session sharing to one of the most popular orchestration mechanisms for enterprise data ingestion and transformation. Notebooks in a pipeline are now automatically packed into an active high concurrency session without compromising performance or security, and you pay for only a single session.

Key Benefits:

  • Faster Session Start: High concurrency mode offers a significantly faster session start, reducing the start time to about 5 seconds for notebooks that attach to a shared session. This is approximately 30 times faster than traditional methods, resulting in substantial performance gains in pipeline execution.
  • Session Tags: We’ve also introduced support for session tags, allowing users to target notebooks to specific high concurrency sessions for better session management.

Why Use High Concurrency Mode?

  • Rapid Spark Session Start: Notebook steps no longer need to wait for on-demand Spark pool spin-up when using custom pool configurations. By leveraging pre-warmed high concurrency sessions, notebook steps can quickly attach to an existing Spark session, significantly boosting overall pipeline performance.
  • Cost Savings: Reduce compute costs by sharing a single session across multiple notebooks for your data engineering or data science workloads. You are billed only for the single shared session, and because fewer sessions are started, notebook steps are less likely to queue during peak usage hours.

Example: Consider a pipeline with five notebook steps, each taking 5 minutes to execute. With traditional methods, each step also waits about 3 minutes for its own Spark session to start, so the total runtime is approximately 5 × (5 + 3) = 40 minutes. With high concurrency mode, only the first step pays the session start cost and the remaining steps attach to the shared session within seconds, so the total drops to roughly 3 + (5 × 5) = 28 minutes, a 30% reduction in runtime.
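The arithmetic in this example can be reproduced with a short Python sketch. It is only a back-of-the-envelope model, not a Fabric API: the function name and its parameters are hypothetical, the steps are assumed to run one after another, and the time to attach to a shared session is treated as zero by default to match the rounded figures above.

```python
def pipeline_runtime_minutes(steps, step_minutes, session_start_minutes,
                             shared_session, attach_seconds=0.0):
    """Estimate the total runtime of sequential notebook steps (hypothetical model)."""
    run_time = steps * step_minutes
    if shared_session:
        # One session start; the remaining steps attach to the running session.
        startup = session_start_minutes + (steps - 1) * attach_seconds / 60
    else:
        # Every step starts its own Spark session.
        startup = steps * session_start_minutes
    return run_time + startup


# Figures from the example: five steps, 5 minutes each, 3-minute session start.
traditional = pipeline_runtime_minutes(5, 5, 3, shared_session=False)
high_concurrency = pipeline_runtime_minutes(5, 5, 3, shared_session=True)
reduction = (traditional - high_concurrency) / traditional

print(f"Traditional: {traditional:.0f} min")
print(f"High concurrency: {high_concurrency:.0f} min")
print(f"Reduction: {reduction:.0%}")
```

Passing attach_seconds=5 adds the roughly 5-second attach time for each of the four later steps, which raises the high concurrency estimate by about 20 seconds and leaves the overall picture unchanged.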

How to Enable High Concurrency Mode for Notebooks in Pipelines?

To enable high concurrency mode for pipelines in your Fabric workspace, follow these steps:

  1. Go to the workspace settings in your Fabric workspace.

  2. Navigate to the Data Engineering/Science section.

  3. Select the Spark Compute menu.

  4. Navigate to the High concurrency tab.

  5. Enable the option “For pipeline running multiple notebooks”.

  6. Save your changes.

Once you enable high concurrency mode for pipelines in a workspace, all Spark sessions triggered by notebook steps in a pipeline will be High Concurrency sessions and the system automatically starts packing notebooks into the shared session.

High Concurrency Mode for Notebook Steps in Pipelines

By adopting high concurrency mode, you can enjoy faster pipeline execution, reduced costs, and improved overall efficiency for your data-driven workloads.

To learn more about using high concurrency for notebooks in pipelines, please refer to our documentation: High Concurrency Mode for Notebooks in Pipelines.

For more information on high concurrency mode, please read Overview of High Concurrency Mode in Microsoft Fabric.
