Microsoft Fabric Updates Blog

Welcome to the September 2024 Update!

Announcements

We have a lot of exciting announcements to share with you for FabCon Europe! We’ve brought Copilot to Dataflows Gen2 and delivered a richer Copilot experience for building and consuming Power BI reports. In Real-Time Intelligence, we have redesigned and enhanced the user experience in the Real-Time hub.

We announced the general availability of Fabric Git integration. You can sync Fabric workspaces with Git repositories, leverage version control, and collaborate seamlessly using Azure DevOps or GitHub. We now have an enhanced and redesigned left navigation experience in Real-Time Intelligence with the new Real-Time hub user experience.

In AI, we have released the Copilot in Fabric experience for Dataflows Gen2 to general availability, allowing everyone to design dataflows with the help of an AI-powered expert. We also released the Copilot in Fabric experience for Data Warehouse into preview. This AI assistant can help developers generate T-SQL queries for data analysis, explain and add in-line code comments for existing T-SQL queries, fix broken T-SQL code, and answer questions about general data warehousing tasks and operations.

To learn more, read about all these announcements and more in Arun’s blog post, Building an AI-powered data platform.

Contents

 

Monthly Update Video

 

Power BI

You can now choose from a variety of themes in Power BI Desktop, including the most requested Dark Mode! You can personalize your data visualization experience to match your preferences and working environment. In addition, we’ve consolidated similar options in the menu bar and streamlined the button text for better readability and responsive screen sizing.

The Copilot chat pane will now automatically provide text-based answers and summaries across all pages in a report. Previously, users had to specifically request cross-page summaries or click a “base summary on the entire report” button. With this update, cross-page summaries and answers are now the default setting, streamlining the exploration process.

Power BI has a transformative new feature called Metrics Hub, designed to redefine how organizations manage and consume metrics, complete with visuals and Copilot insights. Metrics Hub is an innovative metric layer within Fabric, aimed at helping organizations define, discover, and reuse trusted metrics with ease. This feature allows trusted creators within an organization to develop standardized metrics that incorporate essential business logic, ensuring consistency across the organization.

To learn more about these, and all the other new features this month, read the Power BI September 2024 Feature Summary.

 

Core

Announcing the availability of Trusted workspace access and Managed private endpoints in any Fabric capacity.

We’d like to share an update for the Fabric network security features that were announced in general availability earlier this year. Trusted workspace access and Managed private endpoints enable you to secure and optimize your data access and connectivity with Fabric and protect your business-critical data from unauthorized or unwanted access.

However, these features were available only in F64 or higher capacities. Based on your feedback, we are now making these features available in all F capacities. You can now use these features with any F capacity that suits your business needs. We are also making Managed Private endpoints available in Trial capacities as part of this release.

Here’s a quick recap of what these features do and how they can help you:

  • Trusted workspace access allows seamless and secure access to firewall-enabled Azure Storage accounts. It is designed to help you securely and easily access data stored in Storage accounts from Fabric workspaces, without compromising on performance or functionality. This feature extends the power and flexibility of OneLake shortcuts to work with data in protected storage accounts in place without compromising on security. You can also use this capability with Data pipelines and the COPY INTO feature of Fabric warehouses to ingest data securely and easily into Fabric workspaces. To get started with this feature and to learn about limitations, see Trusted workspace access in Microsoft Fabric – Microsoft Fabric | Microsoft Learn. This feature can be used in any F capacity.
  • Managed private endpoints provide secure connectivity from Fabric to data sources that are behind a firewall or not accessible from the public internet. Managed private endpoints enable Fabric Data Engineering items to access data sources securely without exposing them to the public network or requiring complex network configurations. Managed private endpoints are supported for various data sources, such as Azure Storage, Azure SQL Database, and many others – the most recent additions being Azure Event Hubs and Azure IoT Hub. To learn more about managed private endpoints and supported data sources, see Overview of managed private endpoints for Microsoft Fabric – Microsoft Fabric | Microsoft Learn. This feature can be used in any F capacity as well as Trial capacities.

Multitenant organization (MTO) (public preview)

Fabric now supports Microsoft Entra ID multitenant organizations (MTO). Many larger organizations have multiple Entra ID tenants for various reasons, such as mergers and acquisitions, compliance and security boundaries, or complex organizational structures. The multitenant organization capability in Entra ID synchronizes users across multiple tenants, adding them as users of type external member.

We are excited to announce public preview support for MTO. External members can now sign in to Fabric to consume and create content. MTO users can bring their own licenses from their home tenants. Users that have been assigned Power BI Pro or PPU licenses in their home tenants will not need to acquire an additional license for the other MTO tenants.

Click here for more information.

 

Announcing Git integration (generally available)

Fabric Git integration is now generally available! This feature allows you to sync workspaces with Git repositories, leverage version control, and collaborate seamlessly using Azure DevOps or GitHub. Though some items are still in preview, additional items will become generally available for Git integration over time.

Learn more about what’s available now and what’s coming next.

 

A new design for Deployment pipeline (preview)

Deployment pipelines have been redesigned, and the new experience is now in preview! This major update brings many improvements and new features designed to make your deployment process more efficient and user-friendly. We’ve reimagined the deployment pipeline from the ground up, ensuring that every aspect of your deployment workflow is optimized for performance and ease of use.


 

Homepage improvements

We are excited to share the latest enhancements to our homepage that are designed to streamline your experience and boost your productivity.

1. Enhanced Workspace Focus

The first improvement we’ve made is putting workspaces and related actions in the premium real estate on the homepage. The primary call to action is now dedicated to creating a workspace and navigating to recent ones.

In the Get started section, you can now create a workspace with a predesigned template called task flow. The task flow guides you to create specific items in a workspace. The idea is to encourage you to think in terms of projects and what you are trying to build end-to-end, instead of having to think about which workloads or items you need to use.


If you want to revisit any of your recent workspaces and continue where you left off, the Quick Access section is now optimized for efficient workspace navigation.


2. Collapsible Learn Section

We’ve also refined our Recommended section, transforming it into the Learn section. This new area is packed with sample materials and learning resources designed to help new users get up to speed quickly. Recognizing that these resources are particularly useful for newcomers, we’ve made the Learn section collapsible. For our experienced users, collapsing this section provides more space for the Quick Access area, ensuring a more streamlined experience.


 

Watch a demo for Core

 

OneLake

Access Databricks Unity Catalog tables from Fabric (public preview)

You can now access Databricks Unity Catalog tables directly from Fabric. To do so, you create a new data item in Fabric called “Mirrored Azure Databricks Catalog”.


When creating this item, you simply provide your Azure Databricks workspace URL and select the catalog you want to make available in Fabric. Rather than making a copy of the data, Fabric creates a shortcut for every table in the selected catalog. It also keeps the Fabric data item in sync. So, if a table is added or removed from UC, the change is automatically reflected in Fabric.

Once your Azure Databricks Catalog item is created, it behaves the same as any other item in Fabric. Seamlessly access tables through the SQL endpoint, utilize Spark with Fabric notebooks and take full advantage of Direct Lake mode with Power BI reports.

To learn more about Databricks integration with Fabric, see our documentation here.

 

Google Cloud Storage shortcuts and S3 Compatible shortcuts (generally available)

GCS shortcuts and S3 Compatible shortcuts are now generally available. Utilize shortcuts in OneLake to quickly and easily make data accessible in Fabric. There’s no need to set up pipelines or copy jobs; just create a shortcut and your data is immediately available in Fabric.

Don’t forget to enable shortcut caching: GCS and S3 Compatible shortcuts both support caching, which can be enabled in your workspace settings. By enabling shortcut caching, you can reduce your egress costs when accessing data across clouds or service providers.

GCS and S3 Compatible shortcuts also support the On-Premises Gateway. You can utilize the gateway to connect to your on-prem S3 compatible sources as well as GCS buckets that are protected by a virtual private cloud.

To learn more about shortcuts see our documentation here.

 

REST APIs for OneLake shortcuts (generally available)

We recently made big improvements to the REST APIs for OneLake shortcuts, including adding support for all current shortcut types and introducing a new List operation. With these improvements, you can programmatically create and manage your OneLake shortcuts. We’re excited to announce that these APIs are now Generally Available!
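
For a sense of what this looks like in practice, here is a hedged Python sketch that creates and then lists shortcuts on a lakehouse using these REST APIs. The endpoint shape and the adlsGen2 target fields follow the published API reference, but the IDs, the connection, and the way the bearer token is acquired are placeholders you would supply yourself.

import requests

# Assumptions: you already have a Microsoft Entra access token for the Fabric API,
# plus the workspace ID and the ID of the item (for example, a lakehouse) that
# will hold the shortcut.
token = "<entra-access-token>"
workspace_id = "<workspace-id>"
item_id = "<lakehouse-id>"
base_url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{item_id}/shortcuts"
)
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# Create a shortcut under Tables that points at an ADLS Gen2 location.
body = {
    "name": "SalesData",
    "path": "Tables",
    "target": {
        "adlsGen2": {
            "location": "https://<storage-account>.dfs.core.windows.net",
            "subpath": "/<container>/sales",
            "connectionId": "<connection-id>",
        }
    },
}
requests.post(base_url, headers=headers, json=body).raise_for_status()

# Use the new List operation to enumerate all shortcuts on the item.
print(requests.get(base_url, headers=headers).json())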

 

OneLake SAS (public preview)

Support for short-lived, user-delegated OneLake SAS is now in public preview. This functionality allows applications to request a User Delegation Key backed by a Microsoft Entra identity, and then use this key to construct a OneLake SAS token. This token can be handed off to provide delegated access to another tool, node, or user, ensuring secure and controlled access.
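
As a rough sketch of the flow, the snippet below uses the standard Azure Storage SDK against OneLake’s ADLS Gen2-compatible endpoint to request a user delegation key and mint a short-lived SAS scoped to a workspace. The workspace name, the validity window, and the choice of permissions are assumptions for illustration; check the OneLake SAS documentation for the exact scopes supported in the preview.

from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import (
    DataLakeServiceClient,
    FileSystemSasPermissions,
    generate_file_system_sas,
)

# OneLake exposes an ADLS Gen2-compatible endpoint, so the standard SDK calls apply.
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),  # an Entra identity with OneLake access
)

now = datetime.now(timezone.utc)
delegation_key = service.get_user_delegation_key(now, now + timedelta(hours=1))

# Build a short-lived, user-delegated SAS scoped to one workspace (file system).
sas_token = generate_file_system_sas(
    account_name="onelake",
    file_system_name="<workspace-name>",  # placeholder workspace
    credential=delegation_key,
    permission=FileSystemSasPermissions(read=True, list=True),
    expiry=now + timedelta(hours=1),
)
print(sas_token)  # hand this off to the tool, node, or user that needs delegated access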

 

Data Warehouse

Copilot for Data Warehouse (public preview)

Copilot for Data Warehouse is now in public preview! Copilot for Data Warehouse is an AI assistant that helps developers generate insights through T-SQL exploratory analysis. Copilot is contextualized to your warehouse’s schema. With this feature, data engineers and data analysts can use Copilot to:

  • Generate T-SQL queries for data analysis.
  • Explain and add in-line code comments for existing T-SQL queries.
  • Fix broken T-SQL code.
  • Receive answers regarding general data warehousing tasks and operations.

Learn more about Copilot for Data Warehouse. Copilot for Data Warehouse is currently only available in the Warehouse. Make sure you have Copilot enabled in your tenant and capacity settings to take advantage of these capabilities. Copilot in the SQL analytics endpoint is coming soon.

 

Delta column mapping in the SQL analytics endpoint (public preview)

The SQL analytics endpoint now supports Delta tables with column mapping enabled, in public preview. Column mapping is a feature of Delta tables that allows users to include spaces, as well as any of these characters, in a table’s column names: ,;{}()\n\t=. The extra characters in the column names are shown in the Lakehouse and flow through into the SQL analytics endpoint, the semantic model, and Power BI reports.
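
Column mapping itself is a standard Delta Lake table property. As a minimal illustration (the table and column names below are made up), a Spark SQL cell in a Fabric notebook could create such a table like this, after which it surfaces through the SQL analytics endpoint as described above:

# "spark" is the session that Fabric notebooks provide. Column mapping is switched on
# via table properties; the reader/writer versions are the minimums Delta Lake requires.
spark.sql("""
    CREATE TABLE sales_by_region (
        `Region Name` STRING,
        `Total (USD)` DOUBLE
    )
    USING DELTA
    TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")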

 

Enabling the SQL analytics endpoint on schema-enabled lakehouses (public preview)

We are enabling the SQL analytics endpoint on schema-enabled lakehouses. This allows Delta tables in schemas to be queried from the SQL analytics endpoint.

 

New editor improvements for Data Warehouse and SQL analytics endpoint

We are excited to share key improvements to our new editor in Fabric Data Warehouse and SQL analytics endpoint to improve the consistency and efficiency of SQL developers’ experiences!

Starting with the ribbon: in our previous editor, the actions presented would change depending on your context. This lacked consistency with other Fabric tools and required extra clicks to launch certain experiences from the ribbon. To make the ribbon more intuitive, we have unified it by consolidating all development tools in a single location, split across two tabs for a streamlined workflow.

The Home and Reporting tabs now consist of features for an end-to-end developer experience, with no overlaps and no actions that change depending on your context.

Our new data grid within data preview and query results now provides added capabilities at the column level, including sorting in ascending or descending order and selecting or specifying values to filter out per column, with many more features to come. These capabilities can help you quickly analyze and filter your data without running any T-SQL.


We’ve listened to your feedback on scenarios where you need to look at multiple experiences at once when writing queries, for example, checking the data in the tables and the relationships between tables in the BI model to decide how to better structure the query. Within the editor, multitasking across dynamic tabs is now supported between different experiences such as data preview, querying, and modeling for a more efficient data analysis experience.

The multitasking navigation between warehouses and SQL analytics endpoints has also been improved: whether you’re on a data preview, SQL query, or modeling tab, you can smoothly switch between warehouses and SQL analytics endpoints, and your last activity persists.


 

Database Migration Experience (private preview)

We are excited to announce the opening of a private preview for a new migration experience. Designed to accelerate the migration of SQL Server, Synapse dedicated SQL pools, and other warehouses to Fabric Data Warehouse, it lets users migrate the code and data from the source database, automatically converting the source schema and code to Fabric Data Warehouse, helping with data migration, and providing AI-powered assistance.

Please contact your Microsoft account team if you are interested in joining the preview.

 

T-SQL Notebook (public preview)

You can now use Fabric notebooks to develop your Fabric warehouse and consume data from your warehouse or SQL analytics endpoint. The ability to create a new notebook item from the warehouse editor lets you carry your warehouse context into the notebook and use the notebook’s rich capabilities to run T-SQL queries. The T-SQL notebook enables you to execute complex T-SQL queries, visualize results in real time, and document your analytical process within a single, cohesive interface. The embedded rich T-SQL IntelliSense and easy gestures like Save as table, Save as view, or Run selected code provide familiar experiences in the notebook to increase your productivity.

Learn more here.


 

 

Nested Common Table Expression (public preview)

Fabric Warehouse customers can now use nested common table expressions (NCTEs) to deconstruct ordinarily complex queries into smaller reusable blocks. NCTEs simplify complex query code, improve query readability, and make query code easier to investigate. With this addition, Fabric Warehouse now supports three types of CTE: standard, sequential, and nested.

  • A standard CTE doesn’t reference or define another CTE in its definition.
  • A nested CTE’s definition includes defining another CTE.
  • A sequential CTE’s definition can reference an existing CTE but can’t define another CTE.
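
To make the distinction concrete, here is a hedged sketch of a nested CTE run from Python over the warehouse’s SQL connection string with pyodbc. The connection details, table, and columns are placeholders; the point is only the shape of the query, where MonthlySales defines DailySales inside its own definition.

import pyodbc

# Placeholder connection: copy the SQL connection string from your warehouse settings
# and pick an authentication mode your environment supports.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<warehouse-sql-endpoint>;Database=<warehouse-name>;"
    "Authentication=ActiveDirectoryInteractive;"
)

# A nested CTE: the outer CTE (MonthlySales) defines another CTE (DailySales)
# inside its own definition, then aggregates over it.
query = """
WITH MonthlySales AS (
    WITH DailySales AS (
        SELECT OrderDate, SUM(Amount) AS DayTotal
        FROM dbo.Orders
        GROUP BY OrderDate
    )
    SELECT YEAR(OrderDate) AS SalesYear, MONTH(OrderDate) AS SalesMonth,
           SUM(DayTotal) AS MonthTotal
    FROM DailySales
    GROUP BY YEAR(OrderDate), MONTH(OrderDate)
)
SELECT TOP 10 * FROM MonthlySales ORDER BY MonthTotal DESC;
"""
for row in conn.execute(query):
    print(row)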

Watch the Data Warehouse demo

 

Data Engineering

High Concurrency mode for Notebooks in Pipelines (public preview)

We are excited to announce the public preview of High Concurrency mode for notebooks in pipelines. This new feature in Microsoft Fabric enables users to share Spark sessions across multiple notebooks within a pipeline.

Pipelines are primarily used for orchestrating data engineering tasks for production workloads and scheduled jobs. For enterprise data teams, optimizing resource utilization and achieving the best price-performance ratio is crucial for faster job start times and efficient compute usage.

With High Concurrency Mode, users can trigger pipeline jobs, and these jobs are automatically packed into existing high concurrency sessions. Subsequent notebook steps benefit from a 5-second session start experience, even with custom compute configurations and custom pools, resulting in a 30x performance boost and instant session start.


Note: Session sharing is always confined to a single user, workspace, and pipeline boundary. Sessions are selected based on matching compute configurations, library management dependencies, and file system dependencies.

Learn more about high concurrency mode for notebooks in pipelines from our documentation.

 

Workspace Level Setting to Reserve Maximum Cores for Jobs in Fabric Data Engineering (public preview)

We’re pleased to introduce a new workspace-level setting that allows you to reserve maximum cores for your active jobs for Spark workloads. By default, optimistic job admission is enabled in all workspaces, and it allows jobs to start with their minimum node configuration and scale up to multiple executors based on available capacity.


In cases where there are excessive jobs running, pushing the capacity to its maximum limits, scale-up requests may be rejected.

For enterprise customers requiring absolute maximum core reservations for critical jobs, you can now enable a compute reservation model. Workspace administrators can enable this option by navigating to the Data Engineering/Science section of the workspace settings and activating the “Reserve maximum cores for active Spark jobs” setting.


Once enabled, the maximum autoscale size of the Spark pool is considered during job admission. Even if a job is currently running with 2 nodes, if the pool’s maximum limit is 5, the other 3 nodes will be reserved for the job’s potential growth throughout its lifetime. This ensures that each submitted job has the cores available to grow to its maximum scale. Users now have the flexibility to choose between optimistic job admission and compute reserved mode, tailoring their workspace settings to meet specific workload requirements.

Learn more about settings to reserve maximum cores for your Fabric data engineering jobs from our documentation.

 

Session Expiry Control in Workspace Settings for Notebook Interactive Runs (public preview)

We’re pleased to introduce a new session expiry control in Data Engineering/Science workspace settings. This feature empowers administrators to set the maximum expiration time limit for notebook interactive sessions. Notebooks are a popular choice for interactive querying, and by default, sessions expire after 20 minutes.


With this new setting, you can now customize the maximum expiration duration, helping to prevent unused sessions from consuming unnecessary capacity and potentially impacting the performance of new incoming job requests.


If users require additional time, they can extend the session duration using the “extended session time” option available in the monitoring status view within the notebook experience.

Learn more about session expiry settings for interactive notebook sessions from our documentation.

 

Spark Connector for Fabric DW – New Features

Recently, we launched the Fabric Spark connector for Fabric Data Warehouse (DW) in the Fabric Runtime to empower Spark developers and data scientists to access and work on data from Fabric DW and the SQL analytics endpoint of the lakehouse (either from within the same workspace or across workspaces) with a simplified Spark API. The initial version supported reading data from a table or view, and only in Scala. We are happy to announce that we have released these additional capabilities:

  • Support for custom or pass-through query
  • Support for PySpark
  • Support for Fabric Runtime 1.3 (Spark 3.5)

To learn more about Spark Connector for Fabric Data Warehouse (DW) with its recent updates, please refer to the documentation at: Spark connector for Fabric Data Warehouse.
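
Below is a hedged PySpark sketch of the read path, following the pattern shown in that documentation; the warehouse, schema, table, and workspace ID are placeholders, and the import path and Constants helper should be confirmed against the docs for your runtime version.

# The connector ships with the Fabric Spark runtime; the import registers the
# synapsesql reader. Names below are placeholders.
import com.microsoft.spark.fabric
from com.microsoft.spark.fabric.Constants import Constants

# Read a table or view from a warehouse in the same workspace.
df = spark.read.synapsesql("MyWarehouse.dbo.FactSales")

# Read from a warehouse in another workspace by supplying its workspace ID.
df_remote = (
    spark.read
         .option(Constants.WorkspaceId, "<workspace-id>")
         .synapsesql("OtherWarehouse.dbo.DimCustomer")
)
df.show(5)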

 

 

T-SQL Notebook (public preview)

The T-SQL notebook is now available in public preview. This enhancement broadens our language support, extending from a Spark-centric approach to include T-SQL as well. With this update, T-SQL developers can now use notebooks to craft the T-SQL queries that build a warehouse. They can organize extensive queries into separate code cells and use Markdown cells for enhanced documentation, offering a more comprehensive documentation experience.

Just as you can add a Lakehouse to a notebook, you can now add a Data Warehouse or SQL analytics endpoint to the notebook. This allows you to run T-SQL code directly against the connected warehouse or SQL analytics endpoint. BI analysts can also take advantage of the update by using the T-SQL notebook to execute cross-database queries, compiling business insights from various warehouses and SQL analytics endpoints.

Most existing notebook features are directly accessible for T-SQL notebooks. For instance, T-SQL developers can take advantage of comprehensive charting tools to visualize the results of their T-SQL queries, as well as collaborate with colleagues to jointly develop the notebook (see collaborate in a notebook).

To create a T-SQL notebook from an existing data warehouse, a new entry named “New SQL query in notebook” is added under the “New SQL query” menu group. This action generates a new Notebook in the same workspace, with the data warehouse automatically added into it.


You can create a notebook code cell with T-SQL as the language and run the query against the connected Data Warehouse.


 

Fabric Spark Diagnostic Emitter: Collect Logs and Metrics (public preview)

The Fabric Apache Spark Diagnostic Emitter is now in public preview. This powerful new feature allows Apache Spark users to collect logs, event logs, and metrics from their Spark applications and send them to various destinations, including Azure Event Hubs, Azure Storage, and Azure Log Analytics. It provides robust support for monitoring and troubleshooting Spark applications, enhancing your visibility into application performance.

What Does the Diagnostic Emitter Do?

The Fabric Apache Spark Diagnostic Emitter enables Apache Spark applications to emit critical logs and metrics that can be used for real-time monitoring, analysis, and troubleshooting. Whether you’re sending logs to Azure Event Hubs, Azure Storage, or Azure Log Analytics, this emitter simplifies the process, allowing you to collect data seamlessly and store it in the destination that best suits your needs.

Key Benefits of the Apache Spark Diagnostic Emitter

  • Centralized Monitoring: Send logs and metrics to Azure Event Hubs, Azure Log Analytics, or Azure Storage for real-time data streaming, deep analysis and querying, as well as long-term retention.
  • Flexible Configuration: Easily configure Spark to emit logs and metrics to one or more destinations, with support for connection strings, Azure Key Vault integration, and more.
  • Comprehensive Metrics: Collect a wide range of logs and metrics, including driver and executor logs, event logs, and detailed Spark application metrics.
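
To give a flavor of the configuration, here is a hedged sketch of the Spark properties you might set in an environment (or at session submission) to emit logs and metrics to Azure Event Hubs. The property names follow the Synapse-style diagnostic emitter configuration that Fabric uses, and the emitter name and secret are placeholders; confirm the exact keys, categories, and Key Vault options in the documentation referenced below.

# Hypothetical emitter named "EventHubEmitter"; verify property names against the docs.
diagnostic_emitter_conf = {
    "spark.synapse.diagnostic.emitters": "EventHubEmitter",
    "spark.synapse.diagnostic.emitter.EventHubEmitter.type": "AzureEventHub",
    "spark.synapse.diagnostic.emitter.EventHubEmitter.categories": "Log,EventLog,Metrics",
    # A raw connection string is shown for brevity; a Key Vault reference is the safer option.
    "spark.synapse.diagnostic.emitter.EventHubEmitter.secret": "<event-hubs-connection-string>",
}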

To learn more, see the Fabric Apache Spark diagnostic emitter documentation.

Environment Artifact integration with Synapse VS Code extension

A Microsoft Fabric environment is a consolidated item for all your hardware and software settings. In an environment, you can select different Spark runtimes, configure your compute resources, install libraries from public repositories or local directories, and more.

To learn more, see Create, configure, and use an environment in Fabric.

With support for the Environment item in the Synapse VS Code extension, you can explore and manage environments from the VS Code side. Expanding the new Environment node within VS Code, you can see all the Environment items from the selected workspace and easily identify the workspace default.

Screenshot showing environment artifact list.

You can switch the workspace default to another environment by hovering over the environment and selecting the Set Default Workspace Environment button.

Screenshot showing change workspace default environment.

Hover over the environment and select the Inspect button to display the environment details in the right panel in JSON format.

Screenshot showing inspect environment artifact.

You can see the association between a code item, such as a notebook, and its environment in the code item’s detail property panel.

Screenshot showing check environment association.

 

Notebook debug within vscode.dev (public preview)

Vscode.dev is a lightweight version of VS Code running fully in the browser. We released the Synapse VS Code - Remote extension last year to enable editing and running Fabric notebooks within vscode.dev. Today, we are excited to announce the introduction of a debug feature for notebooks. This update brings us closer to achieving a pro-developer experience with notebooks.

Once you’ve opened the Notebook through the VS Code (Web) interface, you can place breakpoints and debug your Notebook code, which is helpful for troubleshooting.

This update is initially available for Fabric Runtime 1.3.


 

Adding Python support in Fabric User Data Functions

Introducing support for Python functions in Fabric User Data Functions. Private preview users can now create and run functions that will run on Python version 3.11. This enables users to leverage powerful Python libraries such as pandas, numpy, seaborn and more. These functions are compatible with the existing Fabric data source integrations, so users can interact with Data Warehouses as well as Lakehouses.


This feature is available for users of the Private Preview of User Data Functions. To participate in this preview, please fill out the application form.

 

Invoke Fabric User Data Functions in Notebook

You can now invoke user data functions (UDFs) in your PySpark code directly from Microsoft Fabric notebooks or Spark jobs. This integration makes it easy to call reusable functions across different workspaces, reducing the need to rewrite or refactor code.

With NotebookUtils integration, invoking UDFs is as simple as writing a few lines of code. Whether you’re working in a Notebook or running batch jobs using Spark Job Definitions, the process is incredibly straightforward.

Below is an example of invoking UDFs in Notebooks.
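
The original post shows this as a screenshot; as a purely illustrative, hedged sketch of the pattern, the cell below binds to a User Data Functions item by name and calls one of its functions. The notebookutils.udf helper, the item name, and the function and its arguments are assumptions here, so check the User Data Functions documentation for the exact API.

# Hypothetical shape: bind to a User Data Functions item, then call a function on it.
my_functions = notebookutils.udf.getFunctions("MyUserDataFunctions")  # assumed helper and item name

# Invoke a function defined in that item (hypothetical function and arguments).
result = my_functions.cleanse_customer_records(source_table="raw_customers")
print(result)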


 

Functions Hub is now available in Fabric User Data Functions

Functions Hub provides a single location to view, access and manage your User Data Functions. Users can access this feature in the left navigation bar, or in the Data Engineering experience.


In Functions Hub, users can see all the functions they have access to and perform actions such as managing access permissions, creating a new function, viewing lineage, and more. This is helpful for managing User Data Functions artifacts at scale.


This feature is available for users of the Private Preview of User Data Functions. To participate in this preview, please fill out the application form.

 

Support for spaces in Lakehouse Delta table names

We are pleased to announce the support for spaces in Lakehouse Delta table names, enhancing usability for all users across Fabric. This feature allows you to create and query Delta tables with spaces in their names, such as “Sales by Region” or “Customer Feedback”. This aligns with the expectations of Power BI and SQL customers, who are used to working with spaces in table names. Now, you can seamlessly use spaces in table names across Lakehouse Explorer and its experiences, Spark, Shortcuts, and other experiences.

All Fabric Runtimes and Spark authoring experiences support table names with spaces. Code completion authoring works as expected, properly escaping the table names with the required backtick characters.
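
A minimal PySpark sketch (the table name and data are made up) showing both creation and querying with backtick escaping:

# Create a Delta table whose name contains spaces, then query it.
# In Spark SQL, names with spaces are escaped with backticks.
spark.sql("CREATE TABLE `Sales by Region` (Region STRING, Total DOUBLE) USING DELTA")
spark.sql("INSERT INTO `Sales by Region` VALUES ('EMEA', 1250.0), ('APAC', 980.5)")
spark.sql("SELECT * FROM `Sales by Region` WHERE Total > 1000").show()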

 

Fabric Runtime 1.3 GA

We are extremely excited to announce the advancement of Fabric Runtime 1.3 from Public Preview to General Availability. Our Apache Spark-based big data execution engine, optimized for both data engineering and data science workflows, has been fully updated and seamlessly integrated into the Fabric platform, along with all our components.

With this announcement, all new workspaces will, by default, be based on the latest GA runtime version, which is now Fabric Runtime 1.3. Existing workspaces will remain in their current version; however, we strongly encourage you to migrate to Runtime 1.3 as soon as possible to take full advantage of the newest functionalities.


The latest enhancements in Fabric Runtime 1.3 include the upgrade to Delta Lake 3.2, updates and upgrades to Python libraries, and improvements to the R language. Additionally, we’ve implemented query-specific optimizations to further enhance performance and efficiency.

Read more here.

 

Native Execution Engine on Runtime 1.3 (public preview)

Native Execution Engine for Fabric Runtime 1.3 is now available in public preview. Previously, the Native Execution Engine supported earlier versions of Runtime 1.3, but it is now fully compatible with the latest GA runtime version, which is also based on Delta Lake 3.2.

The Native Execution Engine can significantly enhance the performance of your Spark jobs and queries. This engine has been completely rewritten in C++, operates in columnar mode, and utilizes vectorized processing. It offers superior query performance across data processing, ETL, data science, and interactive queries—all directly on your data lake. Importantly, this engine is fully compatible with Apache Spark™ APIs, including the Spark SQL API.

The current release of the Native Execution Engine excels particularly in the following scenarios:

  • Working with data in Parquet and Delta formats.
  • Handling queries that involve complex transformations and aggregations, leveraging the engine’s columnar processing and vectorization capabilities.
  • Running computationally intensive queries, as opposed to simple or I/O-bound operations.

Best of all, no code changes are required to take advantage of the Native Execution Engine.

Read more here.

 

Acceleration tab and UI enablement for the Native Execution Engine

No code changes are required to speed up the execution of your Apache Spark jobs when using the Native Execution Engine.

You have the flexibility to activate the Native Execution Engine either through your environment settings or selectively for an individual notebook or job. By enabling this feature within your environment settings, all subsequent jobs and notebooks associated with that environment will automatically inherit this configuration.
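
For the notebook- or job-level option, the engine is switched on through Spark properties at session configuration time. The sketch below lists the two properties as they are documented for the preview; treat the exact names as something to verify before relying on them.

# Hypothetical session properties for enabling the Native Execution Engine on a single
# notebook or Spark job definition (set via %%configure or the job's Spark properties).
native_engine_conf = {
    "spark.native.enabled": "true",
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
}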


We are also excited to share that enabling the Native Execution Engine is now even more accessible. You can activate it not only through Spark settings but also via the Environment tab, specifically under the “Acceleration” section in the UI, ensuring a seamless and straightforward enablement process.


 

Fabric Spark Runtimes Release Notes

We are committed to ensuring that our Microsoft Fabric Spark runtimes remain at the forefront of performance and security. In our ongoing efforts, we occasionally introduce updates that may affect your experience. While we understand the critical nature of these updates, we recognize the challenges in keeping up with changes, identifying affected components, and understanding the rationale behind these modifications.

To enhance your experience, we are now providing detailed, automated release notes for each Apache Spark-based runtime we deliver. These notes will be readily accessible, offering complete transparency and clarity regarding the updates, including their purpose. This will empower you to seamlessly adapt your workloads and fully leverage the benefits of each update.


For the most current information, including a comprehensive list of changes and specific release notes for each runtime version, we encourage you to check and subscribe to Spark Runtimes Releases and Updates.

 

Enable/Disable Functionality in API for GraphQL

The Enable/Disable feature for queries and mutations in Microsoft Fabric’s GraphQL API provides administrators and developers with granular control over API access and usage. This functionality allows you to selectively activate or deactivate specific queries and mutations within your GraphQL schema, giving you the ability to manage API capabilities dynamically without altering the underlying code or deploying changes.

By leveraging this feature, you can permanently or temporarily disable certain operations for maintenance, gradually roll out new functionality, or restrict access to sensitive data operations as needed. This level of control enhances security, aids in API versioning, and provides flexibility in managing your GraphQL API’s behavior to align with your application’s evolving requirements and operational needs.

To disable a query or mutation within a GraphQL item, simply select the ellipses next to the query or mutation you want to disable. Then select the Disable option from the pop-up menu.


If a query or mutation has been disabled, it will appear grayed out in the Schema Explorer. You can enable it again by choosing Enable from the entry’s pop-up menu.


 

Public REST API of Livy Endpoint

The Fabric Livy endpoint lets users submit and execute their Spark code on the Spark compute within a designated Fabric workspace, eliminating the need to create any Notebook or Spark Job Definition artifacts. This integration with a specific Lakehouse artifact ensures straightforward access to data stored on OneLake. Additionally, Livy API offers the ability to customize the execution environment through its integration with the Environment artifact.

When a request is sent to the Fabric Livy endpoint, the user-submitted code can be executed in two different modes:

Session Job:

  • A Livy session job entails establishing a Spark session that remains active throughout the interaction with the Livy API. This is particularly useful for interactive and iterative workloads.
  • A Spark session starts when a job is submitted and lasts until the user ends it or the system terminates it after 20 minutes of inactivity. Throughout the session, multiple jobs can run, sharing state and cached data between runs.

Batch Job:

  • A Livy batch job entails submitting a Spark application for a single execution. In contrast to a Livy session job, a batch job does not sustain an ongoing Spark session.
  • With Livy batch jobs each job initiates a new Spark session, which ends when the job finishes. This approach works well for tasks that don’t rely on previous computations or require maintaining state between jobs.

The endpoint of the batch job API looks like:

https://api.fabric.microsoft.com/v1/workspaces/ws_id/lakehouses/lakehouse_id/livyapi/versions/2023-12-01/batches

The endpoint of the session job API looks like:

https://api.fabric.microsoft.com/v1/workspaces/ws_id/lakehouses/lakehouse_id/livyapi/versions/2023-12-01/sessions

ws_id: the workspace to which the hosting Lakehouse artifact belongs. lakehouse_id: the artifact ID of the hosting Lakehouse. All capacity consumption from these API calls will be associated with this artifact.

To access the Livy API endpoint, you need to create a Lakehouse artifact. After it’s set up, you’ll find the Livy API endpoint in the settings panel.


The “Recent Run” section of the Lakehouse will now display all submitted requests. Click on the “Application name” column to open the detailed monitoring page and view additional logs.

Typically, the Livy request is directed to the workspace starter pool by default. However, users can customize the execution by including the environment artifact ID in the payload body, utilizing the settings from the specified environment artifact.

In this example, the value of id is the artifact ID of the environment, and workspaceId is the ID of the current workspace. In the current version, the Environment and Lakehouse artifacts should belong to the same workspace.

{
  "conf": {
    "spark.fabric.environmentDetails": "{\"id\":\"558bd4a3-5107-413e-ad6d-48a4b80678f6\",\"workspaceId\":\"ea0f47a3-8f30-4cf9-bfb6-0b3af37d9eb8\"}"
  }
}
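
Putting the pieces together, a hedged Python sketch of submitting a batch job to this endpoint could look like the following. The token acquisition, the ABFSS path of the script, and the IDs are placeholders, and the payload combines standard Livy batch fields with the Fabric environment conf shown above.

import requests

token = "<entra-access-token>"  # a token for the Fabric API, however you acquire it
ws_id = "<workspace-id>"
lakehouse_id = "<lakehouse-id>"

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{ws_id}"
    f"/lakehouses/{lakehouse_id}/livyapi/versions/2023-12-01/batches"
)

payload = {
    # Standard Livy batch fields: the Spark application to run and a display name.
    "name": "nightly-ingest",
    "file": "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Files/jobs/ingest.py",
    # Optional: target a specific environment, using the conf shown above.
    "conf": {
        "spark.fabric.environmentDetails": "{\"id\":\"<environment-id>\",\"workspaceId\":\"<workspace-id>\"}"
    },
}

response = requests.post(url, headers={"Authorization": f"Bearer {token}"}, json=payload)
response.raise_for_status()
print(response.json())  # includes the batch ID you can poll for status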

 

Watch the Data Engineering demos

Data Science

Announcing Public Preview: Share Feature for Fabric AI Skill

The highly anticipated “Share” capability for the Fabric AI Skill is now in public preview. This powerful addition allows you to share the AI Skill with others using a variety of permission models, giving you complete control over how your AI Skill is accessed and used.

With this new feature, you can:

  • Co-create: Invite others to collaborate on the development of your AI Skill, enabling joint efforts in refining and enhancing its functionality.
  • View Configuration: Allow others to view the configuration of your AI Skill without making any changes.
  • Query: Enable others to interact with the AI Skill to obtain answers to their queries.


Additionally, we are introducing flexibility in managing versions. You can now switch between the published version and the current version you are working on. This feature facilitates performance comparison by running the same set of queries, providing valuable insights into how your changes impact the AI Skill’s effectiveness.

We’ve also refined the publishing process. You can now include a description that outlines what your AI Skill does. This description will be visible to users, helping them understand the purpose and functionality of your AI Skill.

We are excited for you to explore these new capabilities and look forward to hearing about your experience. Your feedback is crucial as we continue to enhance your experience with Fabric AI Skill.

 

Data Wrangler now supports Spark DataFrames and PySpark code generation (generally available)

Data Wrangler, a notebook-based tool for exploratory data analysis, is now generally available for both pandas DataFrames and Spark DataFrames. You can use the tool to explore and transform data in your notebook with an immersive visual interface, generating either Python code or PySpark code in just a few steps. If you’ve used Data Wrangler before, the steps will be familiar. You can begin by opening any active pandas or Spark DataFrame from the “Data Wrangler” prompt in the notebook ribbon.


Data Wrangler will automatically convert Spark DataFrames to pandas samples for performance reasons. However, all the generated code will ultimately be translated to PySpark when you save it back to your notebook. (You can also modify the sample size and sampling method by selecting “Choose custom sample” from the Data Wrangler dropdown.) As with a pandas DataFrame, Data Wrangler will show you descriptive insights about your data in the right-hand “Summary” panel and in each column’s header.


You can use the left-hand “Operations” panel to browse and apply common data-cleaning transformations. Selecting one will prompt you to provide a target column or columns, along with any necessary parameters. As you fill in those values, a preview of the applied operation, along with the corresponding code, will be automatically generated.


If you apply the previewed step, Data Wrangler’s display grid and summary statistics update to reflect the results. The code appears in a running list of committed operations.


After applying a series of steps, you can copy or save the generated code using options in the toolbar above the display grid.


For Spark DataFrames, all the code generated on the pandas sample is translated to PySpark before it lands back in the notebook. Data Wrangler will display a preview of the translated PySpark code and provide an option to export the pandas code as well.


 

Announcing new usability improvements for Data Wrangler

You may have noticed that Data Wrangler has a new entry point under the “Home” tab of the Fabric notebook ribbon. We’re excited to share additional usability updates designed to improve your experience.

Launch Data Wrangler directly from a cell
You can now open Data Wrangler directly from a notebook cell when you use the Fabric display() command to print a pandas or PySpark DataFrame. Just click on the “Data Wrangler” prompt in the interactive display output.


Work faster with improved performance
Thanks to new engineering updates, you’ll notice that Data Wrangler takes less time to apply an operation once you’ve loaded the preview. We’ve cut down the latency to speed up your workflow.


Modify your display using the new “Views” prompt
Data Wrangler works best on large monitors, but you can now use the “Views” dropdown above the display grid to minimize or hide parts of the interface based on your preferences or screen size.


 

File editor in Notebook

We are excited to share the launch of the file editor feature in Fabric notebooks. This new feature allows users to view and edit files directly within the notebook’s resource folder and the environment resource folder. Supported file types include CSV, TXT, HTML, YML, PY, SQL, and more.

With the file editor, you can:

  • View and edit files: Easily access and modify files within the notebook.
  • Keyword highlighting for module files: Provides language services when opening and editing code files such as .py and .sql.
  • Enhanced user experience: Integrated with the notebook pane switcher to help you easily navigate different features inside the notebook.


 

Watch the Data Science demos

 

Real-time Intelligence

Real-Time Intelligence is a powerful service that empowers everyone in your organization to extract insights and visualize their data in motion. With capabilities across ingestion, processing, transformation, analytics, visualization and action, Real-Time Intelligence transforms your data into a dynamic, actionable resource that drives value across the entire organization.

Learn more here.

 

Creating a Real-Time Dashboard with Copilot

From the list of tables in Real-Time hub, users can click the three-dots menu and select Create real-time dashboard. Copilot will review the table and automatically create a dashboard with two pages: one with insights about the data in the table, and one that contains a profile of the data with a sample, the table schema, and more details about the values in each column. The dashboard can be further explored and edited, making it easy for users to find insights on their time-series data without having to write a single line of code.


Insight page:


Profile page:


 

Adding a real-time dashboard to an org app

Real-time dashboards can be included in the new version of org apps. An org app can include Power BI reports, Power BI paginated reports, notebooks, and real-time dashboards. This makes it very easy to distribute real-time dashboards to large user cohorts outside a specific workspace.

An org app with a Power BI report and a real-time dashboard:


 

Introducing A New Real-Time Hub User Experience

We are thrilled to announce the launch of the new Real-Time Hub user experience (UX), a redesigned and enhanced experience that will help you get the most out of your real-time data. Whether you want to monitor, process, analyze, or act on your data streams, the new UX will make it easier and faster than ever.

What’s new?

A new left navigation provides you with seamless access to all your data-in-motion. This includes both the data streams that you have access to and the ones that you have integrated into Fabric, as well as Fabric events and Azure resources. With just a few clicks, you can now effortlessly navigate through these resources in a more user-friendly way.

new left navigation for Real-Time hub

  • A new page called “My Streams” where you can bring in external events from various sources and integrate them with your real-time data. You can use My Streams to create custom streams that combine data from different sources, including Microsoft sources, Database CDC, and Fabric events.

My Streams menu item on the new left navigation of Real-Time hub page

  • The “Get Events” button is now rebranded as “Add source”, which aligns with the value proposition of Real-Time hub: allowing customers to discover, manage, and consume not only events, but also all the other types of data-in-motion.

rebranded "Add source" buttons for previously known "Get Events"

  • Four new Eventstream connectors have been introduced into the Real-Time hub. Now you can stream data from Azure SQL MI DB (CDC), SQL Server on VM DB (CDC), Apache Kafka, and Amazon MSK Kafka. These sources will enable you to enrich your real-time data with historical and analytical data from your data warehouse or database. To learn more about these four connectors, continue reading ‘New Stream Sources Now Available’ below.


Simply sign in to Microsoft Fabric and start experiencing the new Real-Time hub UX!

 

New Streaming Sources Now Available via Eventstream Connectors

We’re excited to announce the expansion of our Eventstream Connectors with four powerful new streaming sources, enabling even more seamless data streaming across your ecosystem:

  • Azure SQL MI CDC: Capture data changes in your Azure SQL Managed Instance (MI) database and stream them into Eventstream for real-time processing, analysis, and monitoring.
  • SQL Server on VM CDC: Capture data changes in your SQL Server on Virtual Machines and stream them directly into Eventstream for processing, analysis, and monitoring.
  • Apache Kafka: Easily stream data from Apache Kafka cluster into your Eventstream, enabling unified processing and analysis across platforms.
  • Amazon MSK Kafka: Seamlessly stream data from Amazon Managed Streaming for Apache Kafka (MSK) into Eventstream and perform real-time processing.

These new sources empower you to build richer, more dynamic Eventstreams, ensuring that your real-time streaming applications can fully leverage the reporting and analysis capabilities in Fabric.


 

Introducing Eventhouse as a new Destination in Eventstream

As part of our ongoing efforts to enhance Eventstream’s capabilities, we are excited to introduce Eventhouse as a new destination for your data streams. Eventhouses, equipped with KQL Databases, are designed to handle and analyze large volumes of data, particularly in scenarios that demand real-time analytics and exploration. With the Eventhouse destination in Eventstream, you can efficiently process and route data streams into an Eventhouse and analyze the data in near real-time using KQL.


 

 

Eventstream’s Integration with Managed Private Endpoint

We’re excited to introduce the Private Network feature for Fabric Eventstream! With Fabric’s Managed Private Endpoint, you can now establish a private connection between your Azure services, such as Azure Event Hub, and Fabric Eventstream. This integration ensures your data is securely transmitted within a private network, allowing you to explore the full power of real-time streaming and high-performance data processing that Eventstream offers.

The diagram below shows how Eventstream pulls data from your Azure Event Hub within a virtual network using Fabric’s Managed Private Endpoint.


To learn more about Managed Private Endpoint, visit here: Overview of managed private endpoints for Microsoft Fabric – Microsoft Fabric | Microsoft Learn

 

Introducing the new look and feel of KQL Database

We’re excited to unveil the newly redesigned KQL DB experience in Microsoft Fabric Real Time Intelligence! This update brings a clean, modern interface packed with powerful features to help you work smarter, whether you’re managing several databases, dealing with streaming data, or exploring massive datasets.

A UI Tailored to Your Workflow

The new KQL DB interface is built to accommodate the diverse needs of data professionals. Whether you’re exploring a simple database or navigating one with hundreds of tables, the design remains intuitive and adaptable.

An Enhanced Database Page Experience

The new KQL database page has undergone a significant transformation. When you open the Tables tab of the database, you’re greeted with visually rich cards (tiles) that present key metrics at a glance. These tiles display ingestion trends, data size, recent active users, and the availability status in OneLake. For those who prefer more detail, you can switch to the list view, offering a comprehensive table-focused perspective.

Exploring your data is now more interactive with the updated Data Preview tab. Here, you can instantly see the most recent records across all tables in a convenient JSON view, allowing for quick insights without needing to dive too deep.

If you’re looking to drill down further, the updated histograms and time filters let you see the overall data ingestion trend in more depth.


Diving Deeper into Tables

The individual table view has seen improvements, too. The Data Preview of a table lets you explore records with the same easy-to-use interface as the database-level preview. Also, the Schema tab gives you a complete picture of the data structure, including distribution stats and column details.

Histograms in the table view bring your data to life, allowing you to explore different time ranges and customize binning to suit your analysis needs. All this is wrapped in a design that’s both accessible and easy to navigate.


Flexible Views for Any Data Scope

The new look and feel of KQL DB in Microsoft Fabric offers a more intuitive, powerful, and visually engaging way to work with your data. We can’t wait for you to try it out and see how these enhancements elevate your experience!

To read more about the new KQL DB look and feel, check out the documentation.

 

Set alerts on KQL Querysets with Data Activator triggers

Now you can set up Data Activator alerts directly on your KQL queries in KQL querysets. Whether you’re tracking real-time metrics or keeping an eye on important logs, this feature makes sure you get notified the moment something important happens.

Set Alerts Based on KQL Query Results or Specific Conditions

With this new feature, you can set alerts to trigger based on specific results or conditions from a scheduled KQL query.

For example, if your KQL DB tracks application logs, you can configure an alert to notify you if the query, scheduled at a frequency of your choice (e.g., every 5 minutes), returns any logs where the message field contains the string “error”.

This feature also lets you monitor live data trends by setting conditions on visualizations, similar to how you can set alerts on visuals within Real-Time Dashboards. For instance, if you visualize sales data distribution across product categories in a pie chart, you can set an alert to notify you if the share of any category drops below a certain threshold. This helps you quickly identify and address potential issues with that product line.


You can choose whether to receive alerts via email or Teams messages when the condition is met.

To read more about setting alerts for KQL querysets, check out the documentation.

 

Data Exploration made easier with Top values feature

The “Top Values” feature in our no-code data exploration UI is available for Real-Time Dashboards’ tiles! This new component makes it easier than ever to explore and understand your data without writing a single line of code. Instantly view the top values for each column in your query result set, allowing you to see value distributions and gain insights into the nature of your data. Plus, seamlessly add new filters directly from this component to refine your analysis.


 

Real-Time Dashboard: lower-than-ever refresh rates

We are pleased to share an enhancement to our dashboard auto refresh feature, which now supports continuous and 10-second refresh rates, in addition to the existing options.

This upgrade, addressing a popular customer request, allows both editors and viewers to set near-real-time and real-time data updates, ensuring your dashboards display the most current information with minimal delay. Experience faster data refresh and make more timely decisions with our improved dashboard capabilities.

As the dashboard author, you can enable the Auto refresh setting and set a minimum time interval to prevent users from setting an auto refresh interval smaller than the provided value. Note that the Continuous option should be used with caution: the data is refreshed every second, or after the previous refresh completes if it takes more than 1 second.

 

Multivariate anomaly detection

Now you can use the power of Eventhouse, Spark, and OneLake to perform real-time monitoring and multivariate anomaly detection of time series data. We added a workflow based on the algorithm used in the AI Anomaly Detector service (which is being retired as a standalone service).


Multivariate anomaly detection is a method of detecting anomalies in the joint distribution of multiple variables over time. This method is useful when the variables are correlated, so the combination of their values at specific times might be anomalous even though the value of each variable by itself is normal. Multivariate anomaly detection can be used in various applications, such as monitoring the health of complex IoT systems, detecting fraud in financial transactions, identifying unusual patterns in network traffic, and more.

Multivariate anomaly detection in Fabric takes advantage of the powerful Spark and Eventhouse engines on top of a shared persistent storage layer. The initial data can be ingested into an Eventhouse and exposed in OneLake. The anomaly detection model can then be trained using the Spark engine, and the trained model is stored in the Fabric MLflow models registry. Predictions of anomalies on new streaming data can be made in real time using the Eventhouse engine. Because these engines can process the same data in the shared storage layer, data flows seamlessly from ingestion, through model training, to anomaly prediction.
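As a simplified sketch of the Eventhouse side of this flow, a KQL query like the one below reshapes raw telemetry into aligned, regularly spaced series that the Spark-based training step can consume from OneLake; the SensorReadings table and its columns are hypothetical:

  // Hypothetical schema: SensorReadings(Timestamp: datetime, Temperature: real, Pressure: real, Vibration: real)
  // Align correlated metrics on a common 1-minute grid over the last 7 days.
  SensorReadings
  | make-series
      Temperature = avg(Temperature),
      Pressure    = avg(Pressure),
      Vibration   = avg(Vibration)
      on Timestamp from ago(7d) to now() step 1m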

For more details see an overview and tutorial.

 

Enhanced Time Series Analysis: More Control, Better Insights

Whether you are using Real-Time Dashboards or KQL Querysets for time series analysis, you’ll find this new way of interacting with your data visualization valuable. You can now interact with the data by selecting specific items from the legend using the mouse, using Ctrl to add or remove selections, or holding Shift to select a range. The Search button helps you quickly filter items, while the Invert button allows you to reverse your selections. Navigate through your selections with ease using the Up and Down arrows to refine your data view.
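For example, a query like the one below (hypothetical Requests table and columns) renders a multi-series time chart whose legend items you can now click, Ctrl-click, or Shift-select to focus your view:

  // Hypothetical schema: Requests(Timestamp: datetime, ServiceName: string)
  Requests
  | where Timestamp > ago(1d)
  | summarize RequestCount = count() by bin(Timestamp, 15m), ServiceName
  | render timechart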


 

Real-Time Intelligence Copilot conversational mode

We’d like to share an upgrade to our Copilot assistant, which translates natural language into KQL. Now, the assistant supports a conversational mode, allowing you to ask follow-up questions that build on previous queries within the chat. This enhancement enables a more intuitive and seamless data exploration experience, making it easier to refine your queries and dive deeper into your data, all within a natural, conversational flow.


 

New Rule Creation Experience

We’re excited to announce the launch of a new rule creation experience within Data Activator. This innovative experience unlocks new capabilities to monitor your events with more fine-tuned controls at greater scale, helping you solve, and stay on top of, your operational business challenges with ease.

The new unified experience allows you to easily bring in your data, identify the business objects you want to monitor, quickly and easily create business rules on top of them, and automatically carry out actions when a rule is met. This feature is not only a new and improved experience with the same functionality that is available today, but also lays the groundwork for new modeling capabilities in the future.

With this update you’ll be able to:

  • Stay on top of your critical metrics by monitoring your business objects. You can track and analyze key business objects such as individual packages, households, refrigerators, and more in real-time, ensuring you have the insight needed to make informed decisions. Whether it’s understanding how individual instances of your business objects impact sales figures, inventory levels, or customer interactions, our monitoring system provides detailed insights, helping you stay proactive and responsive to changes in your business environment at a fine-tuned level of granularity.
  • Unlock the full potential of creating business rules on your data with advanced data filtering and monitoring capabilities. This update offers a wide array of options for filtering, summarizing, and scoping your data, allowing you to tailor your analysis to your specific needs. You can set up complex conditions to track when data values change, exceed certain thresholds, or when no new data has arrived within a specified timeframe.
  • Ensure your communications are perfect before hitting send by seeing a preview of your Email and Teams messages. This will allow you to see a preview of your message exactly as it will appear to the recipient. Review your content, check formatting, and make any necessary adjustments to ensure clarity. With this feature, you can confidently have Data Activator send messages on your behalf knowing they look just the way you intended.
  • Set up rules that trigger automatically with every new event that comes in on your stream of data. Whether you need to send notifications or initiate workflows, this feature ensures that your processes are always up-to-date and responsive.

These are just some of the highlights the new Data Activator rule creation experience provides, but there is much more to explore. To learn more about how this update can help you create and manage your business rules more easily and efficiently, check out our detailed blog post that covers all the features and benefits in depth.

 

We’ve made it easier to alert your teammates in Power BI

We’ve streamlined the process of creating Data Activator alerts for your teammates in Power BI. Previously, you had to create your alert in Power BI, then open it in Data Activator to add your teammates as recipients. Now, you can add recipients directly within Power BI. We’re rolling this out gradually during the coming few weeks, so look out for it during the month of September when you click “Set Alert” in Power BI.

For the latest updates and what’s coming in Real-Time Intelligence, check out our documentation and roadmap.

 

Watch the Real-Time Intelligence demos

 

Data Factory

Dataflow Gen2

Copilot in Dataflow Gen2 (generally available)

Earlier this year, we released a Public Preview of the Copilot for Data Factory capabilities in Dataflows Gen2. Today, we are excited to announce that these capabilities are now Generally Available.

Copilot in Fabric enhances productivity, unlocks profound insights, and facilitates the creation of custom AI experiences tailored to your data. As a component of the Copilot in Fabric experience, Copilot in Data Factory empowers customers to use natural language to articulate their requirements for creating data integration solutions using Dataflow Gen2. Essentially, Copilot in Data Factory operates like a subject-matter expert (SME) collaborating with you to design your dataflows.

Copilot for Data Factory is an AI-enhanced toolset that supports both citizen and professional data wranglers in streamlining their workflow. It provides intelligent Mashup code generation to transform data using natural language input and generates code explanations to help you better understand previously generated complex queries and tasks.

Learn more about how to get started with Copilot for Data Factory: Copilot for Data Factory overview – Microsoft Fabric | Microsoft Learn.

 

Fast Copy in Dataflow Gen2 (generally available)

The Fast Copy feature is now generally available. This functionality enables efficient ingestion of large amounts of data, utilizing the same backend as the Copy Activity in data pipelines. It significantly reduces data processing time and enhances cost efficiency.

Learn how to enhance performance and cut costs with Fast Copy in Dataflows Gen2.

Fast Copy supports numerous source connectors, including ADLS Gen2, Blob storage, Azure SQL DB, On-Premises SQL Server, Oracle, Fabric Lakehouse, Fabric Warehouse, PostgreSQL, and more. An on-premises gateway is also supported, allowing for high-performance data transfer from on-premises sources. For details, see Fast Copy with On-premises Data Gateway Support in Dataflow Gen2.

Resources:

Docs: Fast copy in Dataflows Gen2 – Microsoft Fabric | Microsoft Learn

 

Incremental refresh for Dataflow Gen2

We are happy to share the introduction of incremental refresh for Dataflows Gen2. This significant enhancement in Microsoft Fabric’s Data Factory is designed to optimize data ingestion and transformation, particularly as your data continues to expand.


About Incremental Refresh

Incremental refresh enables you to refresh only new or updated data, thus reducing refresh times, enhancing reliability by avoiding long-running operations, and minimizing resource usage. This feature is particularly valuable in scenarios where data volume is substantial, and efficiency is required.

Prerequisites

To leverage incremental refresh in Dataflows Gen2, ensure the following prerequisites are met:

  • You must have a Fabric capacity.
  • Your data source should support query folding and must include a Date/DateTime/DateTimeZone column for data filtering.
  • Your data destination must support incremental refresh. Supported destinations include Fabric Warehouse, Azure SQL Database, and Azure Synapse Analytics.

How to Use Incremental Refresh

  1. Create a new Dataflow Gen2 or open an existing one.
  2. In the dataflow editor, create a new query that retrieves the data you need to refresh incrementally.
  3. Ensure your query returns data with a Date/DateTime/DateTimeZone column for filtering.
  4. Verify that the query fully folds to the source system.
  5. Right-click the query and select Incremental refresh.
  6. Provide the required settings, including the DateTime column, data extraction parameters, and bucket size.
  7. Configure any advanced settings if necessary.
  8. Publish Dataflow Gen2.

Resources:

Incremental Refresh Docs

Incremental Refresh Blog

 

Certified connector updates

Check out the new and updated connectors in this release:

New connectors:

  • Kyvos
  • ClickHouse

Updated connectors:

  • InformationGrid

Are you interested in creating your own connector and publishing it for your customers? Learn more about the Power Query SDK and the Connector Certification program.

 

Watch the Data Factory demo

 

Data pipeline

Fabric Pipeline Integration in On-premises Data Gateway (generally available)

On-premises connectivity for Data pipelines in Microsoft Fabric is now Generally Available. Using the on-premises Data Gateway, customers can connect to on-premises data sources using data pipelines with Data Factory in Microsoft Fabric. This enhancement significantly broadens the scope of data integration capabilities. In essence, by using an on-premises Data Gateway, organizations can keep databases and other data sources on their on-premises networks while securely integrating them into Microsoft Fabric (cloud).

Learn more: How to access on-premises data sources in Data Factory – Microsoft Fabric | Microsoft Learn

 

Invoke remote pipeline in Data pipeline

We’ve been working diligently to make the very popular data pipeline activity known as “Invoke Pipeline” better and more powerful. Based on customer feedback, we continue to iterate on the possibilities and have now added the exciting ability to call pipelines from Azure Data Factory (ADF) or Synapse Analytics pipelines as a public preview!

This opens tremendous possibilities to utilize your existing ADF or Synapse pipelines inside of a Fabric pipeline by calling them inline through this new Invoke Pipeline activity. Use cases such as calling Mapping Data Flows or SSIS pipelines from your Fabric data pipeline are now possible.

We will continue to support and include the previous Invoke Pipeline activity as “legacy”, without the support for ADF or Synapse remote pipeline invocation and without child pipeline monitoring in Fabric. But for the latest features like remote invocation and child pipeline monitoring, you can use the new Invoke Pipeline.


 

Spark Job environment parameters

One of the most popular use cases in Fabric Data Factory today is automating and orchestrating Fabric Spark Notebook executions from your data pipelines. A very common request has been to reuse existing Spark sessions to avoid any session cold-start delays. And now we’ve delivered on that requirement by enabling “Session tags” as an optional parameter under “Advanced settings” in the Fabric Spark Notebook activity! Now you can tag your Spark session and then reuse the existing session using that same tag. 


 

Watch the Data pipeline demos

 

Mirroring

Mirroring Azure SQL Database

Mirroring Azure SQL Database in Fabric now extends support for mirrored tables to additional SQL Data Definition Language (DDL) operations. Operations such as Drop Table, Rename Table, and Rename Column can now be executed seamlessly while tables are being mirrored.

Click here to watch a demo.

 

New Azure Data Factory Item

Bring your existing Azure Data Factory (ADF) to your Fabric workspace! We are introducing a new preview capability that allows you to connect to your existing ADF factories from your Fabric workspace. By clicking “Create Azure Data Factory” inside of your Fabric Data Factory workspace, you will now be able to fully manage your ADF factories directly from the Fabric workspace UI! Once your ADF is linked to your Fabric workspace, you’ll be able to trigger, execute, and monitor your pipelines as you do in ADF but directly inside of Fabric.

 

Copy Job (public preview)

We’d like to introduce Copy Job, elevating the data ingestion experience to a more streamlined and user-friendly process from any source to any destination. Now, copying your data is easier than ever before. Moreover, Copy Job supports various data delivery styles, including both batch copy and incremental copy, offering the flexibility to meet your specific needs.


With Copy Job, you can enjoy the following benefits:

  • Intuitive Experience: A seamless data copying experience with no compromises, making it easier than ever.
  • Efficiency: Enable incremental copying effortlessly, reducing manual intervention. This efficiency translates to less resource utilization and faster copy times.
  • Flexibility: Empower yourself to customize your data copying behavior. From selecting tables and columns to data mapping and read/write behavior, you have the flexibility to tailor the process to your specific needs. Additionally, enjoy the freedom to set flexible schedules, whether it’s a one-time copy or at a regular cadence.

Click here to learn more about Copy Job.

 

Watch the Mirroring demos

 
