Microsoft Fabric, explained for existing Synapse users
Earlier this year at Microsoft Build, we introduced Microsoft Fabric, “the biggest data product announcement since SQL Server”, in Public Preview. Today, we are announcing the General Availability of Microsoft Fabric.
Arun explains in detail why we all believe Microsoft Fabric will redefine the current analytics landscape. Here I will focus on what it means for customers using the current Platform-as-a-Service (PaaS) version of Synapse: what happens to your current investments (spoiler: we fully support them), and how to think about the future.
What happens with PaaS Azure Synapse Analytics
The PaaS offering of Azure Synapse Analytics is an enterprise analytics service designed to accelerate time to insight across data warehouses and big data systems. It brings together the SQL technologies used in enterprise data warehousing, Azure Data Factory pipelines, Apache Spark technologies for big data, and Azure Data Explorer for log and time series analytics.
Microsoft has no current plans to retire Azure Synapse Analytics. Customers can continue to deploy, operate, and expand the PaaS offering of Azure Synapse Analytics. Rest assured, should these plans change, Microsoft will provide you with advance notice and will adhere to the support commitments in our Modern Lifecycle Policy to ensure our customers’ needs are met.
The evolution of Microsoft’s big data analytics products
The next versions of our big data analytics products are now a core part of Microsoft Fabric.
Fabric opens new architectural horizons for our analytical engines. It offers OneLake, a unified storage abstraction for all your data, organized into a logical data mesh with federated governance, granular control, and an intuitive, personalized data hub. All Fabric engines separate storage from compute and store data in OneLake using a single, open data format.
On this new foundation, we can invent new, unprecedented ways of deploying pipelines, data warehousing, data engineering, data science, observability and real-time analytics technologies, to ultimately simplify and increase the efficiency of our customers’ solutions. Fabric allows us, and our customers, to do more. This is why most of our innovation efforts will be focused on Fabric.
How to think about your current Azure PaaS Synapse Analytics solutions
As mentioned above, there is no immediate need to change anything, as the current platform is fully supported by Microsoft. Your existing solutions will keep working. Your in-progress deployments can continue, all with our full support.
However, you probably have already started thinking about a Microsoft Fabric future for your analytics solutions. The following steps may help you with this thought process.
Understand Microsoft Fabric
Microsoft Fabric represents a significant upgrade to all our analytics engines. All of them are improved, faster, and more scalable, and there is a lot to learn about the new engines and how best to use them. Fabric reimagines collaboration and empowers business users in an unprecedented way. But it is much more than better engines or better integration.
The unified, open-source data format means that there is no need to copy data from one engine to another. You can shape data using the technology of your choice, then query it with any other technology.
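Here is what "no copies" means in practice. Fabric stores tables in the open Delta Lake format, which consists of Parquet data files plus a transaction log of JSON commits. An engine reconstructs the current table by replaying that log, so any engine that reads the same log sees the same data. The minimal Python sketch below replays a hypothetical in-memory log and handles only the `add` and `remove` actions. A real Delta log also carries metadata, protocol, and statistics entries, and the engines read it themselves rather than user code.

```python
import json

# Hypothetical commit log: each commit holds newline-delimited JSON actions,
# mirroring the "add"/"remove" actions found in a Delta Lake _delta_log folder.
COMMITS = [
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
]


def active_files(commits: list[str]) -> set[str]:
    """Replay commits in order and return the data files that form the current table."""
    files: set[str] = set()
    for commit in commits:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


# Any engine replaying the same log arrives at the same snapshot.
print(sorted(active_files(COMMITS)))  # ['part-001.parquet', 'part-002.parquet']
```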
Fabric introduces completely new ways to make your data part of your analytics landscape. Shortcuts (within Azure or across clouds), database mirroring, and seamless access to Dataverse and M365 data are all designed to remove friction and cost.
Understanding these technologies will help you make the most of Fabric in terms of efficiency, agility, and cost.
Our teams have worked hard to produce detailed documentation for all the Fabric concepts, and the best complement to the documentation is hands-on experience. The easiest way to understand Fabric in depth is to try the product with the Microsoft Fabric free trial. Arun’s blog spells out clearly how to learn more about Microsoft Fabric.
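Conceptually, a shortcut works like a symbolic link. A folder in a lakehouse points at data that stays where it is, whether in ADLS Gen2, another cloud, or another OneLake location. The Python sketch below models only that redirection. The shortcut table, storage account, and bucket names are hypothetical placeholders. Fabric resolves shortcuts internally, and there is no user-facing `resolve` function.

```python
from pathlib import PurePosixPath

# Hypothetical shortcut definitions: a lakehouse folder -> external storage target.
# Account, container, and bucket names are illustrative placeholders.
SHORTCUTS = {
    "Tables/sales": "abfss://curated@contosolake.dfs.core.windows.net/sales",
    "Files/clickstream": "s3://contoso-logs/clickstream",
}


def resolve(path: str) -> str:
    """Return where the bytes for a lakehouse path actually live.

    Paths under a shortcut are redirected to the shortcut's target; any other
    path is assumed to be stored in OneLake itself and is returned unchanged.
    """
    p = PurePosixPath(path)
    for prefix, target in SHORTCUTS.items():
        base = PurePosixPath(prefix)
        if p == base or base in p.parents:
            rest = p.relative_to(base)
            # relative_to() yields "." when the path is the shortcut folder itself.
            return target if str(rest) == "." else f"{target}/{rest}"
    return str(p)


print(resolve("Tables/sales/part-000.parquet"))
# abfss://curated@contosolake.dfs.core.windows.net/sales/part-000.parquet
print(resolve("Tables/orders/part-000.parquet"))  # Tables/orders/part-000.parquet
```

No data is copied at any point. Reading through the shortcut simply reads the original location.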
Understand what it means for your solution
Your analytics solution may use different technologies and engines. Fabric is a complete analytics platform, so inside Microsoft Fabric you will find new and enhanced versions of the analytics capabilities you are familiar with today.
Fabric also brings new capabilities that have no parallel in the current PaaS Synapse Analytics offering. The Fabric SQL engine can operate, with equal performance, scale, and security, over any OneLake artifact (warehouses, lakehouses, mirrored databases). It also supports cross-artifact operations, removing the need for extra copies. Power BI in DirectLake mode, for example, can now analyze real-time streaming data or Spark output directly.
All these changes enable simpler, more efficient solutions, removing the need for intermediate steps and multiple data copies. Your solution can get significantly simpler and cheaper.
Below, I use one common PaaS Azure Synapse Analytics architecture, together with a possibly more efficient Fabric solution, to demonstrate such simplifications.
Example 1: Data Lake, from Synapse to Fabric
Today, you may prepare your data in an Azure Data Lake Storage Gen2 (ADLSg2) lakehouse (typically using Spark, in Synapse or Azure Databricks), use a pipeline to load the data into a Synapse SQL Dedicated Pool, and then use Power BI or another BI tool for your reports.
You can keep this solution intact and simply upgrade it to the Fabric engines.
In Fabric, however, this solution can be simplified:
- A Data Engineering Lakehouse in Microsoft Fabric lets you use your current ADLSg2 data, as prepared with Synapse Spark or Azure Databricks, in place through shortcuts.
- The SQL Analytics Endpoint allows you to apply the security rules from the Dedicated Pool directly over the Lakehouse. There is no need for a dedicated capacity, nor for the pipeline copying from the lake to your warehouse.
- Using the new DirectLake mode, Power BI can now operate directly over the Lakehouse, with performance similar to Import. Your other BI tools can continue to operate over the SQL Analytics Endpoint.
- By migrating your Notebooks and Spark Jobs to Fabric Spark, your Lakehouse data will be automatically optimized for all the other Fabric engines (while still being stored in an open format).
To learn more about the Lakehouse pattern in Microsoft Fabric, see “Lakehouse end-to-end scenario: overview and architecture” on Microsoft Learn.
Assess our migration tools and processes
We are investing significant development effort in migration processes and tooling, and these efforts prioritize current PaaS Synapse Analytics customers.
The processes and tools we are designing are intended to minimize the friction, disruption and cost for our existing customers.
As you will see in the section on Migration Resources, we are developing tools to:
- Use your data in-place whenever possible
- Reuse code investments (pipelines, notebooks) when possible
- Migrate code (stored procedures, views, notebooks)
This work is not complete, and we will keep posting updates to our migration tools. Join the fast-growing Fabric community, where our specialists as well as external experts will be ready to work with you. The Fabric Ideas forum on the community site is the best way to suggest new features, and it is closely monitored by the Microsoft Fabric product teams.
Develop, then plan to deploy a migration strategy
After learning about Fabric and evaluating the product, you will develop enough confidence in the new Fabric engines and the migration technology to move your solution to Fabric. For some of you this may happen soon; for others it may take years.
There is no rush – we will keep supporting your existing solutions – but we are ready for you to migrate whenever the time is right.
When you are ready to move your solution to Fabric, you will be able to exchange your existing 1- or 3-year Synapse Reserved Instance (RI) purchases for 1-year Fabric RI purchases, so your reservation discounts continue to apply in Fabric. Additionally, if you want to increase your RI commitment for your Fabric portfolio, you will have access to discounts of more than 40% over Fabric pay-as-you-go pricing.
In the next sections, the product leaders explain how to think about Fabric from the perspective of different PaaS Synapse Analytics workloads.
Data Factory Pipelines
Data Factory in Microsoft Fabric brings Power Query and Azure Data Factory together into a modern, trusted data integration experience that empowers data and business professionals to extract, load, and transform data for their organization. In addition, powerful data orchestration capabilities let you build simple to complex workflows that orchestrate the steps your data integration needs.
Key concepts in Data Factory in Microsoft Fabric include:
- Get Data and Transformation with Dataflow Generation 2 – Dataflow Generation 2 is an evolution of Dataflow in Power BI, re-architected to leverage Fabric compute engines for data processing and transformation. This enables Dataflow Generation 2 to ingest and transform data at any scale.
- Data Orchestration with Data Pipelines – For customers familiar with Azure Data Factory (ADF), data pipelines in Microsoft Fabric use the same technology that powers Azure Data Factory. As part of the GA of Fabric, data pipelines in Microsoft Fabric will have most of the activities available in ADF. See the list of activities that will be part of data pipelines in Fabric. The SSIS activity will be added to data pipelines by Q2 CY2024.
- Enterprise-ready Data Movement – Whether you are moving petabytes or small data, Data Factory provides a serverless and intelligent data movement platform that moves data reliably between diverse sources and destinations. With support for 170+ connectors, Data Factory in Fabric lets you move data across clouds, from on-premises data sources, and within virtual networks (VNet). Intelligent throughput optimization lets the data movement platform automatically detect the size of the compute needed for data movement.
To enable customers to upgrade to Microsoft Fabric from Azure Data Factory (ADF), we will be supporting the following:
- Data pipelines activities – Many of the activities you use in ADF have been added to Data Factory in Fabric. We have also added new activities for notifications (e.g. Teams, Outlook). See the list of activities that are available in Data Factory in Fabric.
- OneLake/Lakehouse connector in Azure Data Factory – ADF customers can now integrate with Microsoft Fabric and bring data into Fabric OneLake.
- Azure Data Factory Mapping Dataflow to Fabric – We have put together a guide for ADF customers who are looking at building new data transformations in Fabric. Find out more at https://aka.ms/datafactoryfabric/docs/guideformappingdataflowusers
In addition, customers looking to migrate their ADF mapping dataflows to Fabric can leverage sample code from the Fabric Customer Advisory Team (Fabric CAT) to convert mapping dataflows to Spark code. Find out more at https://github.com/sethiaarun/mapping-data-flow-to-spark
As part of the Data Factory in Fabric roadmap, we will be working towards previews of the following by Q2 CY2024:
- Mounting of Azure Data Factory in Fabric – This lets customers mount their existing Azure Data Factory in Microsoft Fabric. All ADF pipelines will work as is and continue running on Azure, while you explore Fabric and work out an upgrade plan.
- Upgrade from Azure Data Factory pipelines to Fabric – We will be working with customers and the community to learn how we can best support upgrading data pipelines from ADF to Fabric. As part of this, we will deliver an upgrade experience that lets you test your existing data pipelines in Fabric by mounting them and then upgrading them.
Learn more about how you can upgrade to Data Factory in Fabric – https://aka.ms/datafactoryfabric/upgradetofabric
Synapse Data Warehouse
Fabric Data Warehouse is the next generation of data warehousing in Microsoft Fabric. It is the first transactional data warehouse to natively support an open data format, enabling data engineers and business users to collaborate seamlessly without compromising security or governance. Just like the previous generation, it provides multi-table ACID transactional guarantees. It is built on the well-established SQL Server Query Optimizer and Distributed Query Processing engine, but comes with major improvements that address many of the challenges customers face in enabling modern analytics workloads. These improvements come from rearchitecting the data warehouse, building on IP from both Dedicated and Serverless SQL Pools, along with:
- Separation of storage and compute: data is stored in OneLake and is clearly separated from the compute used by the SQL engine. There is an elastic allocation of compute resources based on demand, as well as use of distinct compute resources for different workload types on top of the same data.
- Leveraging the virtually unlimited compute capacity of the Azure cloud: going beyond the limited topology offered by the Synapse Gen2 architecture.
- Support for open data format: allowing a single copy of the data to be used by all the Fabric workloads such as Data Science, Data Engineering, and Power BI.
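The multi-table ACID guarantee mentioned above means that a unit of work touching several tables either commits completely or not at all. The Python sketch below illustrates those semantics. It uses the standard library's `sqlite3` module as a stand-in engine, so it is not Fabric code, and the table names are hypothetical. The `UPDATE` deliberately runs before an `INSERT` that fails on a duplicate key, which shows the rollback undoing work that had already been applied.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse. The point is the
# transactional semantics, not Fabric's engine: both tables change, or neither does.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total REAL NOT NULL);
    INSERT INTO daily_totals VALUES ('2023-11-15', 0);
    """
)


def load_order(order_id: int, amount: float, day: str) -> None:
    """Update the daily total and record the order as one atomic unit of work."""
    with conn:  # commits if the block succeeds, rolls back if any statement raises
        conn.execute(
            "UPDATE daily_totals SET total = total + ? WHERE day = ?", (amount, day)
        )
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))


load_order(1, 100.0, "2023-11-15")
try:
    # Duplicate order id: the UPDATE runs first, then the INSERT fails...
    load_order(1, 50.0, "2023-11-15")
except sqlite3.IntegrityError:
    pass  # ...and the rollback undoes the UPDATE, so both tables stay consistent.

print(conn.execute("SELECT total FROM daily_totals").fetchone())  # (100.0,)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone())  # (1,)
```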
With this new architecture, the new engine enables numerous capabilities that were not possible in either Dedicated or Serverless SQL Pools, such as:
- Cross database querying without any ETL or data movement.
- Cloning without creating copies of the data.
- Autoscaling enabling elastic scale up and down of the compute nodes with dynamic resource allocation tailored to data volume, usage, or query complexity.
- A pay-for-what-you-use pricing model.
- No-knobs performance via automated query optimizations, statistics, and data distributions.
All of this comes with concepts familiar to SQL users, such as views, stored procedures, and SQL security (row-level security, column-level security, dynamic data masking), and the full benefits of the T-SQL tooling ecosystem.
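Cloning without copies is possible because table data lives in immutable files. A clone copies only the metadata that lists those files, and later writes to either table add new files instead of changing shared ones. The Python sketch below is a conceptual model of that idea under those assumptions, using a hypothetical `Table` type. It is not how Fabric implements clones internally.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical table: a name plus references to immutable data files."""

    name: str
    files: list[str] = field(default_factory=list)


def zero_copy_clone(source: Table, new_name: str) -> Table:
    """Clone by copying the list of file references (metadata), never the files."""
    return Table(new_name, list(source.files))


def append(table: Table, new_file: str) -> None:
    """Writes add a new file reference; existing files are never modified in place."""
    table.files.append(new_file)


prod = Table("sales", ["part-000.parquet", "part-001.parquet"])
dev = zero_copy_clone(prod, "sales_dev")
append(dev, "part-002.parquet")  # only the clone sees the new file
print(prod.files)  # ['part-000.parquet', 'part-001.parquet']
print(dev.files)   # ['part-000.parquet', 'part-001.parquet', 'part-002.parquet']
```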
These architectural changes cannot be backported to either one of the old engines. Because of the open format, your data warehouses cannot be upgraded in place either. Data stored in a proprietary format in Gen2 needs to be extracted and stored in the open format of Fabric.
A migration can be done at your own pace, when you are ready to leverage these new capabilities. To enable this, the following are available to you now:
- Ability to export your Dedicated SQL Pool to a SQL Project and import it in Fabric.
- PowerShell scripts, available on GitHub, that convert Gen2 DDL to Fabric-supported DDL.
- Detailed migration documents with best practices. Find out more in the Azure Synapse dedicated SQL pools to Fabric Migration Guidance whitepaper.
In addition, we have started working on an in-product Migration Assistant that will automatically detect and convert your Synapse Gen2 code to Fabric Data Warehouse code. It will also redirect your endpoints, so you don’t have to worry about application migration. We expect it to be available in CY2024.
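The conversion scripts on GitHub are the supported path. To show the kind of rewrite involved, the Python sketch below drops the dedicated pool `WITH (...)` table options and maps a few data types. The type mappings are assumptions based on the Fabric Warehouse data type documentation and should be checked against the current list. For example, `varchar` lengths are counted in bytes, so converted `nvarchar` columns may need larger sizes.

```python
import re

# Illustrative subset of type rewrites: an assumption to verify against the current
# Fabric Warehouse data type documentation, not a copy of the published scripts.
TYPE_REWRITES = [
    (r"\bnvarchar\b", "varchar"),
    (r"\bnchar\b", "char"),
    (r"\bdatetime\b", "datetime2(6)"),
    (r"\bmoney\b", "decimal(19,4)"),
    (r"\btinyint\b", "smallint"),
]

# Dedicated pool options such as DISTRIBUTION and CLUSTERED COLUMNSTORE INDEX are
# dropped, on the assumption that Fabric manages data layout itself.
WITH_OPTIONS = re.compile(r"\)\s*WITH\s*\(.*\)\s*;?\s*$", re.IGNORECASE | re.DOTALL)


def convert_ddl(ddl: str) -> str:
    """Naively rewrite a dedicated SQL pool CREATE TABLE statement for Fabric.

    Regex-based, so it would also rewrite identifiers that happen to match a type name.
    """
    ddl = WITH_OPTIONS.sub(");", ddl)
    for pattern, replacement in TYPE_REWRITES:
        ddl = re.sub(pattern, replacement, ddl, flags=re.IGNORECASE)
    return ddl


GEN2_DDL = """CREATE TABLE dbo.Sales
(
    SaleId int NOT NULL,
    Customer nvarchar(100),
    SaleDate datetime,
    Amount money
)
WITH
(
    DISTRIBUTION = HASH(SaleId),
    CLUSTERED COLUMNSTORE INDEX
);"""

print(convert_ddl(GEN2_DDL))
```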
Synapse Data Engineering
Fabric Data Engineering is our big data analytics workload in Fabric, empowering data engineers to leverage the power of Apache Spark to transform their data at scale and build out a lakehouse architecture. The Fabric Data Engineering experience targets users of Apache Spark pools in the Azure Synapse Analytics world. Here are some of the key takeaways regarding the Fabric Data Engineering experience:
Runtime for big data workloads
Every Fabric workspace comes pre-wired with a ‘starter pool’ (default Spark cluster) with a Fabric Runtime that contains up-to-date versions of Spark, Delta, Java, and Python. Just like in Azure Synapse Analytics, customers can also create their own custom clusters with their own configurations and libraries.
The Apache Spark experience in Fabric also contains many new and exciting enhancements:
- Starter pools in Fabric are automatically kept live, meaning users can enjoy sessions that start within ~15 seconds
- High concurrency mode in Fabric means multiple notebooks can be attached to a single session, accelerating the start-up times and reducing costs
- Spark clusters can be as small as a single node, further reducing the cost of getting started with Spark
Simplified lakehouse architecture
Every Fabric workspace also comes pre-wired with OneLake, our SaaSified data lake for the organization. Users can easily create lakehouse items, the perfect container for bringing all your data into OneLake using Spark, dataflows, and pipelines. Existing data can be included with no data movement through shortcuts, and Fabric automatically discovers the metadata of existing Delta tables, making it easy to start working with existing data with zero friction. Additionally, we have reduced the price of Spark in Fabric by almost 40% vs. the retail price of Synapse Spark.
Here are some other exciting things to keep in mind about the lakehouse in Fabric:
- Every lakehouse comes with a built-in SQL endpoint and Power BI dataset. This means that as soon as you transform your data with Spark, you can start querying it using our SQL engine and Power BI, with no data movement necessary
- Spark (along with every other Fabric engine) automatically writes data into the lakehouse with V-Order enabled, optimizing it for BI reporting
First Class Developer Experiences
The Synapse Data Engineering experience brings in familiar authoring tools, including notebooks for interactive querying experiences and Spark Job Definitions for submitting batch jobs. These capabilities come with a variety of new enhancements and users even have some new authoring experiences to look forward to:
- Notebooks in Fabric include numerous usability improvements, including auto-save, real-time collaboration and commenting, a built-in file system, and native file format support when checking into Git. Users can also use lightweight scheduling (in addition to the pipeline activity).
- Spark Job Definitions come with retry policy support, making it easier to continuously run long-running streaming jobs
- Native VS Code support makes it easy to work with your Data Engineering items (notebooks, Spark Jobs, lakehouse) all in your favorite IDE, including full debugging support
- The newly released environment item streamlines the packaging of your Spark configurations, libraries, cluster settings, and more, and simplifies reusing your hardware and software environment across your code artifacts.
To summarize, with Synapse Data Engineering, you can start building on top of your existing Azure Synapse Spark investments quickly and incrementally. Start by using shortcuts to existing data in your data lake and bringing in your notebooks with the import capability. We are starting work on an in-product migration assistant; in the meantime, please use our newly published Azure Synapse Spark to Fabric Migration Guidance whitepaper.
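The retry policy support for Spark Job Definitions boils down to a simple loop. If the job fails, wait, restart it, and give up after a maximum number of restarts. The Python sketch below shows that loop. `run_with_retry`, `max_retries`, `retry_interval_s`, and `flaky_streaming_job` are hypothetical names chosen for illustration. In Fabric, you configure the policy on the Spark Job Definition itself rather than writing this code.

```python
import time
from typing import Callable


def run_with_retry(
    job: Callable[[], None], max_retries: int = 3, retry_interval_s: float = 0.0
) -> int:
    """Run a job, restarting it after a failure until it succeeds or retries run out.

    Returns how many retries were needed; re-raises the last error once
    max_retries restarts have also failed.
    """
    retries = 0
    while True:
        try:
            job()
            return retries
        except Exception:
            if retries >= max_retries:
                raise
            retries += 1
            time.sleep(retry_interval_s)  # wait before restarting the job


attempts = {"count": 0}


def flaky_streaming_job() -> None:
    """Hypothetical job that hits a transient error on its first two runs."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")


print(run_with_retry(flaky_streaming_job, max_retries=5))  # 2
```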
Synapse Data Science
Synapse Data Science empowers data scientists to explore their data and to build and operationalize predictive models. Coming from the Azure Synapse Analytics world, you will see many familiar constructs: Python and R baked into the runtime along with many popular ML packages, the ability to install your own third-party and custom libraries, and SynapseML, our open-source library for creating massively scalable ML pipelines.
Fabric Data Science offers a variety of new capabilities data scientists can look forward to:
Model & Experiment tracking
Data scientists can use experiments and models as readily available items in the Fabric workspace. Users can manage models and track experiment runs using standard MLflow APIs. Comparison experiences make it easy to compare different experiment runs, and autologging captures key metrics automatically as users author code to train models.
Model batch scoring
To operationalize their ML models, users can leverage the scalable PREDICT function for distributed batch scoring on Spark. This capability exists in Azure Synapse today, so existing Synapse users should feel right at home. The Fabric Data Science experience provides a low-code UI for scoring data and tight integration with the lakehouse, making it easy to enrich data and surface it in Power BI reports with zero friction.
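Distributed batch scoring splits the input into partitions, applies the same trained model to each partition in parallel, and then combines the results. The standard-library Python sketch below shows that pattern, with threads standing in for Spark executors. It is not the PREDICT API. `linear_model`, `batch_score`, and the sample rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Row = dict[str, float]
Model = Callable[[Row], float]


def score_partition(model: Model, partition: list[Row]) -> list[Row]:
    """Score each row of one partition, adding a 'prediction' column."""
    return [{**row, "prediction": model(row)} for row in partition]


def batch_score(model: Model, partitions: list[list[Row]], max_workers: int = 4) -> list[Row]:
    """Score partitions in parallel; Executor.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scored = pool.map(lambda part: score_partition(model, part), partitions)
        return [row for part in scored for row in part]


def linear_model(row: Row) -> float:
    """Hypothetical trained model: prediction = 2x + 1."""
    return 2.0 * row["x"] + 1.0


partitions = [[{"x": 1.0}, {"x": 2.0}], [{"x": 3.0}]]
print(batch_score(linear_model, partitions))
# [{'x': 1.0, 'prediction': 3.0}, {'x': 2.0, 'prediction': 5.0}, {'x': 3.0, 'prediction': 7.0}]
```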
Data Exploration & Enrichments
Fabric Data Science offers many innovative solutions in the space of exploring and transforming your data. These include:
- Data Wrangler – a low-code UI for carrying out data transformations that automatically generates Python code
- Semantic Link – a library enabling seamless connectivity to the Power BI semantic model through data science tools like notebooks
- Pre-built AI models – newly released public preview capability providing built-in access to Azure AI services like text analytics and translation services
The migration path for a data scientist in Azure Synapse Analytics is like that of a Spark data engineer – they will need to consider their notebooks, Spark pools and data. We recommend starting with the Azure Synapse Spark to Fabric Migration Guidance whitepaper.
Synapse Real-time Analytics
Synapse Real-time Analytics is a robust platform tailored to deliver real-time data insights and observability analytics capabilities for a wide range of data types. This includes observability time-based data like logs, events, and telemetry data. It’s the true streaming experience in Fabric! Building on the same foundation as Azure Synapse Data Explorer, Synapse Real-time Analytics equips both citizen data scientists and professional data engineers with a suite of features and tools to fully unleash the potential of their data.
Experience unmatched efficiency by creating a database, ingesting data, running queries, and generating Power BI reports, all within a 5-minute timeframe. Real-time Analytics puts speed at the forefront, allowing you to dive into data analysis without delay.
For an authentic streaming experience in Fabric, the “Get Data” feature has received a modern facelift with an intuitive design and user-friendly interface. It simplifies data ingestion, accepting any data format or structure from various sources in either streaming or batch mode. Your data becomes queryable within seconds.
Whether you’re a Kusto Query Language (KQL) enthusiast or prefer traditional SQL, Real-time Analytics accommodates your needs. This service enables you to generate quick KQL or SQL queries, ensuring that you can work in your preferred language and obtain results swiftly. It doesn’t matter if you’re working with a small dataset (a few gigabytes), a medium-sized one (a few terabytes), or even massive datasets (in the petabytes range).
Fabric Real-Time Analytics offers a multitude of innovative solutions for exploring and visualizing your data, including:
- KQL Queryset: A workbench for creating, managing, and sharing your queries.
- Power BI Report: A one-click option to generate a Power BI report on top of any query or table.
- Notebook: Seamlessly connect your Fabric Notebook with the KQL Database for data ingestion and querying.
- NL2KQL (Coming Soon): Write your query in natural language, and Fabric will generate and execute the corresponding KQL query for you.
- Real-Time Dashboard (Coming Soon): The Fabric Real-Time Dashboard is a collection of tiles that enable native export of Kusto Query Language (KQL) queries as visuals. This allows for easy query modification and visual formatting, enhancing data exploration and delivering superior query and visualization performance.
Fabric Real-Time Analytics is your gateway to real-time insights and a streamlined data analysis experience. Whether you’re pioneering new data horizons or looking to optimize your data analytics solutions, this service is your trusted partner. Stay ahead in the data game and embark on your journey with Fabric Real-Time Analytics today.
For more information on Fabric Real-Time Analytics, visit the general availability blog.
Fabric KQL databases are 100% compliant with Azure Data Explorer (ADX) and Azure Synapse Data Explorer (Preview) and are powered by the same technology. This means that all current applications, SDKs, integrations, and tools that work with ADX will continue to work smoothly with Fabric KQL databases.
There is a broad set of capabilities to support mixed environments and migrations; some are available now and some will light up in the coming months.
- Available now:
  - Full binary compatibility of APIs, SDKs, and tools.
  - Create a database shortcut to host a read-only, in-place, up-to-date instance of the database in Fabric.
- Coming over the next months:
  - Migrate an Azure Synapse Data Explorer pool from a Synapse workspace and attach it to a Fabric workspace
  - Attach an Azure Data Explorer cluster to a Fabric workspace
  - Sync Azure Data Explorer user queries and dashboards into a Fabric workspace’s query sets and dashboards