Microsoft Fabric Updates Blog

Synapse June 2023 Monthly Update

Welcome to the first edition of the Synapse in Microsoft Fabric monthly update! Visit the blog site in the first week of each month to see what our teams have been up to. This blog covers the four Synapse experiences in Fabric: Data Warehouse, Data Engineering, Data Science, and Real-Time Analytics.

Check out our companion video and read on for all of the announcements we have for you this month! We’re excited you’re here!

Contents

Data Warehouse
Data Engineering
Data Science
Real-Time Analytics

Data Warehouse

Synapse Data Warehouse is the next generation of data warehousing in Microsoft Fabric and the first transactional data warehouse to natively support an open data format. This lets IT teams, data engineers, and business users collaborate seamlessly and extract actionable insights from their data, all without compromising enterprise security or governance. As in the previous data warehouse generation, the SQL engine provides multi-table ACID transactional guarantees. It is built on the well-established SQL Server Query Optimizer and Distributed Query Processing engine, and is bolstered with the key improvements below that add significant new value to enterprises.
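To make the multi-table ACID guarantee concrete, here is a minimal sketch of what it means for a transaction that touches two tables to either commit in full or roll back in full. It uses Python's built-in sqlite3 module purely as a local stand-in, not Synapse Data Warehouse or T-SQL, and the table names, columns, and the transfer_stock helper are hypothetical.

```python
import sqlite3


def transfer_stock(conn, item, qty):
    """Record a shipment and decrement inventory as one atomic unit."""
    try:
        # The connection context manager commits on success and rolls back
        # every statement in the transaction if an exception is raised.
        with conn:
            conn.execute(
                "INSERT INTO shipments (item, qty) VALUES (?, ?)", (item, qty)
            )
            conn.execute(
                "UPDATE inventory SET on_hand = on_hand - ? WHERE item = ?",
                (qty, item),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance, so the shipment
        # row inserted above has been rolled back as well.
        return False


# Hypothetical two-table schema held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "item TEXT PRIMARY KEY, on_hand INTEGER CHECK (on_hand >= 0))"
)
conn.execute("CREATE TABLE shipments (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 10)")
conn.commit()

ok = transfer_stock(conn, "widget", 4)       # fits: both tables change
failed = transfer_stock(conn, "widget", 50)  # too large: neither table changes

on_hand = conn.execute(
    "SELECT on_hand FROM inventory WHERE item = 'widget'"
).fetchone()[0]
shipment_count = conn.execute("SELECT COUNT(*) FROM shipments").fetchone()[0]
```

The second call fails on the inventory update, and the shipment row written just before it disappears with the rollback. That all-or-nothing behavior across tables is what the guarantee refers to.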

Some of the key Synapse Data Warehouse experiences that are launching as part of Microsoft Fabric at Build are:

Fully managed: this new data warehouse is a fully managed SaaS solution that extends modern data architectures to both professional developers who love to write code and citizen enthusiasts with no coding skills. What previously took enterprises months to accomplish can now be done in minutes.
No provisioning and managing of resources: instead of relying on dedicated clusters that must be provisioned in advance, it is based on a fully serverless compute infrastructure where resources are provisioned in milliseconds as job requests come in. Enterprises benefit from resource efficiencies and only pay for what they use.
Separation of storage and compute: compute nodes are independent of storage, enabling enterprises to scale and pay for each one separately.
Open data standards: data is not locked into the proprietary SQL Server format but is stored in the open Delta-Parquet format in Microsoft OneLake, providing interoperability with all workloads in Fabric and with the Spark ecosystem without requiring any data movement.
Cross-querying: as a result of the open data standard support, data in the lake, whether processed by a Fabric workload or any other compute engine, can be queried and cross-joined without making any copies of the data.
Auto-scaling: it instantly and automatically scales resources up as query and usage requirements increase and scales them back down when they are no longer needed, all without any user intervention.
Self-optimizing: it automatically detects and isolates workloads to deliver predictable performance. Caching is automatic and multi-tiered based on activity, and the generated query plans are optimal. There is no need to hire highly skilled engineers to manage workload groups or tune the data warehouse.
Fully integrated: it is fully integrated with all Fabric workloads right out of the box for any developer. Users can continue to benefit from the rich capabilities of the SQL engine, using the T-SQL language or a simple user interface, along with the continued benefits of the SQL ecosystem.

Learn more about Data Warehouse in Microsoft Fabric by reading "Introducing Synapse Data Warehouse in Microsoft Fabric" and watching "Build 2023: Modernize your Enterprise Data Warehouse, generate value from data with Fabric".

Data Engineering

Synapse Data Engineering is one of the core experiences of Microsoft Fabric. Microsoft Fabric empowers teams of data professionals to collaborate seamlessly, end to end, on their analytics projects, ranging from data integration to data warehousing, data science, and business intelligence. With data engineering as a core experience in Fabric, data engineers will feel right at home, leveraging the power of Apache Spark to transform their data at scale and build out a robust lakehouse architecture.

Some of the key Synapse Data Engineering experiences that are launching as part of Microsoft Fabric at Build are:

Build a lakehouse for all your organizational data: a lakehouse combines the best of the data lake and the data warehouse, removing the friction of ingesting, transforming, and sharing organizational data, all in an open format.
Runtime with great default performance and robust admin controls: the public preview ships with ‘Runtime 1.1’, which includes Spark 3.3.1, Delta 2.2, and Python 3.10. To remove friction in getting started, the Spark Runtime comes pre-wired to every Microsoft Fabric workspace.
Developer experience: our goal is for every data engineer to have a delightful authoring experience, irrespective of their tooling of choice.

Learn more about Data Engineering in Fabric by reading "Introducing Synapse Data Engineering in Microsoft Fabric" and watching "Build 2023: Using Spark to accelerate your lakehouse architecture with Microsoft Fabric".

Data Science

Synapse Data Science in Microsoft Fabric allows data science practitioners to work seamlessly on top of the same secured and governed data that has been prepared by data engineering teams. This eliminates the need to copy data and to work out how to give your data science teams secure access to it. In Microsoft Fabric, the open Delta Lake support allows data science users to version datasets and so create reproducible machine learning code. Additionally, data science users have access to a wide range of easy-to-use getting-started experiences, low-code tools, and code authoring experiences with Notebooks and Visual Studio Code. Synapse Data Science in Microsoft Fabric also provides a rich set of built-in ML tools. For example, MLflow model and experiment tracking, powered by Azure Machine Learning, is built in. The SynapseML Spark library provides scalable ML tools, and users can serve predictions swiftly to Power BI with the new Power BI Direct Lake capability. Finally, streamlined collaboration across different analytics roles makes hand-offs seamless and teams more productive.
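The link between dataset versioning and reproducibility can be illustrated with a toy model. The sketch below is not Delta Lake: it is a hypothetical in-memory VersionedTable class that mimics the idea of an append-only commit log, where each commit produces a new version number and a read pinned to that number always returns the same rows. In Spark, the comparable pinned read on a Delta table uses the versionAsOf reader option.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy append-only commit log: every commit creates a new version."""

    _commits: list = field(default_factory=list)

    def commit(self, rows):
        """Append rows as a new version and return its version number."""
        self._commits.append(tuple(rows))
        return len(self._commits) - 1

    def read(self, version=None):
        """Return all rows visible at `version` (latest when omitted)."""
        if version is None:
            version = len(self._commits) - 1
        if not 0 <= version < len(self._commits):
            raise ValueError(f"unknown version {version}")
        # A version sees every batch committed up to and including itself.
        return [row for batch in self._commits[: version + 1] for row in batch]


# Hypothetical training data that keeps growing after the model is trained.
table = VersionedTable()
v0 = table.commit([{"id": 1, "label": 0}, {"id": 2, "label": 1}])
v1 = table.commit([{"id": 3, "label": 1}])

training_rows = table.read(version=v0)  # pinned: stable across later commits
latest_rows = table.read()              # unpinned: reflects every commit
```

Recording the version number alongside a trained model is what lets the same training set be reloaded later, even after more data has been appended.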

Some of the key Synapse Data Science experiences that are launching as part of Microsoft Fabric at Build are:

Data prep and code generation with Data Wrangler
ML models and experiments as first-class citizens with MLflow
SynapseML, a comprehensive machine learning library for Spark
Enrich data in your Lakehouse with scalable PREDICT
R language support

Learn more about Data Science in Fabric by reading "Introducing Synapse Data Science in Microsoft Fabric" and watching "Build 2023: Models to outcomes with end-to-end data science workflows in Microsoft Fabric".

Real-Time Analytics

With Real-Time Analytics, organizations can simplify their data integration and focus on scaling up their analytics solution while democratizing data for everyone, from citizen data scientists to advanced data engineers. Real-Time Analytics enables quick access to data insights through automatic data streaming, indexing, and partitioning, and through auto-generated queries and visualizations, all while preserving powerful analytical capabilities. The platform is optimized for streaming, time-series data and uses a query language and engine with exceptional performance for searching structured, semi-structured, and unstructured data.

Some of the key Synapse Real-Time Analytics experiences that are launching as part of Microsoft Fabric at Build are:

Ingest data from any source and in any data format, without the need to build complex data models or create scripts to transform the data.
Scale to an unlimited amount of data, from gigabytes to petabytes, with unlimited scalability on concurrent queries and concurrent users.
Work flexibly with structured, semi-structured, or unstructured data formats, including free text.
Simplified Get Data experience for bringing data from any format and source.
One-click Power BI report generation.
One Logical Copy – data can be available to Microsoft OneLake and exposed to other Fabric experiences.
Truly serverless – no SKU selection.
Real-time streaming data availability in seconds from ingestion to querying.
Querying OneLake data via OneLake shortcuts.
Seamless connectivity with Azure Data Explorer for databases via Cloud Connection.
Real-time complex data structure transformation.