Microsoft Fabric Updates Blog

Mapping ​​Azure Synapse dedicated SQL pools to Fabric data warehouse compute

Writers: John Hoang and Brad Schacht
Technical Reviewers: Steve Howard and Priya Sathy

Introduction

The following is an article in a series that focuses on the migration from Azure Synapse Analytics to Microsoft Fabric. The first article, ​​Azure Synapse dedicated SQL pools to Fabric migration, discussed options to migrate DDL, T-SQL, and data from Synapse Dedicated SQL Pools Gen2 to Fabric Data Warehouse.

This article focuses on mapping Data Warehouse Units (DWU) from Azure Synapse Analytics dedicated SQL pool to an approximate equivalent number of Fabric Capacity Units (CU). For more information on Fabric capacity sizes and license types, please refer to Microsoft Fabric concepts. We used the TPC-H dataset for illustration and performance testing. Your actual result may vary depending on many factors including type of data, datatypes, width of tables, data source latency, etc.

Approach

A common ask from customers is “what is a good starting SKU that I should consider when migrating to Fabric data warehouse”. For example, if I am currently using DWU6000, what Fabric SKU would be a good starting point? While a simple resource mapping (CPU, memory, IO) between Synapse dedicated SQL pools and Fabric might seem like a great option, this may not provide an accurate comparison. There are several major differences between Synapse Gen2 and Fabric that must be taken into considerations including changes to the query optimizer, data storage, workload management, burstable capacity, and a different underlying hardware architecture all of which contribute to overall query and data ingestion performance.

Guidance derived from tests using TPCH 100GB, 1TB and 10TB covers a broad spectrum of workload sizes from our customers. We executed the power run, (single user running the 22 TPCH queries sequentially) and simulated multiple concurrent users each executing the TPCH queries simultaneously on both Synapse dedicated SQL pools and Fabric Warehouse on various DWUs/SKUs and used the result for our guidance.

While the dataset may not accurately reflect your workload, it does provide a data backed context that you can use as the “starting point” for your POC or pilot implementation. Fabric provides a sub-second online capacity scaling option that you can easily scale up or down to meet your performance SLA.

Note that the mapping focuses on the data warehouse component of Fabric. Fabric also uses the same capacity to serve Data Engineering, Data Factory, Data Science, Real-Time Analytics and Power BI. Estimating these components is out of scope for this article.

Synapse DWU to Fabric SKU Mapping

Mapping by Price

Let’s start with a simple DWU to Fabric SKU mapping based on price. You can use this information to understand the cost difference of Fabric compared to Azure Synapse. The table below maps the dedicated SQL pool DWU to Fabric SKU based on on-demand monthly pricing for the East US region. Dedicated pools do not have an equivalent to the lower Fabric F2 and F4 SKUs. Fabric provides a 41% discount for 1 year capacity reservation, whereas Synapse dedicated SQL Pool provides 37% for 1 year reservation and 65% for three-year reservation. Please refer to Microsoft Fabric pricing for the latest information. Fabric also allows you to pause and resume your capacity to enable additional cost savings.

Dedicated SQL Pool SLO Dedicated SQL Pool Price Fabric SKU Fabric DW Price
N/A N/A F2 $262.80
N/A N/A F4 $525.60
DWU100c $1,102.30 F8 $1,051.20
DWU200c$2,204.60 F16 $2,102.40
DWU300c$3,306.90
DWU400c $4,409.20 F32 $4,204.80
DWU500c$5,511.50
DWU1000c $11,023.00 F64 $8,409.60
DWU1500c $16,534.50 F128 $16,819.20
DWU2000c$22,046.00
DWU2500c$27,557.50
DWU3000c $33,069.00 F256 $33,638.40
DWU5000c$55,115.00
DWU6000c $66,138.00 F512 $67,276.80
DWU7500c$82,672.50
DWU10000c$110,230.00
DWU15000c $165,345.00F1024 $134,553.60
DWU30000c $330,690.00F2048 $269,107.20

Development environment

With the starting price of $.36/hour, the F2 is truly a “value” SKU. This is a great option for development environments. Many non-production environments are used for development and functional testing using a subset of production data without stringent performance SLAs. You can assign multiple workspaces to a single F2 capacity allowing multiple development teams to share the same capacity while allowing workspace and resource isolation. Then scale up instantly and with no downtime as additional workloads are added, convert the workspace from a DEV environment to a TEST environment, or simulate production tests then scale back down to save on the environment’s cost.

Mapping by database size

The table below summarizes the DWU to Fabric SKU mapping based on the database size. The options include “Minimum SKU”, which is the lowest SKU to “start with” for the associated workload size. The “Recommended SKU” is the ideal option that will provide optimal all-around performance.

There will be a slight mindset shift with Fabric compared to Synapse dedicated SQL pools. One example is “What happens when my database increases to larger than 10TB?” Scaling up to higher SKU is a simple and a valid solution. However, you have other great options and architecture design choices with Fabric. You can create different workspaces to distribute the workload, create capacity and workspace combinations for disparate use cases, and create shortcuts to access data across multiple workspaces to create a central data hub while the spokes handle data engineering operations.

Data Size

Minimum SKU

Recommended SKU

100GB

F2

F2

1TB

F2

F32

10TB

F64

F128

In the following sections, we will break down the rationale behind the recommendations for each data size.

100 GB data

The minimum Fabric capacity size required to run a workload consisting of a 100 GB dataset is an F2. In our tests, the F2 provided excellent data ingestion, transformation, and query performance for both power run and concurrency matching the dedicated SQL pool’s DWU500 performance, at a significantly lower cost.

Since the SKU handled the workload so well, we also recommend using an F2 for databases of this size.

1 TB data

The minimum Fabric capacity size required to run a workload consisting of a 1 TB dataset is an F2. With this larger dataset, the power run performance seen on a Fabric F2 is equivalent to running a Synapse dedicated SQL pool at DWU400.

Even with the 1 TB dataset, you can get started with the lowest Fabric SKU, an F2. The 1TB dataset power run executes all the queries successfully with performance equal to that of a Synapse dedicated SQL pool DWU400 but coming in at 1/16 the cost. However, with Fabric you will need to scale up if there is a concurrency requirement or faster query performance for this dataset, not because the capacity cannot execute the queries, but because running more than five users concurrently will exhaust the small F2’s allotted capacity units rather quickly.

We recommend using F32 for the best all-around performance for databases of this size.

10 TB data

The minimum Fabric capacity size required to run a workload consisting of a 10 TB dataset is an F64. With this, the largest dataset we tested, the power run on a Fabric F64 provided performance comparable to DWU3000 but at approximately 1/3 the cost.

Synapse dedicated SQL pools struggled to complete the power run, failing due to TEMPDB size limitations at DWU100 through DWU400. DWU1000, the lowest SLO at which Synapse dedicated SQL pools would complete the power run successfully, it ran 2.5x longer than the Fabric F64.

When comparing the concurrency runs, the Synapse dedicated SQL pool running at DWU6000 was unable to complete the tests, failing due to a TEMPDB size limitation, while Fabric was able to complete the same workload on the F64.

While the 10 TB dataset can be run successfully on Fabric’s F64, you will need to scale up for better performance when running concurrent users or if you need faster query processing.

We recommend using an F128 for the best all-around performance for databases of this size.

Data Size

Fabric SKU

Fabric Price

Synapse DWU

Synapse Price

Cost Saving

100GB

F2

$262.80

500

$5,511.50

21x

 

F64

$8,409.60

1000

$11,023.00

1.3x

1 TB

F2

$262.80

400

$4,409.20

16.8x

 

F32

$4,204.80

1000

$11,023.00

2.6x

 

F64

$8,409.60

1500

$16,534.50

2x

10 TB

F64

$8,409.60

3000

$33,069.00

3.9x

Note: The F64 is included in the table for reference because that is the size equivalent for the Fabric trial capacity.

Value proposition of Fabric’s lowest SKU, F2

For the price of a cup of coffee a day, Fabric’s lowest SKU delivers a state-of-the-art analytics platform at your fingertips. It can serve as your development environment for one or more workspaces with the benefit of workspace (resource) and security isolation. It also delivers excellent data ingestion, power run, and concurrency performance for smaller datasets with the flexibility to perform sub-second, online scale operations if your workload increases or higher concurrency requirements arise.

Synapse dedicated SQL Pools Gen2 limited the number of concurrent queries available by service level. This meant that even though there were enough resources to handle a large number of smaller queries on a DWU100 SQL pool it would only execute four concurrent queries. The only way to increase the number of concurrent queries was to scale up. Workloads needed to run on DWU6000 to unlock the full concurrency capabilities of the platform which maxes out at 128 concurrent queries. With Fabric data warehouse that is no longer the case; concurrency is not limited by the F SKU you are running. Instead, concurrency is only limited by the number of CUs available. In fact, the concurrency tests run for each dataset described earlier in this post included both 10 and 25 concurrent users. That means even an F2 delivers good performance running 25 users with a 100 GB dataset.

Head over to the documentation more information on how Fabric data warehouse delivers best in class performance with features like burstable capacity, automatic workload management, and capacity smoothing.

Value proposition of Fabric’s trial capacity, F64

When you sign up for the Microsoft Fabric trial, your trial includes an F64 SKU that you can use to access all the Fabric product experiences for 60 days. We believe the F64 will provide excellent all-around user experience for anything you throw at it. You can try data engineering with Spark, build Data Factory pipelines, use Power BI’s new Direct Lake mode, and run a 10 TB workload using the data warehouse. The trial is a great opportunity to run a POC or a competitive performance analysis for datasets up to 10 TB. This is an excellent SKU to experience Fabric at scale for free!

Susiję tinklaraščio įrašai

Mapping ​​Azure Synapse dedicated SQL pools to Fabric data warehouse compute

lapkričio 4, 2024 – Salil Kanade

AI is transforming data warehousing, making complex analytics more accessible and efficient. With tools like Copilot for Data Warehouse and AI Skill, Microsoft Fabric offers two powerful, complementary resources that serve both data developers and business users. This blog explores how these tools differ, when to use each, and how they can work together to … Continue reading “Data Warehouse: Copilot & AI Skill”

spalio 31, 2024 – Jovan Popovic

Fabric Data Warehouse is a modern data warehouse optimized for analytical data models, primarily focused on the smaller numeric, datetime, and string types that are suitable for analytics. For the textual data, Fabric DW supports the VARCHAR type that can store up to 8KB of text, which is suitable for most of the textual values … Continue reading “Announcing public preview of VARCHAR(MAX) and VARBINARY(MAX) types in Fabric Data Warehouse”