Use OneLake shortcuts to access data across capacities, even when the producing capacity is paused!
Shortcuts in Microsoft OneLake let you unify your data across domains and clouds by creating a single virtual data lake for your entire enterprise. With shortcuts, the same data can be reused many times, which makes it simple to consolidate data without moving it, duplicating it, or changing who owns it. Data consumed through a shortcut is always counted against the consumer's capacity, so data owners don't have to worry about downstream usage throttling their own capacity. You can even pause the capacity where the data is stored without disrupting downstream consumers! In this blog, we describe how OneLake capacity consumption works when data is accessed through a shortcut, particularly across capacities.
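To make the billing rule concrete, here is a toy Python model of it. `Capacity`, `Workspace`, `Shortcut` and `read_via_shortcut` are hypothetical names for illustration only, not part of any Fabric SDK, and the CU amounts are arbitrary numbers. The model encodes the rule stated above (a read through a shortcut is charged only to the consumer's capacity) plus one labeled assumption: a paused consumer capacity cannot run the read at all.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    name: str
    paused: bool = False
    cu_used: float = 0.0  # capacity units charged so far (arbitrary illustrative unit)


@dataclass
class Workspace:
    name: str
    capacity: Capacity


@dataclass
class Shortcut:
    consumer: Workspace  # workspace that holds the shortcut and reads through it
    target: Workspace    # workspace where the data is actually stored


def read_via_shortcut(shortcut: Shortcut, cu_cost: float) -> Capacity:
    """Charge one read through a shortcut and return the capacity that paid for it."""
    billed = shortcut.consumer.capacity
    # Assumption (hypothetical model): a paused consumer capacity cannot run the read.
    if billed.paused:
        raise RuntimeError(f"{billed.name} is paused")
    # The target's capacity is never consulted, so pausing it does not block the read.
    billed.cu_used += cu_cost
    return billed


# Usage: the producer's capacity is paused, the consumer's capacity is running.
capacity1 = Capacity("Capacity1", paused=True)
capacity2 = Capacity("Capacity2")
shortcut = Shortcut(
    consumer=Workspace("Reports", capacity2),
    target=Workspace("BronzeLakehouse", capacity1),
)
billed = read_via_shortcut(shortcut, cu_cost=1.5)
print(billed.name, capacity2.cu_used, capacity1.cu_used)  # Capacity2 1.5 0.0
```

The design choice worth noting is that the producer's capacity only appears as the shortcut's target; nothing in the read path looks at its state, which is why pausing it has no effect on the consumer in this model.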
Shortcuts in Data Mesh and Medallion Architectures
Cross-capacity shortcuts can appear in several architectural patterns. In a data mesh pattern, data is managed by designated domain experts rather than by a centralized IT team. The domain experts can then certify and promote items for downstream consumption across the organization. With OneLake shortcuts, the same copy of data can be used in multiple domains, across workspaces tied to the same or different capacities. Combined with the ability to associate workspaces with domains, this makes it easy to implement a data mesh architecture in Fabric.
You might also use multiple capacities in a medallion architecture. One capacity can be dedicated to ingesting bronze data and transforming it into silver and gold, and the producing team can manage and pay for this capacity. Consumers of this data can then use a different capacity to create Power BI reports on top of the gold data. We'll walk through an example of how OneLake capacity is consumed with this architecture, but before we do, let's recap how OneLake usage is defined.
OneLake Consumption
OneLake storage is billed at a pay-as-you-go rate per GB, like Azure Data Lake Storage or Amazon S3. For simplicity, OneLake doesn't include a separate charge for transactions (for example, reads and writes). Instead, transactions consume capacity units (CUs) from the same Fabric capacity that runs your other Fabric experiences. OneLake storage continues to be billed even while the capacity is paused.
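The split between the two meters can be sketched as a small ledger. This is a toy model, not a billing calculator: `Day`, `summarize` and the per-GB-day granularity are hypothetical, and `rate_per_gb_day` is a placeholder rather than a published OneLake price. It shows that storage keeps accruing on paused days while CUs are only drawn when the capacity runs transactions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Day:
    capacity_paused: bool
    stored_gb: float        # data held in OneLake that day
    transaction_cus: float  # CUs the day's reads and writes draw from the capacity


def summarize(days: list[Day], rate_per_gb_day: float) -> tuple[float, float]:
    """Return (storage_charge, capacity_cus) over a run of days.

    rate_per_gb_day is a hypothetical placeholder, not a published OneLake price.
    """
    storage_charge = 0.0
    capacity_cus = 0.0
    for day in days:
        # Storage is pay-as-you-go per GB and keeps accruing while the capacity is paused.
        storage_charge += day.stored_gb * rate_per_gb_day
        # Transactions have no separate meter; they draw CUs from the capacity.
        # Assumption: a paused capacity runs no workloads, so it draws no CUs.
        if not day.capacity_paused:
            capacity_cus += day.transaction_cus
    return storage_charge, capacity_cus


# One running day followed by two paused days, with 100 GB stored throughout.
days = [Day(False, 100.0, 40.0), Day(True, 100.0, 0.0), Day(True, 100.0, 0.0)]
print(summarize(days, rate_per_gb_day=0.5))  # (150.0, 40.0)
```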
OneLake Consumption via Shortcuts
Returning to our medallion architecture example, let's say Capacity1 contains a workspace with a lakehouse of bronze data. Data is loaded into this lakehouse daily and transformed using a Spark notebook, and the requests Spark makes to OneLake are billed to Capacity1. Now, let's say you want to pause Capacity1 and have the reports built on this data billed to a separate capacity, Capacity2. To do this, create a shortcut to the data in a lakehouse in a workspace tied to Capacity2. Using the shortcut APIs, you can even create the shortcut after Capacity1 is paused. From then on, any requests from Power BI reports or semantic models built on this data are billed to Capacity2, and pausing Capacity1, the capacity that produced the data, does not disrupt this downstream consumption.
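Here is a minimal sketch of building that API call in Python, assuming the Fabric REST endpoint `POST https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{itemId}/shortcuts` with a OneLake target; check the OneLake Shortcuts REST API reference for the current request schema before relying on it. The function name, IDs, shortcut name and paths are hypothetical placeholders, and obtaining the Microsoft Entra bearer token is out of scope. The request is only constructed here, not sent.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_create_shortcut_request(
    token: str,
    consumer_workspace_id: str,   # workspace tied to Capacity2 (holds the shortcut)
    consumer_lakehouse_id: str,   # lakehouse in that workspace
    shortcut_name: str,
    producer_workspace_id: str,   # workspace tied to Capacity1 (holds the data)
    producer_lakehouse_id: str,
    producer_path: str,           # e.g. "Tables/<table>" in the producer lakehouse
    shortcut_path: str = "Tables",
) -> urllib.request.Request:
    """Build (but do not send) a create-shortcut request with a OneLake target."""
    body = {
        "path": shortcut_path,
        "name": shortcut_name,
        "target": {
            "oneLake": {
                "workspaceId": producer_workspace_id,
                "itemId": producer_lakehouse_id,
                "path": producer_path,
            }
        },
    }
    url = (
        f"{FABRIC_API}/workspaces/{consumer_workspace_id}"
        f"/items/{consumer_lakehouse_id}/shortcuts"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_shortcut_request(
    token="<entra-access-token>",
    consumer_workspace_id="<capacity2-workspace-id>",
    consumer_lakehouse_id="<capacity2-lakehouse-id>",
    shortcut_name="sales",
    producer_workspace_id="<capacity1-workspace-id>",
    producer_lakehouse_id="<capacity1-lakehouse-id>",
    producer_path="Tables/sales",
)
```

Passing `req` to `urllib.request.urlopen` would send it. Note that the request is addressed to the consumer's workspace and lakehouse; the producer's IDs appear only inside the `target` body, which matches the point above that the shortcut lives on the Capacity2 side.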
In this blog, we walked through an example of how OneLake shortcuts across capacities enable you to separate the Fabric CU consumption of data producers and consumers. We encourage you to explore all the ways shortcuts can add value to your architecture!
For more information, check out the following resources: