Microsoft Fabric Updates Blog

Virtualize your existing data into OneLake with shortcuts

Connect data silos without moving or copying data

Data silos continue to be a major challenge for enterprise data analytics solutions. Organizations within an enterprise often operate autonomously from one another, building solutions that only support their own needs and using different tools, processes, and standards along the way. Collaboration across organizations can be encumbered by bureaucracy and budget constraints. And once cross-org agreements are established, data movement needs to be carefully orchestrated, which adds latency to data refreshes and creates new opportunities for failure. The result can be inconsistent data across reports, and leaders may lose confidence in the accuracy of the information presented to them.
 
Enterprises often try to combat this problem by establishing a single enterprise data lake to house data across all of their organizations. The scale of these solutions, however, becomes their greatest downfall. With thousands of data sources to onboard, the backlog can stretch to several years. In the meantime, business operations must continue, and new silos form.
 
Rather than forcing organizations to disband silos and migrate data to enterprise solutions, we need a better way to use data where it already resides and to eliminate data copies and data movement altogether.
  

The OneLake approach

OneLake allows you to create special folders that point to other storage locations. We call these special folders “shortcuts”. Shortcuts let you create virtualized data products composed of multiple data silos across any number of organizations within your enterprise. You no longer need to manage multiple copies of data or orchestrate any data movement. Simply create a shortcut that points to the data you want to access, and it appears within your Lakehouse immediately.
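Conceptually, a shortcut behaves like a mount point: a path inside the Lakehouse is redirected to a location in another store, and only the path is translated, never the data. The sketch below models that idea with a plain dictionary. It is an illustrative simplification, not how OneLake is implemented; the `SHORTCUTS` table, its storage URLs, and the `resolve` helper are all hypothetical.

```python
from typing import Dict

# Hypothetical shortcut table: Lakehouse folder -> target storage location.
# Real OneLake shortcuts are managed by the service; this only models the idea.
SHORTCUTS: Dict[str, str] = {
    "Files/sales": "https://contosoadls.dfs.core.windows.net/raw/sales",
    "Files/partners": "s3://partner-bucket/exports",
}


def resolve(lakehouse_path: str, shortcuts: Dict[str, str] = SHORTCUTS) -> str:
    """Map a Lakehouse path to the location that actually holds the bytes.

    Paths under a shortcut are rewritten onto the shortcut's target; any other
    path is treated as data stored directly in OneLake and returned unchanged.
    """
    # Check longer prefixes first so a nested shortcut wins over its parent.
    for prefix in sorted(shortcuts, key=len, reverse=True):
        # Match whole folder names only ("Files/sales" must not match "Files/salesforce").
        if lakehouse_path == prefix or lakehouse_path.startswith(prefix + "/"):
            return shortcuts[prefix] + lakehouse_path[len(prefix):]
    return lakehouse_path


print(resolve("Files/sales/2023/orders.parquet"))
# https://contosoadls.dfs.core.windows.net/raw/sales/2023/orders.parquet
print(resolve("Files/local/notes.csv"))
# Files/local/notes.csv
```

Because resolution happens at read time, whatever is currently stored at the target is what a reader sees.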
 

Real-world examples

Let’s say your team wants to start using Fabric today, but your organization’s data is stored in Azure Data Lake Storage (ADLS). Traditionally, you would schedule a job to periodically copy the data from ADLS into the new platform. The size of the data and the frequency of the schedule would dictate the latency of your data refresh. You would also need to set up monitoring and alerting to ensure the job completed on schedule and without failures. This is a significant investment of time and effort, and it adds a lot of process overhead just to make your data available.
 
Shortcuts greatly simplify this. You start by creating a Lakehouse in Fabric. A Lakehouse is an item in Fabric that lets you organize data for a specific purpose, such as Marketing Analytics. It supports both data engineering workloads through Spark and data consumption through the SQL serving layer. The Lakehouse also provides a view of your data in OneLake, and within it you can create shortcuts that point to other storage locations. In this case, you create a shortcut that points to the ADLS account where your organization’s data is stored. The shortcut appears as just another folder within the Lakehouse. When you open the folder, the data it points to in the ADLS account appears immediately, with no data movement to manage.
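Shortcuts can also be created programmatically. The sketch below prepares, but does not send, a request that creates an ADLS Gen2 shortcut in a Lakehouse. It is modeled on our understanding of the Fabric REST API's Create Shortcut operation (`POST /v1/workspaces/{workspaceId}/items/{itemId}/shortcuts`); treat the URL and field names as assumptions to verify against the current API reference. The IDs, storage account, and `build_adls_shortcut` helper are hypothetical.

```python
import json
from typing import Any, Dict, Tuple

# Assumed base URL of the Fabric REST API.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_adls_shortcut(
    workspace_id: str,
    lakehouse_id: str,
    name: str,
    account_url: str,
    subpath: str,
    connection_id: str,
    parent_path: str = "Files",
) -> Tuple[str, Dict[str, Any]]:
    """Return (url, body) for creating an ADLS Gen2 shortcut in a Lakehouse.

    `subpath` is the container/folder the shortcut points to; a deeper
    folder narrows the data exposed through the Lakehouse.
    """
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    body = {
        "path": parent_path,  # where the shortcut folder appears in the Lakehouse
        "name": name,         # the folder name users will see
        "target": {
            "adlsGen2": {
                "location": account_url,
                "subpath": subpath,
                "connectionId": connection_id,  # stored credentials for the account
            }
        },
    }
    return url, body


url, body = build_adls_shortcut(
    workspace_id="<workspace-id>",
    lakehouse_id="<lakehouse-id>",
    name="MarketingData",
    account_url="https://contosoadls.dfs.core.windows.net",
    subpath="/marketing/curated",
    connection_id="<connection-id>",
)
print(url)
print(json.dumps(body, indent=2))
```

Sending the request would additionally require an authenticated POST, which is deliberately left out of this sketch.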


 
You can create one shortcut or many within the Lakehouse. When you create a shortcut, you specify the folder within the ADLS storage account that you want it to point to. This can be at any level of the hierarchical namespace, which lets you control the scope of data available through the Lakehouse. Additionally, if you point the shortcut to a folder containing data in the Delta format, it is automatically recognized as a table, and the table metadata is registered with Spark and SQL. The Lakehouse provides a single place to curate your data product and then use that data throughout Fabric. Spark, SQL, datasets and Power BI reports can all read data directly from the lake through your Lakehouse. Whether your data is stored locally or accessed through a shortcut, it all just works.
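What makes a folder recognizable as a Delta table is its `_delta_log` subfolder, which holds the table's transaction log alongside the Parquet data files. The sketch below applies that rule to a file listing to report which folders would look like tables. It is a simplified heuristic for illustration, not Fabric's actual discovery logic; the `find_delta_tables` helper and the sample paths are hypothetical.

```python
from typing import Iterable, List


def find_delta_tables(paths: Iterable[str]) -> List[str]:
    """Return the folders that contain a Delta transaction log.

    A path that runs through a `_delta_log` directory marks the folder
    immediately above that directory as a Delta table root.
    """
    tables = set()
    for path in paths:
        parts = path.strip("/").split("/")
        if "_delta_log" in parts:
            # Everything before `_delta_log` is the table's root folder.
            root = "/".join(parts[: parts.index("_delta_log")])
            tables.add(root)
    return sorted(tables)


listing = [
    "customers/_delta_log/00000000000000000000.json",
    "customers/part-00000.snappy.parquet",
    "orders/_delta_log/00000000000000000001.json",
    "orders/part-00000.snappy.parquet",
    "raw_csv/2023-06-01.csv",
]
print(find_delta_tables(listing))
# ['customers', 'orders']
```

Folders without a transaction log, such as `raw_csv` here, remain plain files.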
 

Build a virtualized lake

As you build out your Lakehouse, you will likely want to use data from many different silos. You may want to source data from another organization whose data lives in a different storage account, or even with a different cloud provider. Within your Lakehouse, you can create shortcuts that point to different storage accounts, and these accounts can be distributed across subscriptions and tenants. In addition to ADLS shortcuts, you can even create shortcuts to Amazon S3 buckets.
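A virtualized lake can then be described as data: a list of shortcuts whose targets live in different places. The sketch below turns such a list, covering two ADLS accounts and an Amazon S3 bucket, into create-shortcut request bodies. The `adlsGen2` and `amazonS3` target shapes follow our understanding of the Fabric REST API and should be checked against its reference; every account, bucket, connection ID, and helper name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Target kinds assumed to match the Fabric REST API's shortcut target names.
SUPPORTED_KINDS = {"adlsGen2", "amazonS3"}


@dataclass
class ShortcutSpec:
    name: str           # folder name shown in the Lakehouse
    kind: str           # "adlsGen2" or "amazonS3"
    location: str       # storage account or bucket endpoint
    subpath: str        # folder within the account or bucket
    connection_id: str  # connection holding the credentials for that source


def to_request_body(spec: ShortcutSpec, parent_path: str = "Files") -> Dict:
    """Convert one spec into a create-shortcut request body."""
    if spec.kind not in SUPPORTED_KINDS:
        raise ValueError(f"unsupported shortcut kind: {spec.kind}")
    return {
        "path": parent_path,
        "name": spec.name,
        "target": {
            spec.kind: {
                "location": spec.location,
                "subpath": spec.subpath,
                "connectionId": spec.connection_id,
            }
        },
    }


# One Lakehouse, three sources: two ADLS accounts (for example, in different
# subscriptions or tenants) and an S3 bucket in another cloud.
virtual_lake: List[ShortcutSpec] = [
    ShortcutSpec("Sales", "adlsGen2", "https://salesadls.dfs.core.windows.net",
                 "/curated/sales", "<sales-connection-id>"),
    ShortcutSpec("Finance", "adlsGen2", "https://financeadls.dfs.core.windows.net",
                 "/gold/ledger", "<finance-connection-id>"),
    ShortcutSpec("Partners", "amazonS3", "https://partner-exports.s3.us-east-1.amazonaws.com",
                 "/daily", "<s3-connection-id>"),
]

bodies = [to_request_body(spec) for spec in virtual_lake]
for b in bodies:
    kind = next(iter(b["target"]))  # the single target kind key
    print(b["name"], kind, b["target"][kind]["location"])
```

Keeping the list as data makes it easy to review which silos a Lakehouse depends on before any shortcut is created.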
 
Shortcuts can also point to other items within Fabric including data warehouses, KQL databases and other Lakehouses. Perhaps your sales organization maintains a customer master in a data warehouse in Fabric. Your marketing organization can leverage this customer master by simply creating a shortcut from their Lakehouse to a table in the data warehouse. Because the data is never copied, any updates to the customer master in the data warehouse are immediately reflected in the Lakehouse.
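For a shortcut to another Fabric item, the target names a workspace, an item, and a path inside that item rather than an external storage URL. The sketch below builds such a body for the customer-master scenario. The `oneLake` target shape reflects our reading of the Fabric REST API and should be verified; the IDs, the `Tables/CustomerMaster` path, the choice of `Tables` as the parent folder, and the `build_onelake_shortcut` helper are hypothetical.

```python
from typing import Dict


def build_onelake_shortcut(
    name: str,
    source_workspace_id: str,
    source_item_id: str,
    source_path: str,
    parent_path: str = "Tables",  # assumed parent folder in the consuming Lakehouse
) -> Dict:
    """Body for a shortcut from one Lakehouse to data in another Fabric item.

    Only a reference is stored, so reads through the shortcut see the
    source item's current data.
    """
    return {
        "path": parent_path,
        "name": name,
        "target": {
            "oneLake": {
                "workspaceId": source_workspace_id,
                "itemId": source_item_id,
                "path": source_path,
            }
        },
    }


# Marketing's Lakehouse referencing the sales team's customer master table.
body = build_onelake_shortcut(
    name="CustomerMaster",
    source_workspace_id="<sales-workspace-id>",
    source_item_id="<sales-warehouse-id>",
    source_path="Tables/CustomerMaster",  # hypothetical path inside the warehouse
)
print(body)
```

No storage URL or credentials appear in the target, since both items already live in OneLake.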


 
Shortcuts provide the connectivity necessary to unlock data silos. Don’t wait for IT to finish building your enterprise data lake. Use shortcuts to quickly create virtualized lakes that unite data across organizations, accounts, subscriptions and clouds. Spend less time moving your data and more time analyzing it.
 


 
Review the OneLake shortcuts documentation for more information.
  

Get started with Microsoft Fabric

Microsoft Fabric is currently in preview. Try out everything Fabric has to offer by signing up for the free trial—no credit card information required. Everyone who signs up gets a fixed Fabric trial capacity, which may be used for any feature or capability from integrating data to creating machine learning models. Existing Power BI Premium customers can simply turn on Fabric through the Power BI admin portal. After July 1, 2023, Fabric will be enabled for all Power BI tenants.

 

Sign up for the free trial. For more information read the Fabric trial documentation.

  
