Microsoft Fabric Updates Blog

Public Preview of OneLake shortcuts to S3-compatible data sources

Microsoft OneLake is a unified data lake for all of your organization’s data. With OneLake shortcuts, you can reference data in different locations and have that data logically represented within OneLake, with no data movement or duplication. With the recent announcement of shortcuts to Google Cloud Storage, you can use shortcuts to seamlessly bring in data from many different cloud sources, including Azure, Amazon Web Services, and Google Cloud Platform.

The Amazon S3 API is widely supported by distributed file systems and object storage services. Fabric customers, partners, and other community members have voiced high demand for OneLake shortcut connectivity to S3 compatible sources.

We heard you loud and clear! We are excited to announce the public preview availability of S3 compatible shortcuts in OneLake!

With this feature, it’s quick and easy to create a shortcut that references your cloud-based S3 compatible data sources. The data source endpoint simply needs to offer S3 compatible APIs, be publicly hosted and accessible, and accept the key/secret credentials you provide during shortcut creation.

Once you set up your shortcut, you can access and use this data with the many Fabric engines or other services using OneLake’s open APIs!
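As a rough illustration of what "open APIs" means here, the sketch below builds an ADLS-style OneLake URI for a file behind a shortcut. It assumes the documented OneLake URI shape (`abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>`). The workspace, lakehouse, and shortcut names are hypothetical, and the helper `onelake_shortcut_uri` is ours, not part of any SDK.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_shortcut_uri(
    workspace: str,
    lakehouse: str,
    shortcut: str,
    section: str = "Files",
    relative_path: str = "",
) -> str:
    """Build an abfss:// URI for data behind a lakehouse shortcut.

    Assumes workspace and item names contain no spaces or special
    characters; all names passed in are illustrative placeholders.
    """
    if section not in ("Files", "Tables"):
        raise ValueError("section must be 'Files' or 'Tables'")

    # The shortcut appears as a folder under Files/ or Tables/.
    parts = [f"{lakehouse}.Lakehouse", section, shortcut]
    # Append any path inside the shortcut, ignoring empty segments.
    parts.extend(p for p in relative_path.split("/") if p)

    return f"abfss://{workspace}@{ONELAKE_HOST}/" + "/".join(parts)
```

In a Fabric notebook, a string like this could then be handed to a reader such as `spark.read.parquet(...)`, so the S3 compatible data is addressed the same way as any other OneLake path.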

Getting started

Here’s how to get started today:

1. Open your Fabric lakehouse and create a new shortcut.
    • If your data is already in the Delta Lake format, create your shortcut in the Tables section of your lakehouse. This will allow your table shortcut to benefit from metadata synchronization across Fabric engines, letting you use this structured data where tables are used in Fabric.
    • Otherwise, create your shortcut in the Files section of the lakehouse.

2. Under External sources, select Amazon S3 compatible (preview).

3. Configure your connection settings. Enter your S3 compatible data source’s publicly reachable endpoint URL, along with the key/secret credential that has authorization to list buckets, get bucket info, list objects, and read data.

    • Be sure to provide a non-bucket-specific endpoint. For example, you can provide https://s3.contoso.com, but not https://s3.contoso.com/myBucket or https://mybucket.s3.contoso.com.

4. Browse to the bucket and folder(s) that you would like to reference in OneLake.

5. Confirm your choices, rename your shortcut(s) if preferred, and create your shortcuts!

6. That’s all – you can now start analyzing your data throughout Fabric, with no data movement or duplication!
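The endpoint rule from step 3 is easy to get wrong, so here is a minimal sketch of a pre-check you could run before opening the shortcut dialog. It assumes the two bucket-specific forms shown in the example (a bucket in the URL path, or a bucket as the leading host label). The function name `is_non_bucket_endpoint` is hypothetical, and this check is not what Fabric itself runs.

```python
from urllib.parse import urlparse


def is_non_bucket_endpoint(endpoint: str, bucket: str) -> bool:
    """Return True if the endpoint does not appear to target one bucket.

    Heuristic only: rejects path-style URLs (https://host/bucket) and
    virtual-hosted-style URLs (https://bucket.host) for the given bucket.
    """
    parsed = urlparse(endpoint)

    # Require an http(s) URL with a host name.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False

    # Any path segment means the URL points below the service root.
    if parsed.path.strip("/"):
        return False

    # Query strings and fragments do not belong on a service endpoint.
    if parsed.query or parsed.fragment:
        return False

    # Virtual-hosted style: the bucket name is the first host label.
    host = parsed.hostname.lower()
    if bucket and host.startswith(bucket.lower() + "."):
        return False

    return True
```

With the names from step 3, `https://s3.contoso.com` passes for bucket `myBucket`, while `https://s3.contoso.com/myBucket` and `https://mybucket.s3.contoso.com` are both rejected.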

Caching

Just like Amazon S3 and Google Cloud Storage shortcuts in OneLake, your S3 compatible shortcuts also support caching. As files are read through the shortcut, they are stored in a cache for the Fabric workspace, and subsequent read requests are served from that cache rather than directly from your S3 compatible source. Because repeated reads no longer reach the source, egress costs can be greatly reduced.
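To make the cost argument concrete, here is a small conceptual sketch of a read-through cache. It is not how OneLake implements shortcut caching; size limits, retention, and freshness checks are left out. The class `ReadThroughCache` and the `fetch_from_source` callback are hypothetical names used only for illustration.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy read-through cache: the first read hits the source, later reads do not."""

    def __init__(self, fetch_from_source: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_source      # stands in for a read from the S3 compatible source
        self._store: Dict[str, bytes] = {}   # stands in for the workspace cache
        self.source_reads = 0                # each source read would incur egress

    def read(self, path: str) -> bytes:
        if path not in self._store:
            # Cache miss: fetch once from the source and keep a copy.
            self.source_reads += 1
            self._store[path] = self._fetch(path)
        # Cache hit (or freshly filled entry): serve from the cache.
        return self._store[path]
```

Reading the same file many times through this object increments `source_reads` only once, which is the effect that lowers egress charges when the same files are queried repeatedly.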

Caching can be enabled for each of your Fabric workspaces. To enable shortcut caching, workspace administrators can open a Fabric workspace and select Workspace settings.

In the workspace settings panel, select the OneLake tab. Switch the toggle for Enable Cache for Shortcuts to On. Then click the Save button.

Caching is then enabled for all GCS, S3, and S3 compatible shortcuts in that workspace. To learn more about shortcut caching, see our recent blog post on reducing costs through caching.

What’s next?

Coming soon, we will add support for shortcuts to connect to on-premises and network-restricted environments. This means you will soon be able to use your Fabric on-premises data gateway to add data to OneLake from your on-premises S3 compatible data sources that are not directly exposed to the public internet. Stay tuned for this future improvement!

We hope you enjoy this new feature and find it useful as you plan and build your data solutions with Fabric. As always, we appreciate your feedback and ideas for future improvements. Please submit any feedback or suggestions at Microsoft Fabric Ideas.
