Microsoft Fabric Updates Blog

Virtualize your Cloudera/Hadoop data estate into Fabric OneLake with Apache Ozone

OneLake Shortcut

Microsoft Fabric OneLake shortcuts virtualize data from cloud object stores and on-premises environments. For on-premises sources such as Cloudera/Apache Ozone, the OneLake S3 Compatible Shortcut connects Fabric to the data source. With OneLake shortcuts, users create a virtual reference to their Cloudera cluster data without moving or duplicating it. To learn more about Fabric OneLake shortcuts, see the blog post Virtualize your existing data into OneLake with shortcuts.

Apache Ozone

Apache Ozone is an open-source, scalable object store designed for analytics, offering compatibility with the S3 API. It is available on Cloudera CDP runtimes, enabling the creation of a OneLake S3 Compatible Shortcut to your on-premises Hadoop cluster. For further information, please refer to Cloudera documentation Introduction to Ozone.

Lakehouse and Open Data Platforms

Both Microsoft Fabric and Cloudera CDP have adopted the Lakehouse architecture and leverage the Apache Iceberg open data format. Furthermore, Microsoft Fabric offers native support for the Delta Lake open format. This discussion will examine methods for integrating these two platforms.

Benefits and Business Case

Cloudera customers can now extend their on-premises Hadoop data estate to the cloud without moving data. A OneLake shortcut to Apache Ozone supports a gradual migration to the cloud, or lets you burst your Gold Data Products to the cloud and use on-demand cloud compute resources.

Get started with Microsoft Fabric and Cloudera/Apache Ozone Clusters

The remainder of the article will show the integration in action.

Apache Ozone/Cloudera Prerequisites

  • Configure the Apache Ozone Filesystem on your Cloudera cluster.
  • Configure the Ozone S3 Gateway and expose buckets from non-default volumes through the default /s3v/ volume. Note the Gateway endpoint URL for later.
  • With the Ozone CLI, create an AWS access key ID and AWS secret key for the Microsoft Fabric shortcut. Save these credentials for later.
  • Migrate HDFS data to Ozone, preferably as Iceberg or Delta tables with their respective data and metadata subdirectories, and link them to the default /s3v/ volume.
  • Validate that the data is available in the default /s3v/ volume for the Fabric shortcut.
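
The linking and credential steps above can be sketched as a small dry-run helper that only prints the Ozone CLI commands instead of running them. This is a minimal sketch, not the author's procedure: the volume name `lakehouse` is a hypothetical placeholder, and on a secured cluster `ozone s3 getsecret` is assumed to run after Kerberos authentication.

```python
import shlex


def ozone_prereq_commands(volume: str, bucket: str) -> list[list[str]]:
    """Return Ozone CLI commands (as argument lists) for the link and credential steps."""
    source = f"/{volume}/{bucket}"   # bucket in a non-default volume
    link = f"/s3v/{bucket}"          # name under the default S3 volume
    return [
        # Expose the bucket through the default /s3v/ volume.
        ["ozone", "sh", "bucket", "link", source, link],
        # Generate the access key ID and secret key for the current user.
        ["ozone", "s3", "getsecret"],
    ]


if __name__ == "__main__":
    # "lakehouse" is a hypothetical volume name; "icebergdata" is the bucket used in this article.
    for cmd in ozone_prereq_commands("lakehouse", "icebergdata"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review the source and link paths before running anything against the cluster.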

aws s3api --endpoint-url http://localhost:9878 list-objects --bucket icebergdata

In this instance, the command shows the DIM_Geographies table that was copied to the default S3 volume within the icebergdata Ozone bucket. Both the data and metadata directories for this table are accessible, which is essential for our Fabric OneLake Shortcut Table.
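
The same validation can be scripted. The sketch below checks that a table prefix contains both subdirectories, assuming the Iceberg layout described here (a Delta table would keep its log in a different directory). The object file names in the usage note are hypothetical, and the optional listing helper assumes the third-party boto3 package and path-style addressing against the gateway.

```python
from typing import Iterable

# Subdirectories expected under an Iceberg table directory.
REQUIRED_SUBDIRS = ("data", "metadata")


def missing_table_subdirs(keys: Iterable[str], table_prefix: str) -> list[str]:
    """Return the required subdirectories that have no objects under table_prefix."""
    prefix = table_prefix.strip("/") + "/"
    found = set()
    for key in keys:
        if key.startswith(prefix):
            # First path component after the table prefix, e.g. "data" or "metadata".
            found.add(key[len(prefix):].split("/", 1)[0])
    return [name for name in REQUIRED_SUBDIRS if name not in found]


def list_bucket_keys(endpoint_url: str, bucket: str, access_key: str, secret_key: str) -> list[str]:
    """List object keys through the Ozone S3 Gateway (first page of results only)."""
    # Third-party imports are kept local so the check above stays stdlib-only.
    import boto3
    from botocore.config import Config

    client = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        config=Config(s3={"addressing_style": "path"}),  # assumption: path-style URLs
    )
    response = client.list_objects(Bucket=bucket)
    return [obj["Key"] for obj in response.get("Contents", [])]
```

For example, `missing_table_subdirs(list_bucket_keys("http://localhost:9878", "icebergdata", key_id, secret), "DIM_Geographies")` returns an empty list when both directories are present, where `key_id` and `secret` stand for the credentials created earlier.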

Azure Prerequisites

In this example, we navigate to the virtual machine where the On-Premises Data Gateway was installed and configured. The Fabric-Gateway-Ozone gateway is online and ready to communicate with the Microsoft Fabric environment.

Back in Microsoft Fabric, the OneLake Catalog shows the Cloudera-OnPrem-Data workspace and the OzoneToFabricLH Lakehouse.

Next, we select the OzoneToFabricLH Lakehouse to open it, then click the ‘…’ next to Tables on the left side of the screen to create a new shortcut to Apache Ozone. This virtualizes our DIM_Geographies Iceberg table without any data duplication.

Next, we will create a new connection to the On-Premises Data Gateway using the Apache Ozone credentials we created earlier: AWS access key ID and AWS secret key.

In this example, Fabric automatically detects the On-Premises Data Gateway. The user needs to provide the URL for the Ozone s3api endpoint in the Cloudera environment, the AWS access key ID, and the AWS secret key, then select Next.

In this example, the connection to Ozone succeeded. Select your Iceberg table directory and click Next. Make sure the data and metadata subdirectories are present, because Fabric requires them to recognize and translate this path as an Iceberg table.

To rename the Shortcut, click on the pencil icon to make the desired changes. For this example, we will maintain all default settings and proceed by clicking Create.

Once the Shortcut is created, a virtualized version of the table appears in the Lakehouse within Fabric. Note the link icon next to the table name, which indicates that it is a Shortcut rather than a natively managed table in Fabric.

Next, choose the Table name. Fabric will execute an initial read query on the table in Ozone. To reduce network communication, Fabric caches recently queried data. As the cache reaches its limit, it replaces older result sets with those from newer queries.
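
To make the replacement behavior concrete, here is a minimal sketch of a bounded cache that evicts the least recently used entry once it is full. It is purely illustrative and is not Fabric's implementation: the class name, the entry-count limit, and the least-recently-used policy are all assumptions.

```python
from collections import OrderedDict


class ResultCache:
    """Illustrative bounded cache: drops the least recently used result when full."""

    def __init__(self, max_entries: int):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, query):
        """Return the cached result for query, or None if it is not cached."""
        if query not in self._entries:
            return None
        # Mark the entry as recently used.
        self._entries.move_to_end(query)
        return self._entries[query]

    def put(self, query, result):
        """Store a result, evicting the oldest entries if the limit is exceeded."""
        self._entries[query] = result
        self._entries.move_to_end(query)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # remove the least recently used entry
```

With a limit of two entries, caching a third result removes whichever of the first two was read least recently, so repeated queries against the same table avoid another round trip to Ozone.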

With our data now integrated into Fabric, we can utilize Power BI to visualize it and develop advanced dashboards using existing data from Azure or other cloud platforms.

Summary

This article examined the Fabric OneLake Shortcut to S3-compatible storage such as Apache Ozone on Cloudera clusters. The integration helps consolidate data silos, simplify data pipelines, and build agentic systems on a centralized data estate with Azure AI Foundry, so that users can reach the data and generate business value more efficiently.

What’s Next

For further information on this solution, please contact your Microsoft account team for guidance on using Fabric with your Cloudera on-premises data estate.

References

Shortcut Blog: Virtualize your existing data into OneLake with shortcuts

Cloudera Ozone Doc: Introduction to Ozone

Cloudera Apache Iceberg: Apache Iceberg features

Configure Cloudera Ozone Filesystem: Working with Ozone File System (ofs)

Configure Cloudera Ozone S3 Gateway: Using Ozone S3 Gateway to work with storage elements

Create Ozone Credentials: Configure S3 credentials for working with Ozone

Migrate HDFS data to Ozone: Process of migrating the HDFS data to Ozone

Get Started with: Fabric Trial

Create Fabric Workspace

Create Fabric Lakehouse: Bring your data to OneLake with Lakehouse

Download and Install On-premises Data Gateway

Next steps 

We can’t wait for you to try out OneLake shortcuts on your own data and let us know what you think. Submit your feedback on Fabric Ideas and join the conversation on the Fabric Community. To get into the technical details, head over to the Fabric documentation. 
