Virtualize your Cloudera/Hadoop data estate into Fabric OneLake with Apache Ozone
OneLake Shortcut
Microsoft Fabric OneLake shortcuts virtualize data from various cloud object stores and on-premises environments. For on-premises sources such as Cloudera/Apache Ozone, the OneLake S3 Compatible Shortcut connects to the data source. With OneLake shortcuts, users can create a virtual reference to their Cloudera cluster data without moving or duplicating it. To learn more about Fabric OneLake shortcuts, see the blog post OneLake with shortcuts.
Apache Ozone
Apache Ozone is an open-source, scalable object store designed for analytics and compatible with the S3 API. It is available on Cloudera CDP runtimes, which makes it possible to create a OneLake S3 Compatible Shortcut to your on-premises Hadoop cluster. For further information, refer to the Cloudera documentation, Introduction to Ozone.
Lakehouse and Open Data Platforms
Both Microsoft Fabric and Cloudera CDP have adopted the Lakehouse architecture and support the Apache Iceberg open table format. Microsoft Fabric also offers native support for the Delta Lake open format. This article examines how to integrate the two platforms.
Benefits and Business Case
Cloudera customers can now extend their on-premises Hadoop data estate to the cloud without moving data. A OneLake shortcut to Apache Ozone supports migration to the cloud, or lets you burst your Gold Data Products to the cloud and use on-demand cloud compute resources.

Get started with Microsoft Fabric and Cloudera/Apache Ozone Clusters
The remainder of the article will show the integration in action.
Apache Ozone/Cloudera Prerequisites
- Configure Apache Ozone Filesystem on your Cloudera cluster.
- Configure the Ozone S3 Gateway and expose buckets from non-default volumes under the default /s3v/ volume. Note the Gateway endpoint URL for later.
- With the Ozone CLI, create an AWS access key ID and AWS secret key for the Microsoft Fabric shortcut. Save these credentials for later.
- Migrate HDFS data to Ozone, preferably as Iceberg or Delta tables with their respective data and metadata subdirectories, and link them to the Ozone S3 default /s3v/ volume.
- Validate the data is available in the S3 default /s3v/ volume for Fabric Shortcut.
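The S3 Gateway and credential steps in this list might look like the following on a cluster node. This is a minimal sketch, not Cloudera's prescribed procedure: the volume name vol1 and the helper function names are hypothetical, the bucket name icebergdata is borrowed from the example in this article, and your cluster may need extra options (for example, an Ozone Manager service ID) that are not shown. It assumes a Kerberos ticket has already been obtained with kinit.

```shell
# Hypothetical names: volume "vol1" is a placeholder; "icebergdata" is the
# bucket used in this article's example.
# OZONE can point at a specific ozone binary; it defaults to "ozone" on PATH.

# Expose a bucket from a non-default volume under /s3v so the S3 Gateway serves it.
link_bucket_to_s3v() {
  src_volume="$1"
  src_bucket="$2"
  link_name="${3:-$src_bucket}"   # reuse the bucket name unless a link name is given
  "${OZONE:-ozone}" sh bucket link "/${src_volume}/${src_bucket}" "/s3v/${link_name}"
}

# Print the S3 access key and secret for the current Kerberos-authenticated user.
get_s3_secret() {
  "${OZONE:-ozone}" s3 getsecret
}

# Example (run on a cluster node after kinit):
#   link_bucket_to_s3v vol1 icebergdata
#   get_s3_secret
```

Keeping the link name equal to the source bucket name by default means the bucket name entered later in Fabric matches what cluster users already know.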

To validate, list the bucket contents through the Ozone S3 Gateway with the AWS CLI:
aws s3api --endpoint-url http://localhost:9878 list-objects --bucket icebergdata
In this instance, the command shows the DIM_Geographies table that was copied to the default S3 volume within the icebergdata Ozone bucket. Both the data and metadata directories for this table are accessible, which is essential for our Fabric OneLake Shortcut Table.
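Because Fabric needs both subdirectories, the listing can also be checked mechanically. The sketch below assumes object keys of the form <table>/data/... and <table>/metadata/..., which may differ if the table sits under a deeper prefix. The function has_iceberg_layout is a hypothetical helper, not part of the AWS or Ozone tooling.

```shell
# Reads object keys from stdin (one per line, or tab-separated as the AWS CLI
# emits them with --output text) and succeeds only if keys exist under both
# <table>/data/ and <table>/metadata/.
has_iceberg_layout() {
  table="$1"
  tr '\t' '\n' | {
    has_data=0
    has_meta=0
    # The "|| [ -n ... ]" clause keeps a final line that lacks a trailing newline.
    while IFS= read -r key || [ -n "$key" ]; do
      case "$key" in
        "$table"/data/*)     has_data=1 ;;
        "$table"/metadata/*) has_meta=1 ;;
      esac
    done
    [ "$has_data" -eq 1 ] && [ "$has_meta" -eq 1 ]
  }
}

# Example, using the endpoint and bucket from this article:
#   aws s3api --endpoint-url http://localhost:9878 list-objects \
#     --bucket icebergdata --query 'Contents[].Key' --output text |
#     has_iceberg_layout DIM_Geographies && echo "layout OK"
```

The check only looks at key prefixes; it does not confirm that the metadata files themselves are valid.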
Azure Prerequisites
- Start your FREE 60-day Fabric trial, if you don’t already have a Fabric tenant on Azure.
- Create a Fabric Workspace.
- Create a Fabric OneLake Lakehouse.
- Download and deploy the On-premises Data Gateway.
- Validate that the gateway can connect to Microsoft Fabric.

In this example, we navigate to the virtual machine where the gateway was installed and configured. The Fabric-Gateway-Ozone gateway is online and ready to communicate with the Microsoft Fabric environment.
If we navigate back to Microsoft Fabric and explore the OneLake Catalog, we see the Cloudera-OnPrem-Data Workspace and the OzoneToFabricLH Lakehouse.

Next, we select the OzoneToFabricLH Lakehouse to open it, then click the ‘…’ next to Tables on the left side of the screen to create a new shortcut to Apache Ozone. This virtualizes our DIM_Geographies Iceberg table without any data duplication.


Next, we will create a new connection to the On-Premises Data Gateway using the Apache Ozone credentials we created earlier: AWS access key ID and AWS secret key.

In this example, Fabric automatically detects the On-Premises Data Gateway. The user needs to provide the URL for the Ozone s3api endpoint in the Cloudera environment, the AWS access key ID, and the AWS secret key, then select Next.

In this example, the connection to Ozone succeeds. Proceed by selecting your Iceberg table directory and clicking Next. Make sure that the data and metadata subdirectories are present, as Fabric requires them to recognize and translate this path as an Iceberg table.

To rename the Shortcut, click on the pencil icon to make the desired changes. For this example, we will maintain all default settings and proceed by clicking Create.
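The same shortcut can, in principle, be scripted through the Fabric REST API instead of the UI. The sketch below only builds the request body. The field names follow the s3Compatible target of the Create Shortcut API and should be checked against the current API reference. Every value in the example (endpoint, IDs, token) is a placeholder, shortcut_payload is a hypothetical helper, and whether a connection that goes through the On-premises Data Gateway can be used this way is an assumption here.

```shell
# Build the JSON body for a shortcut under Tables that targets an
# S3-compatible source. Assumes the values contain no characters that
# need JSON escaping (quotes, backslashes, control characters).
shortcut_payload() {
  name="$1"           # shortcut name as it will appear in the Lakehouse
  endpoint="$2"       # URL of the Ozone S3 Gateway endpoint
  bucket="$3"         # bucket exposed under /s3v
  subpath="$4"        # table directory inside the bucket
  connection_id="$5"  # ID of the Fabric connection that holds the credentials
  printf '{"path":"Tables","name":"%s","target":{"s3Compatible":{"location":"%s","bucket":"%s","subpath":"%s","connectionId":"%s"}}}\n' \
    "$name" "$endpoint" "$bucket" "$subpath" "$connection_id"
}

# Example call (all variables are placeholders you would set yourself):
#   shortcut_payload DIM_Geographies "$OZONE_S3_ENDPOINT" icebergdata /DIM_Geographies "$CONNECTION_ID" |
#     curl -sS -X POST \
#       -H "Authorization: Bearer $FABRIC_TOKEN" \
#       -H "Content-Type: application/json" \
#       --data @- \
#       "https://api.fabric.microsoft.com/v1/workspaces/$WORKSPACE_ID/items/$LAKEHOUSE_ID/shortcuts"
```

Separating payload construction from the HTTP call makes the body easy to inspect before anything is sent.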

Once the Shortcut is created, a virtualized version of the table appears in the Lakehouse within Fabric. Note the link icon next to the table name, which indicates that it is a Shortcut rather than a natively managed table in Fabric.
Next, select the table name. Fabric will execute an initial read query on the table in Ozone. To reduce network communication, Fabric caches recently queried data. As the cache reaches its limit, it replaces older result sets with those from newer queries.
With our data now integrated into Fabric, we can utilize Power BI to visualize it and develop advanced dashboards using existing data from Azure or other cloud platforms.

Summary
This article examined the Fabric OneLake Shortcut to S3-compatible storage such as Apache Ozone on Cloudera clusters. The integration helps consolidate data silos and simplify data pipelines. It also makes it easier to build agentic systems on a centralized data estate with Azure AI Foundry, and helps teams deliver business value more efficiently.
What’s Next
For further information on this solution, please contact your Microsoft account team for guidance on using Fabric with your Cloudera on-premises data estate.
References
Shortcut Blog: Virtualize your existing data into OneLake with shortcuts
Cloudera Ozone Doc: Introduction to Ozone
Cloudera Apache Iceberg: Apache Iceberg features
Configure Cloudera Ozone Filesystem: Working with Ozone File System (ofs)
Configure Cloudera Ozone S3 Gateway: Using Ozone S3 Gateway to work with storage elements
Create Ozone Credentials: Configure S3 credentials for working with Ozone
Migrate HDFS data to Ozone: Process of migrating the HDFS data to Ozone
Get Started with: Fabric Trial
Create Fabric Lakehouse: Bring your data to OneLake with Lakehouse
Download and Install On-premises Data Gateway
Next steps
We can’t wait for you to try out OneLake shortcuts on your own data and let us know what you think. Submit your feedback on Fabric Ideas and join the conversation on the Fabric Community. To get into the technical details, head over to the Fabric documentation.
