OneLake: your foundation for an AI-ready data estate
For years, organizations have aspired to build a culture where data isn’t just accessible—it’s woven into every decision. Now, generative AI assistants are making it easier than ever for business users to explore data, quickly answer pressing questions, and even build custom agents on their data. And yet, for many, the promise of a truly data-driven culture remains elusive. The typical data estate has grown organically over time, with many different, team-specific data tools and services. These varied layers and silos lead to data sprawl and duplication, access issues, and even data exposure risks—making it hard for data teams and end users to access, find, and use the data they need to unlock insights.
A decade ago, we faced the same issues with document sharing. Sharing documents with your coworkers meant emailing attachments or managing files on local network drives. Then, cloud services like OneDrive and Dropbox transformed document sharing and collaboration by providing a single, accessible home for files. In the data realm, a similar transformation is happening now with OneLake.
Instead of the patchwork of storage accounts and ad-hoc data marts scattered across departments, organizations need a single, unified access point for all their data. Now with Microsoft OneLake, we have the solution. With OneLake, you can access your entire multi-cloud data estate from a single data lake that spans the entire organization. Similar to how OneDrive is wired into all your Microsoft 365 applications and provides a convenient storage location, OneLake acts as the central, accessible location for comprehensive data access and management.
In this blog post, we’ll explore why OneLake is the ideal data lake to unify your data estate and help you create AI applications, focusing on five key pillars: breaking down silos, connecting to all your data, working from a single data copy, discovering and managing in a data catalog, and sharing data with granular security.
Breaking down silos with a unified data foundation
Traditionally, every department, team, and even project in an organization creates its own siloed data stores to maintain data ownership and granular control over security and compliance. The result, however, is a fragmented patchwork of ‘data islands’. This siloed system can’t keep up with fast-paced data projects, especially as frontier firms start deploying agents across the organization that need access to cross-department data.
Instead, you can deploy OneLake as the central data access point for the entire organization. Every Microsoft Fabric tenant comes with just a single OneLake instance, with no additional infrastructure to manage. Every department, team, and project can store or connect their data to a single unified data lake and then use a system of Fabric domains, sub-domains, and workspaces—each with their own administrator—to organize their data into a logical data mesh. This system maintains data ownership and allows for federated governance while ensuring authorized users can discover and use data from other domains without friction. Watch this video to see how you can set up your own logical data mesh in OneLake:
By consolidating data access to one place, OneLake dramatically simplifies data sharing and integration. When a data project requires data from multiple departments, users can query and combine data from multiple domains directly in OneLake rather than requesting exports or setting up complex pipeline jobs. And OneLake’s reach isn’t limited to Azure: it can virtualize data from your other clouds, and that data appears just like any other data item in OneLake.
Connect to any data, anywhere without duplication
With your data mesh organized in OneLake, you then have the tools to connect to all of the data sources in your data estate. Most data estates naturally span multiple clouds, accounts, databases, domains, and engines, and data professionals can spend much of their time trying to connect data sources to incompatible platforms or refreshing out-of-date data with complex data pipelines. With OneLake, we’ve simplified how you bring data in with a zero-copy, zero-ETL approach built on two key Fabric capabilities: shortcuts and mirroring.
OneLake shortcuts enable your data teams to virtualize data in OneLake without having to move and duplicate it. They act essentially as metadata pointers, similar to a shortcut on your desktop. This capability is particularly adept at helping you break down silos across your data estate and even between OneLake domains. You can create shortcuts to data that lives in another domain or workspace, while ensuring only one copy of the data exists. Shortcuts even preserve data ownership and governance across domains, meaning if you update your data item or restrict access to it, all users who reach the data through a shortcut will instantly see the change. With shortcut transformations, you can even apply automatic changes to the data like converting the data format or removing personally identifiable information (PII). We have shortcuts available for OneLake, Azure Data Lake Storage, Azure Blob Storage, Amazon S3 and S3-compatible sources, Iceberg-compatible sources, Microsoft Dataverse, on-premises sources, and more on their way.
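To make this concrete, here is a minimal sketch of the request body you might send to the Fabric REST API to create an S3 shortcut in a lakehouse. The payload shape follows the Fabric “Create Shortcut” API at the time of writing, but treat the exact schema as an assumption to verify against the current API reference; the bucket URL, connection ID, and names below are made-up placeholders.

```python
import json

def build_s3_shortcut_payload(name: str, bucket_url: str,
                              subpath: str, connection_id: str) -> dict:
    # Illustrative payload shape (verify against the current Fabric API docs).
    return {
        "path": "Files",              # where the shortcut appears inside the lakehouse
        "name": name,                 # display name of the shortcut
        "target": {
            "amazonS3": {
                "location": bucket_url,        # S3 bucket endpoint
                "subpath": subpath,            # folder inside the bucket
                "connectionId": connection_id  # Fabric connection holding S3 credentials
            }
        },
    }

payload = build_s3_shortcut_payload(
    "sales_raw",
    "https://my-bucket.s3.us-west-2.amazonaws.com",  # placeholder bucket
    "/sales",
    "00000000-0000-0000-0000-000000000000")          # placeholder connection ID

# This payload would be POSTed (with an Entra ID bearer token) to:
#   https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{lakehouseId}/shortcuts
print(json.dumps(payload, indent=2))
```

Once created, the shortcut behaves like a folder in the lakehouse while the bytes stay in S3, which is how the single-copy guarantee described above is preserved.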
You can also use mirroring, a no-ETL experience for adding proprietary databases or data warehouses to Fabric. Depending on the data source, mirroring can replicate either the entire database or just the metadata into OneLake as Delta Parquet tables, keeping the data in sync in near real time. We currently have mirroring enabled for Azure Cosmos DB, Azure SQL Database, Azure SQL Managed Instance, Azure Database for PostgreSQL, Azure Databricks Unity Catalog, Snowflake, and many more sources coming soon, including SQL Server (including SQL Server 2025), Oracle, and Dataverse. With Open Mirroring, you can even create custom mirroring experiences for your own applications.
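For a sense of how Open Mirroring works, the sketch below mimics its per-table landing zone layout locally: data arrives as incrementally numbered Parquet files, and a `_metadata.json` file declares the table’s key columns so the mirror can apply updates and deletes. The folder and file-naming conventions here follow the Open Mirroring documentation at the time of writing, but treat the exact details as assumptions to verify; this sketch writes to a temp directory rather than a real mirrored database.

```python
import json
import pathlib
import tempfile

# Local stand-in for an Open Mirroring landing zone (real one lives in the
# mirrored database's OneLake storage); "customers" is a placeholder table.
landing = pathlib.Path(tempfile.mkdtemp()) / "LandingZone" / "customers"
landing.mkdir(parents=True)

# Declare the key columns the mirror uses to merge incremental changes.
(landing / "_metadata.json").write_text(
    json.dumps({"keyColumns": ["customer_id"]}))

# Data files are incrementally numbered Parquet files (empty placeholder here).
(landing / "00000000000000000001.parquet").touch()

print(sorted(p.name for p in landing.iterdir()))
```

Your application keeps dropping numbered Parquet files into the folder, and Fabric handles converting and merging them into the mirrored Delta table.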
Check out this quick demo of these features in action:
The benefits of these innovative, no-ETL options are massive. No more cumbersome ETL pipelines, no more sprawling, out-of-date copies of the data, and no more data silos across every part of your business. Once your data is connected to OneLake, a single copy serves every engine.
Collaborate on a single copy of data with open formats
When we built OneLake and the Fabric engines, we designed them to support open data formats, standardizing on both the Delta Parquet and Apache Iceberg formats. This commitment to common open data formats means you load your data into OneLake once and all the Fabric engines can operate on the same data, without having to separately ingest it. Having only one copy of the data means teams can collaborate on a single source of truth rather than fragmenting information into endless copies at each stage of the analytics journey.
Creating multiple copies of the same data not only wastes storage space but also leads to version mismatches. By eliminating redundant copies, OneLake ensures everyone is working from the most up-to-date version of the data without refresh delays or manual syncs. Instead of marketing and finance creating separate copies of a lakehouse with customer revenue data, they can work from the same data with different metadata, filters, and BI reports added. IT teams can spend less time maintaining complex pipelines, and admins have only one copy to manage with far easier audit trails to follow. Moreover, data professionals can pick the engine they prefer, whether it’s T-SQL or Spark, knowing all the engines are optimized for Delta Parquet and will work from the same copy.
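The “one copy, many engines” idea is visible in how OneLake data is addressed. Every engine resolves the same Delta table through OneLake’s ADLS-compatible endpoint using a documented URI convention; the workspace, lakehouse, and table names below are made-up placeholders.

```python
# Build the OneLake URI for a Delta table inside a lakehouse.
# The URI convention is documented for OneLake; names are placeholders.
def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    return (f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
            f"{lakehouse}.Lakehouse/Tables/{table}")

uri = onelake_table_uri("Finance", "Revenue", "customer_revenue")
print(uri)

# Different engines would resolve this same single copy, for example:
#   spark.read.format("delta").load(uri)          # a Spark notebook
#   SELECT * FROM Revenue.dbo.customer_revenue;   -- the SQL analytics endpoint
```

Because the path points at one physical Delta table, there is nothing to re-ingest or keep in sync between the Spark and T-SQL views of the data.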
Everyone operates on the same single version of truth, from a data scientist training a model to an executive reviewing a dashboard, driving a more aligned and efficient organization.
Discover, manage, and govern in a complete catalog
Minimizing data duplication and sprawl also requires ensuring the right people can find and explore the right data. The benefits of a data culture have been clear for years, but with generative AI the potential business impact is increasing exponentially. Frontier firms are already using AI assistants and building custom agents to transform how their teams interact with data: technical professionals create data items and draft code, while business users quickly answer their pressing data questions. But crucially, this culture requires that everyone has the ability to discover high-quality data.
That’s where the OneLake catalog comes in. We’ve designed the OneLake catalog to be the single place for data professionals and business users to discover, manage, and govern the data they own and can access across OneLake. With over 30M monthly active Power BI and Fabric users, it’s already the default source of data and insights for many business users. The OneLake catalog comes with two tabs, Explore and Govern, that can help all Fabric users discover and manage trusted data, as well as provide governance insights for data owners.
Instead of searching through a maze of databases or SharePoint sites, users can use the Explore tab and even narrow their search by domain, workspace, item type, endorsements, and more to find exactly what they need in seconds. You can then drill into a data item to see its description, owner, schema, lineage, and usage metrics. We’ve also integrated the OneLake catalog everywhere your people work, including Microsoft Teams, Microsoft Excel, Microsoft Copilot Studio, and hundreds of other scenarios—bringing data access to the 350 million Microsoft 365 users.
In the Govern tab, data owners get out-of-the-box insights and recommended actions based on the curation and quality of their data, including sensitivity label coverage, tagging, endorsements, data location, and more.
Check out the full demo of the OneLake catalog:
Share broadly with granular security and control
However, while broad access to data is critical for empowering the business, security leaders know that cyberattacks are becoming more sophisticated, and the average cost of a single breach is nearing $10 million. Traditionally, the response is to lock down access to only trusted users, but our research tells us that 63% of data breaches stem from inadvertent, negligent, or malicious insiders. The reality is that people will work around lockdown controls using tools like Excel, which are harder to govern, less transparent, and harder to maintain.
That’s why we built OneLake security, an experience designed to help you share data across your organization without exposing sensitive information. With OneLake security, you can create roles to set permissions at the data item, folder, table, or even row/column level, enabling you to share a data item while restricting access to any sensitive data it may contain. These permissions are then automatically enforced across all analytics experiences, so whether a user is querying data through a Spark notebook, viewing it in a Power BI report, or exploring it through a Fabric data agent, OneLake’s security model ensures they see only what they’re permitted to see.
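As an illustration, here is roughly what a OneLake security role permitting read access to a single table might look like when defined programmatically. The field names loosely mirror the OneLake data access roles REST API at the time of writing, but treat the exact schema as an assumption and check the current API reference before relying on it; all IDs are placeholders.

```python
import json

# Hypothetical role definition: members of one Entra ID group may read only
# the customer_revenue table. Schema and IDs are illustrative placeholders.
role = {
    "name": "FinanceReaders",
    "decisionRules": [{
        "effect": "Permit",
        "permission": [
            # Scope the role to a single path inside the lakehouse...
            {"attributeName": "Path",
             "attributeValueIncludedIn": ["Tables/customer_revenue"]},
            # ...and to read-only access.
            {"attributeName": "Action",
             "attributeValueIncludedIn": ["Read"]},
        ],
    }],
    "members": {
        "microsoftEntraMembers": [
            {"tenantId": "00000000-0000-0000-0000-000000000000",
             "objectId": "11111111-1111-1111-1111-111111111111"}
        ]
    },
}
print(json.dumps(role, indent=2))
```

Defined once, a role like this is what gets enforced uniformly whether the user arrives via Spark, Power BI, or a data agent.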
Check out this visual to see how OneLake security works:

This unified approach to security means users no longer have to maintain separate permissions across different engines. It also means the original data owners always maintain control over who can access the data source, even if the data is referenced via a shortcut from another lakehouse or workspace owned by someone else. The end result is that data sharing can be done safely, knowing you have the fine-grained controls in place.
Check out this full overview video:
On top of this built-in security, you can also leverage the same security features from tools like Microsoft 365 with Purview Information Protection sensitivity labels and Purview Data Loss Prevention (DLP) policies. Technical and non-technical users alike can apply sensitivity labels to classify their data items, automatically restricting access based on the data item’s sensitivity even when the data is exported to other tools like Microsoft Excel. DLP policies will also automatically detect when sensitive data is uploaded to unauthorized destinations, alerting users and offering guidance to mitigate risks.
In short, OneLake’s security model means you get the benefit of broad data accessibility and self-service analytics without sacrificing oversight and control. Together, these capabilities provide a unified, enterprise-grade framework for securing data, enabling responsible AI use, and ensuring compliance across the OneLake environment.
Building data-driven agents with curated data from OneLake
Creating custom AI experiences requires data—lots of it. Data is the foundation on which AI is built, and the simple fact is AI is only as good as the data it’s based on. For generative AI solutions to be as accurate as possible, they need to be built on clean, well-structured data. With your data in OneLake, you can use Fabric’s various workloads to make the data AI-ready. Fabric has tools for data integration and engineering, data warehousing, data science, real-time analytics, and data modeling and visualization, and it even has native, industry-specific, and partner-created workloads to help you accelerate your data projects.
You can then directly connect your data to AI platforms like Azure AI Foundry to build and scale data-driven GenAI apps. We’ve built native integration between OneLake and Azure AI Foundry to make this as seamless as possible. The integration between Azure AI Foundry and OneLake is built on OneLake shortcuts, helping you work with your structured and unstructured data from OneLake in Azure AI Foundry without creating copies and adding more data sprawl. OneLake also directly integrates with Azure AI Search, which can store, index, and retrieve data, including vector embeddings, from your data sources including OneLake.
Finally, you can ground your Azure AI Agent’s responses with data from Fabric using Fabric data agents to unlock powerful data analysis capabilities. Fabric data agents are AI-powered assistants that can learn, adapt, and deliver insights, allowing users to interact with the data through chat. With out-of-the-box authorization, this integration simplifies access to enterprise data in Fabric while maintaining robust security, ensuring proper access control and enterprise-grade protection.
Check out this full demo:
Conclusion: A unified data lake for your entire organization
Microsoft OneLake is more than just a new tool—it’s the strategic centerpiece of a data estate that can reshape how an organization harnesses data. By unifying data in one place and breaking down silos, it can become the single point for all your users to discover and explore your organization’s data, organized into a logical data mesh. With shortcuts and mirroring in OneLake, you can unify all of your multi-cloud and on-premises sources and enable your people to work from a single copy of data—meaning fewer copies of data, better collaboration between your teams, and more streamlined analysis. By enabling collaboration on a single copy of data, it ensures every decision is based on the same facts, eliminating version-control and governance headaches.
Organizations like Lumen, IFS, NTT Data, and the Chalhoub Group have all adopted Microsoft OneLake and Microsoft Fabric to unify ingestion, storage, and analytics in one platform. Using OneLake shortcuts, mirroring, Direct Lake mode, and more, Lumen, a leader in enterprise connectivity, cut 10,000 hours of manual effort. “We used to spend up to six hours a day copying data into SQL servers,” says Chad Hollingsworth, Cloud Architect at Lumen. “Now it’s all streamlined… OneLake allowed us to ingest once and use anywhere.” IFS, a leading provider of enterprise software, faced high costs and complexity from a fragmented data architecture. The company unified its data estate on Microsoft OneLake, increasing data access from 20% to more than 85%, cutting costs, and accelerating insights. “The primary challenge we faced was the slow pace of development caused by managing separate extract, transform, load (ETL) processes and reporting environments,” said Ligy Terrance, Director of Data Analytics and Integration at IFS. “With Microsoft Fabric, we now have a unified platform that brings all these layers together… Having everything in one place has eliminated integration bottlenecks and made it much easier to deliver insights quickly and efficiently.”
For organizations trying to manage their ever-growing data estate, the implications are significant. OneLake’s approach translates to less data sprawl and lower total costs, less time spent by IT maintaining complex data pipelines and by users looking for data, and faster time to insights for data professionals. With its robust security and governance story, you can help ensure your data is secure while empowering your users with decision-changing data.
Learn more about how OneLake can work with your data estate
Join us for a series of blog posts over the next few months as we explore why Microsoft OneLake is the ideal data platform for the entire data estate. We’ll walk you through how OneLake integrates with each of these platforms, highlight top opportunities and use cases, and feature customers who’ve successfully transformed their existing solutions with OneLake. Check back on the Fabric blog to find the latest posts, or bookmark this post—we’ll update the list below with links as each one publishes.
We are planning the following topics:
- OneLake and Azure AI Foundry: Build data-driven agents with curated data from OneLake
- OneLake and Snowflake: Snowflake and Microsoft announce expansion of their partnership
- OneLake and Azure Databases: Coming soon
- OneLake and Azure Databricks: Coming soon
- OneLake and Azure Data Factory: Coming soon
- OneLake and Microsoft 365: Coming soon
- OneLake and Microsoft Copilot Studio: Coming soon
- OneLake and open-source solutions: Coming soon