Microsoft Fabric Updates Blog

Fabric OPENROWSET function (Preview)

We are excited to announce the preview of the OPENROWSET function in the Fabric Data Warehouse and the SQL analytics endpoint for Lakehouse. This powerful function allows you to read the content of external files stored in Azure Data Lake Storage and Azure Blob Storage without the need to ingest them into the Data Warehouse.

With the OPENROWSET function, you can effortlessly read the content of Parquet or CSV files and return a set of rows that represent their content. The syntax of the OPENROWSET function is shown in the following example:

SELECT *
FROM OPENROWSET( BULK <uri> )
WITH ( <column definitions> ) AS file_data

You just need to provide a URI of the file that you want to read and optionally specify the columns, and the OPENROWSET function will return the content of the file as a set of rows. This functionality provides an excellent way to access and analyze your data directly from its source.

Refer to the table for an overview of the supported and unsupported functionalities in preview:

                 Supported                               Not supported
File formats     Parquet, CSV                            Delta, Cosmos DB
Storage          Azure Blob Storage, ADLS                OneLake, OneLake shortcuts
Authentication   Entra ID passthrough, public storage    SAS/SAK, SPN, Workspace Identity, Managed Identity

Supported and unsupported features in the Fabric OPENROWSET function

There are several interesting scenarios where the OPENROWSET function proves highly useful in data warehousing.

Preview content of files

The OPENROWSET function is invaluable when you need to inspect the content of CSV and Parquet files, discover their schema, and preview the content of the files. By simply providing the URL of your file on Azure Data Lake Storage or Azure Blob Storage, you can quickly access and view the file content along with its columns.

The following example demonstrates how to preview the content of a publicly available file containing information about COVID-19 cases:

Preview content of the file in Fabric query editor

The OPENROWSET function automatically inspects the underlying files, determines the column schema, and returns the rows from the file as a row set.
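
For reference, the preview shown above can be reproduced with a query such as the following (a minimal sketch that uses the publicly available Bing COVID-19 Parquet file also referenced later in this post):

SELECT TOP 10 *
FROM OPENROWSET( BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet' ) AS file_data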

In addition to manually inspecting files, you can leverage OPENROWSET to automatically create the table structure needed to load data.

SELECT TOP 0 *
INTO bing_covid_table
FROM OPENROWSET( BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet' ) AS r

This SELECT INTO statement will create a target table that matches the schema of the source files discovered by the OPENROWSET function. By using TOP 0, no rows are inserted, resulting in an empty target table that is ready to be loaded with the COPY INTO statement.

You can generate the CREATE TABLE script for this table and customize the column types before you start loading data.
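
Once the empty table exists, a minimal COPY INTO sketch along these lines loads it from the same public Parquet file (no credential is specified here because the file is publicly accessible; a private location would also require one):

-- Load the empty table created by the SELECT TOP 0 ... INTO statement above
COPY INTO bing_covid_table
FROM 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
WITH ( FILE_TYPE = 'PARQUET' )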

Customize the data source schema

The OPENROWSET function enables you to customize and fine-tune the schema of the result set by specifying the columns you expect to be returned. The WITH clause of the OPENROWSET function lets you specify which underlying columns you want to return in the query:

SELECT *
FROM OPENROWSET(BULK 'https://<storage_path>/covid/bing_covid-19_data.parquet')
WITH (updated date, confirmed int, recovered int,
      latitude float, longitude float,
      iso2 varchar(2), country_region varchar(16)) AS data

Unlike the previous examples, where OPENROWSET automatically discovers the schema, in this scenario you specify which columns should be returned. This feature is particularly useful in the following situations:

  • You don’t need to return all columns, especially if the file has hundreds of columns.
  • You need to specify more accurate types for your columns, as OPENROWSET will infer the largest possible types like BIGINT or VARCHAR(8000) while discovering the underlying schema. By using the WITH clause, you can fine-tune these choices and specify more precise types like SMALLINT or VARCHAR(16) where needed.

Querying partitioned datasets

Another scenario where you will find OPENROWSET very useful is querying partitioned datasets. If your dataset is partitioned in a folder hierarchy where the folder names represent the values that should be returned as columns for the files, you can leverage the filepath() function to extract parts of the URL by using wildcards in the path:

SELECT r.filepath(2) as month, r.filepath(3) as day, r.*
FROM OPENROWSET( BULK 'https://storage.blob.core.windows.net/covid/year=*/month=*/day=*/*.parq') AS r
WHERE r.filepath(1) = '2021'

The argument in the filepath() function references the text matched by the corresponding wildcard in the path. In addition to retrieving folder names based on the pattern, you can use the filepath() function in the WHERE clause to perform partition elimination and improve query performance.
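
For example, because filepath() can be used both in the projection and in the WHERE clause, a sketch like the following (reusing the same illustrative storage path as above) counts the rows per month within a single year:

SELECT month, COUNT(*) AS total_rows
FROM (
    SELECT r.filepath(2) AS month
    FROM OPENROWSET( BULK 'https://storage.blob.core.windows.net/covid/year=*/month=*/day=*/*.parq') AS r
    WHERE r.filepath(1) = '2021'
) AS rows_per_month
GROUP BY month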

Transforming the source data at query time

The COPY INTO statement is the primary mechanism for ingesting data. However, thanks to the flexibility of the T-SQL language, you can leverage the INSERT SELECT statement to read the content of external files and ingest them directly into a table.

Although this should not be your first choice for ingestion, there are scenarios where you might find it useful:

  • Query-time data transformation: With the INSERT+SELECT+OPENROWSET pattern, you can transform the values you are reading before ingesting them into the target table. This is useful when it would be complex to update the values after they are loaded into the table.
  • Loading partition values: OPENROWSET enables you to retrieve values from folder and file names and insert them along with the file content.
  • Loading partial datasets: OPENROWSET allows you to load individual partitions or files, or reduce the data set to a subset of rows based on certain predicates.

An example of using INSERT SELECT with OPENROWSET for advanced filtering and data transformation at query time:

INSERT INTO target_table
SELECT r.filepath(2) AS month, r.filepath(3) AS day, 
       r.cases, r.recovered, ISNULL(r.iso3, 0) AS iso
FROM OPENROWSET(BULK 'https://storage.blob.core.windows.net/covid/year=*/month=*/day=*/*.csv') AS r
WHERE r.filepath(1) = '2021'
AND r.cases > 1000

In this example, we retrieve just three columns from the CSV files (cases, recovered, and iso3), include the month and day values that are not part of the source files, and limit the input data set to a single partition with the condition that there are more than 1000 cases. We also transform the iso3 column by inserting the value 0 where OPENROWSET returns NULL. This kind of logic enables you to define advanced predicates and projections at query time and avoid later data cleaning.

While COPY INTO remains the best and fastest solution for ingesting data into tables, OPENROWSET is still very useful in scenarios where you need to modify data at ingestion time.

Conclusion

The OPENROWSET function is a powerful and flexible tool that can help you implement many data exploration scenarios and improve your data ingestion code.

This function is currently in preview in Fabric Data Warehouse, and more enhancements will be added soon.
