OPENROWSET function in Fabric Data Warehouse (Generally Available)
The OPENROWSET function is generally available in Fabric Data Warehouse and Fabric SQL endpoints for Lakehouse and Mirrored databases. The OPENROWSET function enables you to easily read Parquet and CSV files stored in Azure Data Lake Storage and Azure Blob Storage:
SELECT *
FROM OPENROWSET(
BULK 'https://<storage>.blob.core.windows.net/container/file.parquet'
)
With the OPENROWSET function, you can easily browse files before loading them into the Fabric Data Warehouse, allowing you to inspect the schema before creating the target table. The OPENROWSET function supports most of the options available in SQL Server, Azure SQL, and Synapse, facilitating seamless migration and code reuse between these platforms.
In this post, we will explore some data analytics scenarios where you might find this function very useful.
Reading data from the custom folder structures
The OPENROWSET function can reference URI patterns by using *
(wildcard) and /**
(recursive child wildcard) that can match multiple source files with the same pattern or return all files that are recursively placed under the URI. You can use these wildcards to find all file paths and number of rows in them, like in the following example:

With the recursive wildcards you can explore folder structures with arbitrary folder depths and discover the files that you need to ingest.
Reading partitioned data sets
The OPENROWSET function can retrieve the values from the referenced URI and return them with the other columns. This is crucial if you are reading data from a hive-style partition structure in the format /partiition1=*/partition2=*/… and you need to return the partition values matched by the wildcards in the result set.
In the following example you can see how to retrieve year and month values from the referenced URI path:

The filepath(index)
function returns the value matched by the wildcard on the position index
in the URI pattern. For example, for the folder path /puYear=2015/puMonth=4, the filepath(1)
will return ‘2015’, and filepath(2)
will return ‘4’.
This function enables you to treat the matched values from the folder path as any other column in the file. Also, using these functions in the WHERE
predicate you can eliminate the entire partition without reading the files in the folders that are not needed in the query, which might boost the performance of your queries.
Working with Parquet complex types
The OPENROWSET function enables you to read the complex types (such as struct, array, and map) from the Parquet files. The values in complex type columns are returned as JSON text for easier manipulation and analysis.

In this example, the user column that persisted as a Parquet complex type. The values from this column are returned as JSON-formatted text containing all inner properties from the complex type of column.
With the JSON functions that are available in Fabric Data Warehouse (JSON_VALUE, OPENJSON, etc.) you can easily parse these values and retrieve individual properties or elements within the complex type columns.
Customizing result set scheme
The OPENROWSET function allows you to map the result set columns to the source columns and define the optimal column types for all columns using the WITH clause.
In the WITH clause you can specify what are the columns and types from the underlying files that you want to return in the result set.

In addition to more precise types, you can directly specify the column that should reference properties in the complex type columns without the need to use the JSON functions.
Ingesting data with CTAS or INSERT SELECT statements
The OPENROWSET function enables you to ingest data using Create Table As Select (CTAS) or INSERT SELECT statements by using the OPENROWSET as a source and allowing you to modify the values from the source values at ingestion time.
CREATE TABLE GitHubIssues AS
SELECT id, title, body,
ISNULL(state, 'open') as "state",
CAST(JSON_VALUE([user], '$.login') AS VARCHAR(32)) as "login_name",
CAST(JSON_VALUE([user], '$.type') AS VARCHAR(10)) as "type"
FROM OPENROWSET(
BULK 'https://*****.dfs.core.windows.net/samples/github-issues.parquet')
This is crucial in scenario where you need to modify source data that is not in expected format. In the example above, we are setting the default value for the state if it is not defined in the source Parquet files, reading a parquet file with complex type column, and CTAS statement enables you to extract individual properties of the complex column and load them as individual columns with the types specified in the CAST function. You can use any supported T-SQL function to transform the source data set into the target table.
Conclusion
The OPENROWSET function will significantly improve your data exploration and ingestion experience by enabling you to browse files and transform data during ingestion. It also facilitates easier migrations between Synapse, SQL Server, and Fabric Data Warehouse because the same syntax is available everywhere.
This powerful functionality ensures that you can handle complex data types, manage partitioned data efficiently, and customize your result set schema to meet your specific requirements, all while maintaining compatibility with existing SQL options.
We are continuously improving Fabric Data Warehouse and accepted ideas. If you have some idea how to improve the OPENROWSET function feel free to describe the functionality that might be useful for your scenario using the New Idea – Microsoft Fabric Community form or vote for some of the existing ideas such as Add native OPENROWSET(JSON) support or Enable OPENROWSET to read files from One Lake.