Ingest files into your Fabric Data Warehouse using the OPENROWSET function
Data ingestion is one of the most important operations in data warehouse solutions. In Microsoft Fabric Data Warehouse, the OPENROWSET function provides a powerful and flexible way to read data from files stored in Fabric OneLake or external Azure Storage accounts. Whether you’re working with Parquet, CSV, TSV, or JSONL files, the OPENROWSET function makes it easy to bring your data into your warehouse.
The OPENROWSET function is not just for reading data—when combined with an INSERT statement, it becomes a powerful option for file ingestion.
In this article, we’ll explore how to use OPENROWSET within an INSERT statement to load data, and we’ll highlight the key differences between this approach and the standard COPY INTO statement.
What is OPENROWSET?
OPENROWSET is a versatile function that allows you to query the contents of files stored in a data lake. It returns the file data in a relational, tabular format, making it easy to work with. By simply providing the file’s URI to OPENROWSET, you can retrieve its contents as structured rows and columns:

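Here’s a quick sketch using the public Bing COVID-19 dataset that we’ll also ingest later in this article:

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
)

The query returns the first ten rows of the Parquet file as a regular result set, with column names and types inferred from the file itself.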
OPENROWSET is more than just a file reader; it’s a feature-rich T-SQL function designed for flexibility and ease of use.
Why it stands out
- Schema inference: Automatically detects column names and data types from the source files.
- Reading partitioned data: Efficiently queries Hive-partitioned folders and returns the partition values from the folder paths.
- File metadata extraction: Retrieve file names and paths for auditing and lineage tracking.
- Supports multiple formats: Parquet, CSV, TSV, JSONL.
- Customizable format options: Define row and field terminators, specify character encodings, infer or skip header rows, and more.
- Lightweight transformations: Enables you to apply filters and small modifications to the result set.
This flexibility makes OPENROWSET ideal for ETL pipelines, where you need to extract and transform data before loading it into your warehouse.
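To illustrate the customizable format options listed above, here’s a sketch of reading a tab-separated file with explicit options. The storage URL is a placeholder, and the full list of supported options is described in the Fabric documentation:

SELECT *
FROM OPENROWSET(
    BULK 'https://<storage-account>.blob.core.windows.net/<container>/data.tsv', -- placeholder URL
    FORMAT = 'CSV',
    HEADER_ROW = TRUE,      -- infer column names from the first row
    FIELDTERMINATOR = '\t', -- tab-separated fields
    ROWTERMINATOR = '\n'
)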
Using OPENROWSET to ingest data
In ETL or ELT processes, OPENROWSET represents the Extract (E) phase. When combined with the INSERT statement, it becomes a powerful tool for loading data into Fabric warehouse tables.
Tip!
- Use COPY INTO for bulk ingestion without transformations (common in ELT).
- Use OPENROWSET when you need filtering or lightweight transformations during ingestion (common in ETL).
Typically, loading data involves two steps:
- Create the destination table – Use a CREATE TABLE statement to explicitly define the columns or use CTAS (Create Table as Select) to infer the column definitions from the source file schema.
- Insert the data – Execute an INSERT statement that reads data from OPENROWSET and ingests it into the target table.
Step 1: Create a Destination Table
Before we start data ingestion, we need to create a target table in Fabric Data Warehouse. We can use CTAS (CREATE TABLE AS SELECT) with schema inference to create a table based on the source data:
CREATE TABLE [dbo].[bing_covid-19_data] AS
SELECT TOP 0 *
FROM OPENROWSET(
    BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
)
We can use the source file intended for ingestion as a template to define the table structure.
- CTAS automatically infers column names and data types from the source file and uses them to define the table’s columns.
- Using TOP 0 creates an empty table, allowing you to make schema adjustments before ingestion.
In this example, I will use the schema as-is without any changes.
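If you preferred to define the columns explicitly instead of relying on CTAS, a hand-written CREATE TABLE would work as well. The sketch below uses an illustrative subset of the dataset’s columns, not its full schema:

CREATE TABLE [dbo].[bing_covid-19_data]
(
    updated        date,
    confirmed      int,
    deaths         int,
    country_region varchar(256)
)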
Step 2: Ingest Data from Parquet Files
Once the table is created, we can ingest the Parquet file. Here’s an example using INSERT and OPENROWSET to load the Bing COVID-19 dataset:
INSERT INTO [dbo].[bing_covid-19_data]
SELECT *
FROM OPENROWSET(
    BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
)
WHERE country_region <> 'Worldwide'
This statement reads the file content, filters out all rows where country_region is ‘Worldwide’, and inserts the remaining rows into the target table.
Key Points
- No authentication is needed for public datasets, and you don’t need to specify the format or other options for files in standard Parquet format.
- You can transform values in the SELECT list or filter rows in the WHERE clause. Use INSERT+OPENROWSET if you need to transform external data before ingestion (see the sketch after this list).
- If no transformations are applied, this approach is equivalent to COPY INTO, which is the simpler solution in a no-transformation scenario.
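For example, here’s a sketch of a lightweight transformation during ingestion. The [dbo].[covid_totals] target table is hypothetical, and the updated, confirmed, and deaths columns are assumed from the dataset’s schema:

INSERT INTO [dbo].[covid_totals] -- hypothetical target table with matching columns
SELECT
    UPPER(country_region) AS country_region, -- lightweight transformation in the SELECT list
    updated,
    confirmed,
    deaths
FROM OPENROWSET(
    BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
)
WHERE country_region <> 'Worldwide' -- filter rows during ingestion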
Here is a summary of the differences between OPENROWSET and COPY INTO:
|  | OPENROWSET | COPY INTO |
| --- | --- | --- |
| Transformations | Yes (during the query execution) | No (must be done after load) |
| Schema inference | Based on source files | Based on target table |
| Best for | ETL with filtering or minor changes | 1:1 bulk ingestion in ELT |
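For comparison, a no-transformation load of the same Parquet file with COPY INTO would look roughly like this (a sketch of the basic syntax; see the COPY INTO documentation for the full set of options):

COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
WITH (FILE_TYPE = 'PARQUET')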
Conclusion
Let’s recap the scenarios where OPENROWSET is a good choice for data ingestion:
- You need schema inference.
- You want to filter or transform data during ingestion.
- You need to read partitioned data sets.
- You need auditing capabilities, extracting file names and paths from the source files (see the sketch after this list).
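As a sketch of the last two points, wildcard paths combined with the filepath() and filename() functions on the OPENROWSET alias expose partition values and source-file names. This assumes Fabric’s OPENROWSET supports these functions as serverless SQL pools do; the path below is a placeholder:

SELECT
    result.filepath(1) AS [year],  -- value matched by the first wildcard (partition folder)
    result.filename()  AS source_file,
    *
FROM OPENROWSET(
    BULK 'https://<storage-account>.blob.core.windows.net/<container>/sales/year=*/*.parquet' -- placeholder path
) AS result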
In summary, understanding the strengths of both COPY INTO and INSERT … OPENROWSET is key to building efficient ingestion workflows. COPY INTO excels in high-volume, production-grade loads with robust error handling and performance optimizations, while INSERT … OPENROWSET offers flexibility for ad-hoc ingestion, schema inference, and quick data exploration. By combining these approaches, you can select the right tool for each scenario—ensuring your data pipelines are both adaptable and optimized for your specific needs.
To learn more about ingestion with the OPENROWSET function, refer to the documentation.