
Ingest files into your Fabric Data Warehouse using the OPENROWSET function

Data ingestion is one of the most important operations in data warehouse solutions. In Microsoft Fabric Data Warehouse, the OPENROWSET function provides a powerful and flexible way to read data from files stored in Fabric OneLake or external Azure Storage accounts. Whether you’re working with Parquet, CSV, TSV, or JSONL files, the OPENROWSET function makes it easy to bring your data into your warehouse.

The OPENROWSET function is not just for reading data—when combined with an INSERT statement, it becomes a powerful option for file ingestion.

In this article, we’ll explore how to use OPENROWSET within an INSERT statement to load data, and we’ll highlight the key differences between this approach and the standard COPY INTO statement.

What is OPENROWSET?

OPENROWSET is a versatile function that allows you to query the contents of files stored in a data lake. It returns the file data in a relational, tabular format, making it easy to work with. By simply providing the file’s URI to OPENROWSET, you can retrieve its contents as structured rows and columns:

Reading a Parquet file using the OPENROWSET function
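For example, here is a minimal sketch that reads the public Bing COVID-19 dataset used throughout this article:

-- Preview the first 10 rows of a public Parquet file, straight from its URI
SELECT TOP 10 *
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')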

OPENROWSET is more than just a file reader; it’s a feature-rich T-SQL function designed for flexibility and ease of use.

Why it stands out

  • Schema inference: Automatically detects column names and data types from the source files.
  • Reading partitioned data: Efficiently queries Hive-partitioned folders and returns the partition values from the folder paths.
  • File metadata extraction: Retrieves file names and paths for auditing and lineage tracking.
  • Supports multiple formats: Parquet, CSV, TSV, and JSONL.
  • Customizable format options: Define row and field terminators, specify character encodings, infer or skip header rows, and more (see the CSV sketch after this list).
  • Lightweight transformations: Lets you apply filters and small modifications to the result set.
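As an example of the format options, here is a hedged sketch of reading a delimited text file. The OneLake path is hypothetical, and the option names follow the OPENROWSET documentation:

-- Read a headered, comma-delimited CSV file (hypothetical OneLake path)
SELECT *
FROM OPENROWSET(
    BULK 'https://onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>.Lakehouse/Files/sales/sales.csv',
    FORMAT = 'CSV',
    HEADER_ROW = TRUE,       -- infer column names from the first row
    FIELDTERMINATOR = ',',   -- field delimiter
    ROWTERMINATOR = '\n'     -- row delimiter
)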

This flexibility makes OPENROWSET ideal for ETL pipelines, where you need to extract and transform data before loading it into your warehouse.

Using OPENROWSET to ingest data

In ETL or ELT processes, OPENROWSET represents the Extract (E) phase. When combined with the INSERT statement, it becomes a powerful tool for loading data into Fabric warehouse tables.

Tip!

  • Use COPY INTO for bulk ingestion without transformations (common in ELT).
  • Use OPENROWSET when you need filtering or lightweight transformations during ingestion (common in ETL).

Typically, loading data involves two steps:

  1. Create the destination table – Use a CREATE TABLE statement to explicitly define the columns or use CTAS (Create Table as Select) to infer the column definitions from the source file schema.
  2. Insert the data – Execute an INSERT statement that reads data from OPENROWSET and ingests it into the target table.

Step 1: Create a Destination Table

Before we start data ingestion, we need to create a target table in Fabric Data Warehouse. We can use CTAS (CREATE TABLE AS SELECT) with schema inference to create a table based on the source data:

CREATE TABLE [dbo].[bing_covid-19_data] AS
SELECT TOP 0 *
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')

We can use the source file intended for ingestion as a template to define the table structure.

  • CTAS automatically infers column names and data types from the source file and uses them when creating the table.
  • Using TOP 0 creates an empty table, allowing you to make schema adjustments before ingestion (see the sketch after this list).
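If you do want to adjust the inferred schema, one option is to project and cast columns explicitly inside the CTAS statement. This is a minimal sketch; the table name is hypothetical, and the column names are assumed from the Bing COVID-19 dataset:

-- Hypothetical example: keep only two columns and override their inferred types
CREATE TABLE [dbo].[bing_covid19_slim] AS
SELECT TOP 0
    CAST(id AS int) AS id,
    CAST(country_region AS varchar(64)) AS country_region
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')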

In this example, I will use the schema as-is without any changes.

Step 2: Ingest Data from Parquet Files

Once the table is created, we can ingest the Parquet file. Here’s an example using INSERT and OPENROWSET to load the Bing COVID-19 dataset:

INSERT INTO [dbo].[bing_covid-19_data]
SELECT *
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')
WHERE country_region <> 'Worldwide'

This statement reads the file content, filters out all rows where country_region is ‘Worldwide’, and inserts the remaining rows into the target table.
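The SELECT list can also apply lightweight transformations on the way in. Here is a hedged sketch that reuses the hypothetical slim table from Step 1 (UPPER is just an illustrative transformation):

-- Ingest a subset of columns, transforming values during the load
INSERT INTO [dbo].[bing_covid19_slim]
SELECT CAST(id AS int), UPPER(country_region)
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')
WHERE country_region <> 'Worldwide'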

Key Points

  • No authentication is needed for public datasets, and you don’t need to specify the format or other options when the files use the standard Parquet format.
  • You can transform values in the SELECT list or filter rows in the WHERE clause. Use INSERT + OPENROWSET if you need to transform external data before ingestion.
  • If no transformations are applied, this approach is equivalent to COPY INTO, and COPY INTO is the simpler solution in that scenario (see the sketch below).
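For comparison, here is a minimal COPY INTO sketch that performs the same load without any transformation, targeting the table created in Step 1:

-- Bulk-load the same public Parquet file with no transformations
COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
WITH (FILE_TYPE = 'PARQUET')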

Here is a summary of the differences between OPENROWSET and COPY INTO:

                     OPENROWSET                              COPY INTO
Transformations      Yes (during query execution)            No (must be done after load)
Schema inference     Based on source files                   Based on target table
Best for             ETL with filtering or minor changes     1:1 bulk ingestion in ELT

Comparing ingestion scenarios for OPENROWSET and COPY INTO

Conclusion

Let’s recap the scenarios where OPENROWSET is a good choice for data ingestion:

  • You need schema inference.
  • You want to filter or transform data during ingestion.
  • You need to read partitioned data sets.
  • You need auditing capabilities, extracting file names and paths from the source files (see the sketch below).
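As an illustration of the last two points, here is a hedged sketch that reads a Hive-partitioned folder and surfaces file metadata. The OneLake path and partition layout are hypothetical, and the filepath() and filename() metadata functions are assumed to behave as described in the OPENROWSET documentation:

-- Hypothetical partitioned layout: .../sales/year=2024/month=01/part-000.parquet
SELECT
    result.filepath(1) AS sales_year,   -- value matched by the first wildcard (year=*)
    result.filepath(2) AS sales_month,  -- value matched by the second wildcard (month=*)
    result.filename()  AS source_file,  -- file name, useful for auditing and lineage
    result.*
FROM OPENROWSET(
    BULK 'https://onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>.Lakehouse/Files/sales/year=*/month=*/*.parquet'
) AS result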

In summary, understanding the strengths of both COPY INTO and INSERT … OPENROWSET is key to building efficient ingestion workflows. COPY INTO excels in high-volume, production-grade loads with robust error handling and performance optimizations, while INSERT … OPENROWSET offers flexibility for ad-hoc ingestion, schema inference, and quick data exploration. By combining these approaches, you can select the right tool for each scenario—ensuring your data pipelines are both adaptable and optimized for your specific needs.

To learn more about ingestion with the OPENROWSET function, refer to the documentation.
