Microsoft Fabric Updates Blog

Ingest files into your Fabric Data Warehouse using the OPENROWSET function

Data ingestion is one of the most important operations in data warehouse solutions. In Microsoft Fabric Data Warehouse, the OPENROWSET function provides a powerful and flexible way to read data from files stored in Fabric OneLake or external Azure Storage accounts. Whether you’re working with Parquet, CSV, TSV, or JSONL files, the OPENROWSET function makes it easy to bring your data into your warehouse.

The OPENROWSET function is not just for reading data—when combined with an INSERT statement, it becomes a powerful option for file ingestion.

In this article, we’ll explore how to use OPENROWSET within an INSERT statement to load data, and we’ll highlight the key differences between this approach and the standard COPY INTO statement.

What is OPENROWSET?

OPENROWSET is a versatile function that allows you to query the contents of files stored in a data lake. It returns the file data in a relational, tabular format, making it easy to work with. By simply providing the file’s URI to OPENROWSET, you can retrieve its contents as structured rows and columns:

Reading Parquet file using the OPENROWSET function
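As a minimal sketch, reading a public Parquet file looks like this (using the Bing COVID-19 dataset that also appears later in this post):

```sql
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
);
```

The function infers the schema from the Parquet file, so no column definitions are required.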

OPENROWSET is more than just a file reader—it’s a feature-rich T-SQL function designed for flexibility and ease of use.

Why it stands out

  • Schema inference: Automatically detects column names and data types from the source files.
  • Reading partitioned data: Efficiently queries Hive-partitioned folders and returns the partition values from the folder paths.
  • File metadata extraction: Retrieve file names and paths for auditing and lineage tracking.
  • Supports multiple formats: Parquet, CSV, TSV, JSONL.
  • Customizable format options: Define row and field terminators, specify character encodings, infer or skip header rows, and more.
  • Lightweight transformations: Enables you to apply filters and small modifications to the result set.
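Several of these features can be combined in a single query. The following sketch reads hypothetical Hive-partitioned CSV files (the storage account, container, and folder layout are illustrative, and the format options and `filepath`/`filename` metadata functions follow the Fabric OPENROWSET documentation):

```sql
-- Hypothetical layout: .../sales/year=2024/month=01/data.csv
SELECT
    rows.filepath(1) AS [year],   -- value matched by the first * wildcard
    rows.filepath(2) AS [month],  -- value matched by the second * wildcard
    rows.filename()  AS source_file,
    *
FROM OPENROWSET(
    BULK 'https://<account>.blob.core.windows.net/<container>/sales/year=*/month=*/*.csv',
    FORMAT = 'CSV',
    HEADER_ROW = TRUE,
    FIELDTERMINATOR = ','
) AS rows;
```

The wildcard folders provide partition elimination when the `filepath()` values are used in a `WHERE` clause, and the metadata columns can be stored alongside the data for lineage tracking.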

This flexibility makes OPENROWSET ideal for ETL pipelines, where you need to extract and transform data before loading it into your warehouse.

Using OPENROWSET to ingest data

In ETL or ELT processes, OPENROWSET represents the Extract (E) phase. When combined with the INSERT statement, it becomes a powerful tool for loading data into Fabric warehouse tables.

Tip!

  • Use COPY INTO for bulk ingestion without transformations (common in ELT).
  • Use OPENROWSET when you need filtering or lightweight transformations during ingestion (common in ETL).

Typically, loading data involves two steps:

  1. Create the destination table – Use a CREATE TABLE statement to explicitly define the columns or use CTAS (Create Table as Select) to infer the column definitions from the source file schema.
  2. Insert the data – Execute an INSERT statement that reads data from OPENROWSET and ingests it into the target table.

Step 1: Create a Destination Table

Before we start data ingestion, we need to create a target table in Fabric Data Warehouse. We can use CTAS (CREATE TABLE AS SELECT) with schema inference to create a table based on sample data:

CREATE TABLE [dbo].[bing_covid-19_data] AS
SELECT TOP 0 *
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet');

We can use the source file intended for ingestion as a template to define the table structure.

  • CTAS automatically infers column names and data types from the source file and uses these column types while creating a table.
  • Using TOP 0 creates an empty table, allowing you to make schema adjustments before ingestion.
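If adjustments are needed, you could instead define the table explicitly with CREATE TABLE. The sketch below shows what this might look like for a subset of the dataset’s columns (only country_region appears elsewhere in this post; the other column names and types are illustrative, not confirmed):

```sql
CREATE TABLE [dbo].[bing_covid-19_data]
(
    id             int,
    updated        date,
    confirmed      int,
    deaths         int,
    country_region varchar(256)
    -- remaining columns as inferred from the file, or adjusted as needed
);
```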

In this example, I will use the schema as-is without any changes.

Step 2: Ingest Data from Parquet Files

Once the table is created, we can ingest the Parquet file. Here’s an example using INSERT and OPENROWSET to load the Bing COVID-19 dataset:

INSERT INTO [dbo].[bing_covid-19_data]
SELECT *
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')
WHERE country_region <> 'Worldwide'

This statement reads the file content, filters out all rows where country_region is ‘Worldwide’, and inserts the remaining rows into the target table.

Key Points

  • No authentication is needed for public datasets, and you don’t need to specify the format or other options when the file is in standard Parquet format.
  • You can transform values in SELECT or filter rows in the WHERE clause. Use INSERT+OPENROWSET if you need to transform external data before ingestion.
  • If no transformations are applied, this approach is equivalent to COPY INTO, which is the simpler solution in no-transformation scenarios.
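Putting these points together, the sketch below applies a lightweight transformation and a filter during ingestion (the target table [dbo].[covid_summary] and the id, updated, and confirmed columns are illustrative; only country_region is taken from the article’s own example):

```sql
INSERT INTO [dbo].[covid_summary] (id, updated, confirmed, country_region)
SELECT
    id,
    CAST(updated AS date),
    confirmed,
    UPPER(country_region)             -- lightweight transformation in SELECT
FROM OPENROWSET(
    BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
)
WHERE country_region <> 'Worldwide';  -- filtering during ingestion
```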

Here is a summary of the differences between OPENROWSET and COPY INTO:

|                  | OPENROWSET                          | COPY INTO                    |
|------------------|-------------------------------------|------------------------------|
| Transformations  | Yes (during query execution)        | No (must be done after load) |
| Schema inference | Based on source files               | Based on target table        |
| Best for         | ETL with filtering or minor changes | 1:1 bulk ingestion in ELT    |
Comparing ingestion scenarios for OPENROWSET and COPY INTO
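For comparison, the no-transformation load of the Bing COVID-19 dataset could be written with COPY INTO roughly as follows (option names per the Fabric COPY INTO documentation):

```sql
COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
WITH (FILE_TYPE = 'PARQUET');
```

Note that COPY INTO loads into a pre-defined target table as-is; any filtering or reshaping would have to happen after the load.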

Conclusion

Let’s recap the scenarios where OPENROWSET is a good choice for data ingestion:

  • You need schema inference.
  • You want to filter or transform data during ingestion.
  • You need to read partitioned data sets.
  • You need auditing capabilities, extracting file names and paths from the source files.

In summary, understanding the strengths of both COPY INTO and INSERT … OPENROWSET is key to building efficient ingestion workflows. COPY INTO excels in high-volume, production-grade loads with robust error handling and performance optimizations, while INSERT … OPENROWSET offers flexibility for ad-hoc ingestion, schema inference, and quick data exploration. By combining these approaches, you can select the right tool for each scenario—ensuring your data pipelines are both adaptable and optimized for your specific needs.

To learn more about ingestion with the OPENROWSET function, refer to the documentation.
