BULK INSERT in Fabric Data Warehouse
We are excited to announce that the BULK INSERT statement is now available in Fabric Data Warehouse. The BULK INSERT statement allows you to load CSV files from Azure Data Lake Storage and Azure Blob Storage.
Here is an example of the traditional BULK INSERT statement, commonly used by SQL Server and Azure SQL users to import files from external storage:
BULK INSERT target_table
FROM 'https://<storage>.dfs.core.windows.net/path/to/file.csv'
WITH (
    CODEPAGE = 'ACP',           -- The source file uses the ANSI code page
    DATAFILETYPE = 'char',      -- The source file is in character format
    ROWTERMINATOR = '\n',       -- A newline character terminates each row
    FIELDTERMINATOR = ',',      -- Each field is separated by a comma
    FORMAT = 'CSV',             -- The source file is an RFC 4180-compliant CSV file
    FIELDQUOTE = '"',           -- The " character encapsulates fields in the CSV
    FIRSTROW = 10,              -- The first row to be read
    LASTROW = 100,              -- The last row to be read
    FORMATFILE = 'fmtfile.xml', -- Specifies the format file
    MAXERRORS = 10              -- The maximum number of errors allowed
);
If you already use the BULK INSERT statement in SQL Server or Azure SQL Database, you can easily reuse your existing code in Fabric Data Warehouse.
Note: While Fabric Data Warehouse supports the BULK INSERT statement so that existing code and options can be reused without modification, we recommend the COPY INTO statement for data ingestion; it is the preferred method in Fabric Data Warehouse and exposes additional advanced loading options.
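For comparison, a minimal COPY INTO equivalent of the example above might look like the following sketch (the options shown are illustrative, not required):
COPY INTO target_table
FROM 'https://<storage>.dfs.core.windows.net/path/to/file.csv'
WITH (
    FILE_TYPE = 'CSV',        -- COPY INTO uses FILE_TYPE instead of FORMAT
    FIELDTERMINATOR = ',',    -- Each field is separated by a comma
    ROWTERMINATOR = '0x0A',   -- Rows end with a line-feed character
    FIRSTROW = 2              -- Skip a header row
);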
In addition to the traditional options available in SQL Server, the BULK INSERT statement in Fabric Data Warehouse offers some of the benefits found in the COPY INTO statement. One notable feature is the support for wildcards in the file path, which allows you to reference multiple files easily.
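For example, a single statement like the following sketch could ingest every matching file under a folder in one pass (the path and the header-skipping option are placeholders for illustration):
BULK INSERT sales
FROM 'https://<storage>.dfs.core.windows.net/csv/2024/*.csv'
WITH (FORMAT = 'CSV', FIRSTROW = 2);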
Differences compared to COPY INTO
While the COPY INTO statement offers standard ingestion options, the BULK INSERT statement provides additional traditional ingestion options that are unique to SQL Server:
- CODEPAGE: This option allows you to load text files encoded with national (non-Unicode) code pages.
- DATAFILETYPE: This option enables you to read single- or double-byte file formats or native formats produced by SQL Server on-premises.
- FORMATFILE: This option allows you to define column and type mappings in a separate textual or XML format file.
These options are particularly useful when the data was exported from a SQL Server or Azure SQL source system using the bcp.exe tool, which can produce format files and export data in native format.
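As a sketch (file names, paths, and connection details are placeholders), a table exported with bcp.exe in native format could be reloaded like this:
-- On the source system: export the data and generate an XML format file
-- bcp dbo.sales out sales.dat -n -S <server> -d <database> -T
-- bcp dbo.sales format nul -f sales.xml -x -n -S <server> -d <database> -T
BULK INSERT sales
FROM 'https://<storage>.dfs.core.windows.net/export/sales.dat'
WITH (
    DATAFILETYPE = 'native',  -- The file was exported in SQL Server native format
    FORMATFILE = 'https://<storage>.dfs.core.windows.net/export/sales.xml'
);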
Unsupported options in BULK INSERT
There are several options that are not supported by BULK INSERT in Fabric Data Warehouse:
- ERRORFILE: This option is not supported by BULK INSERT. For error handling, we recommend using the more modern COPY INTO statement.
- External Data Sources: Due to the absence of external data sources in Fabric Data Warehouse, the following options are not supported: DATA_SOURCE, FORMATFILE_DATA_SOURCE, and ERRORFILE_DATA_SOURCE.
- Ineffective Options: Some options are accepted by BULK INSERT but have no effect in Fabric Data Warehouse; these include KEEPIDENTITY, FIRE_TRIGGERS, CHECK_CONSTRAINTS, TABLOCK, ORDER, ROWS_PER_BATCH, KILOBYTES_PER_BATCH, and BATCHSIZE. They will not cause syntax errors, but they have no impact on the ingestion process, as shown below.
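For example, a statement like the following sketch parses and runs, but the TABLOCK and BATCHSIZE hints are simply ignored (the path is a placeholder):
BULK INSERT sales
FROM 'https://<storage>.dfs.core.windows.net/csv/1m/sales.csv'
WITH (TABLOCK, BATCHSIZE = 100000);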
Performance
The BULK INSERT statement in Fabric Data Warehouse uses the same underlying code as the COPY INTO statement, so you can expect similar performance for data ingestion.
To demonstrate the performance, we used the Contoso sample data set from the open-source SQL-BI GitHub account, which offers pre-generated versions of the Contoso database at different scales, using the number of orders as the scale factor.
In this experiment, we used sales.csv files from the Contoso data set, generated at scales of 100k, 1m, 10m, and 100m. These files were uploaded to the storage account in the same region as the Fabric workspace, and the following statements were executed multiple times:
BULK INSERT sales
FROM 'https://<storage>.dfs.core.windows.net/csv/100m/sales.csv';
BULK INSERT sales
FROM 'https://<storage>.dfs.core.windows.net/csv/10m/sales.csv';
BULK INSERT sales
FROM 'https://<storage>.dfs.core.windows.net/csv/1m/sales.csv';
BULK INSERT sales
FROM 'https://<storage>.dfs.core.windows.net/csv/100k/sales.csv';
The results of executing these statements are presented in the following table:
| Source file | Duration | File size | Rows imported | CPU time |
|---|---|---|---|---|
| 100m/sales.csv | 59 sec | 17.1 GB | 211,875,108 | 916 |
| 10m/sales.csv | 9 sec | 1.7 GB | 21,170,416 | 106 |
| 1m/sales.csv | 7 sec | 170 MB | 2,098,633 | 14 |
| 100k/sales.csv | <1 sec | 16 MB | 199,873 | 0.9 |
The results were retrieved from the built-in queryinsights views in Fabric Data Warehouse, which show the total duration of each statement, the number of rows affected, and the CPU time. As you can see, BULK INSERT is one of the most efficient ways to import large numbers of rows from CSV files into your Fabric Data Warehouse.
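As a sketch, metrics like these can be queried directly from the query insights views; the view and column names below are assumptions based on queryinsights.exec_requests_history, so verify them against your warehouse:
-- Hypothetical lookup of recent BULK INSERT runs via query insights
-- (view and column names are assumptions; check your warehouse's schema)
SELECT command, start_time, total_elapsed_time_ms, row_count
FROM queryinsights.exec_requests_history
WHERE command LIKE 'BULK INSERT%'
ORDER BY start_time DESC;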
Conclusion
The BULK INSERT statement in Fabric Data Warehouse lets you migrate existing code and third-party tools that use BULK INSERT without rewriting or replacing them. Beyond easing migration, it delivers ingestion performance on par with the COPY INTO statement and faster than other ingestion mechanisms.