Microsoft Fabric Updates Blog

IDENTITY Columns in Fabric Data Warehouse (Preview)

IDENTITY columns in Fabric Data Warehouse are a feature shaped by direct customer feedback and requests. Across the data community, the need for simple, automatic key generation has been a recurring theme. We are closing that gap, providing you with a tool to streamline your workflow and accelerate analytics capabilities.

What Are IDENTITY Columns?

Unlike natural keys, surrogate keys have no business meaning – they exist solely to ensure row uniqueness and to allow relationships between tables when a natural key is not available. IDENTITY columns automatically generate unique numeric values for new rows in a table, removing the need to manually assign surrogate keys or build custom logic to populate those columns. For example, in a table called ‘Orders’, you could define an ‘OrderID’ column as an IDENTITY column, and each time a new record is added ‘OrderID’ will automatically be assigned with a new unique value.

Using IDENTITY columns in Fabric Data Warehouse provides a significant advantage over manual mechanisms for generating surrogate keys. Unlike approaches that rely on custom logic, such as computing MAX(ID)+1, using ROW_NUMBER(), generating hashes or GUIDs, IDENTITY columns offer a system-managed, automatic process for creating unique keys during data ingestion.

This eliminates the need for additional ETL steps or application-side scripts, reducing complexity and operational overhead. By delegating key generation to the Warehouse engine, IDENTITY columns ensure data integrity and key uniqueness at scale, even in a distributed architecture where parallel ingestion occurs. It also eliminates the risk of key duplication errors that can arise from concurrent loads when using other approaches, without impacting data ingestion throughout.

How Do IDENTITY Columns Work in Fabric Data Warehouse?

While IDENTITY columns in Fabric Data Warehouse are modeled after SQL Server, there are fundamental differences arising from the massively parallel, distributed architecture in the Warehouse engine. In SQL Server, data ingestion happens within a single instance. Fabric Data Warehouse’s implementation of IDENTITY columns is designed for a distributed architecture, which means values are generated across multiple backend nodes in parallel during ingestion.

To maximize scalability and performance, each node receives a range of unique values to allocate to rows it will process for ingestion. However, because ranges are assigned independently, the sequence of values is not guaranteed to be continuous or ordered, and gaps may occur in between ranges. As an example, two concurrent ingestion jobs might consume different ranges for the IDENTITY column, leaving unused values between them. Despite these gaps, uniqueness is always guaranteed for the lifetime of the table.

Why does this design matter?

Fabric Data Warehouse implements IDENTITY columns with a distributed, massively parallel architecture, unlike the single-node approach of SQL Server. This design allows multiple compute nodes to generate surrogate keys concurrently without contention, ensuring scalability and high ingestion throughput for cloud-scale workloads. While this model sacrifices strict sequential ordering and reseeding options found in traditional systems, it delivers the performance and reliability required for modern analytics. For scenarios demanding ordered sequences, alternative strategies may still be needed, but for most analytics use cases, Fabric Data Warehouse’s IDENTITY implementation is optimized for efficiency and massive parallelism.

Example

The following table is a subset of the NY Taxi trip data sample. We will add a new column to the table definition, which will be an IDENTITY column:

CREATE TABLE dbo.TripData (
     tripID BIGINT IDENTITY,
     vendorID VARCHAR(50),
     tpepPickupDateTime DATETIME2(6),
     tpepDropoffDateTime DATETIME2(6),
     passengerCount INT,
     tripDistance FLOAT
);

Next, we will ingest data into the new TripData table. Notice that COPY INTO uses a column list to avoid attempting to ingest data into the IDENTITY column:

COPY INTO dbo.TripData (vendorID 1, tpepPickupDateTime 2, tpepDropoffDateTime 3, passengerCount 4, tripDistance 5)
FROM 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=2013/'
WITH (
     FILE_TYPE = 'PARQUET'
)

Finally, let’s have a look at the data:

SELECT TOP 10 *
FROM TripData
ORDER BY tripID

We can observe that even though a value was not passed to the tripID column, a unique value was assigned to each new row:
Table showing taxi trip data with columns for trip ID, vendor ID, pickup and dropoff timestamps, passenger count, and trip distance. All visible rows share the same trip ID and vendor ID ('VTS'), with varying times, passenger counts from 1 to 6, and trip distances ranging from 0.00 to 7.04 miles. The tripID column is highlighted by a red rectangle around it
Notice that the value is an exceptionally large number. In Fabric Data Warehouse, IDENTITY columns are of type BIGINT to ensure we can produce a unique value every time, no matter how many ingestion tasks are running in parallel.

Can I run out of IDENTITY values?

As discussed, IDENTITY column will produce a unique integer of type BIGINT. In T-SQL, the BIGINT data type accommodates 8 bytes, with a range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. However, IDENTITY columns in Fabric Data Warehouse will produce only positive values. This means you still have about 9.22 quintillion distinct values (263 to be exact) that IDENTITY can assign to your column, making the range limitless in practice for any modern dataset. Once a value is used on a table, it will never be reused by IDENTITY again on the same table.

Is IDENTITY_INSERT supported?

During Preview IDENTITY_INSERT is not supported, but we are looking into enabling user ingestion of negative values into IDENTITY columns in a future release. This approach is designed for scenarios where a ‘missing member’ placeholder is required in dimension tables, allowing you to explicitly insert a negative surrogate key while preserving the integrity of the automatically generated positive identity sequence, offering flexibility for specialized cases without compromising the distributed identity allocation model.

Ready to get started?

Try defining IDENTITY columns today and experience seamless key generation firsthand. Explore our tutorial Using IDENTITY Columns in T-SQL to Create Surrogate Keys in Fabric Data Warehouse and the documentation for more details.

Your feedback is important to us. As this feature continues to evolve in Preview, we invite you to share your experiences, report any issues, and suggest improvements on Ideas.

Help us make Fabric Data Warehouse even better for everyone!

Related blog posts

IDENTITY Columns in Fabric Data Warehouse (Preview)

December 18, 2025 by Jovan Popovic

Unlock Flexible Time-Based Reporting with DATE_BUCKET() in Microsoft Fabric DW! Microsoft Fabric Data Warehouse continues to evolve with powerful features that make analytics easier and more adaptable. One of the latest additions is the DATE_BUCKET() function—a game-changer for time-based reporting.

December 16, 2025 by Alex Powers

As 2025 ends, we’re taking a moment to reflect on Microsoft Fabric’s second year in the market and the collective progress made alongside our community, customers, and partners. What began as a unified vision for data and AI has grown into a platform adopted by more than 28,000 organizations worldwide, anchored by OneLake and shaped … Continue reading “Microsoft Fabric 2025 holiday recap: Unified Data and AI Innovation”