Microsoft Fabric Updates Blog

Use Fabric User Data Functions with Pandas DataFrames and Series in Notebooks

We’ve made a major enhancement to the Notebook Integration with Fabric User Data Functions (UDFs)—you can now use Pandas DataFrames and Series as input and output types, powered by native integration with Apache Arrow!

This enhancement brings higher performance, improved efficiency, and better scalability to your Fabric Notebooks—enabling seamless function reuse for large-scale data processing in Python, PySpark, Scala, and R.

Recap: Notebooks Integration with Fabric UDFs (Preview)

As part of our initial preview, we introduced the ability to:

  • Invoke shared UDFs directly from NotebookUtils.
  • Use IntelliSense/autocomplete to find and call functions more easily.
  • Explore function signatures and metadata using display(myFunction.functionDetails).
  • Call UDFs in Python, PySpark, Scala, and R for streamlined, reusable logic across your notebooks.

This helped teams modularize logic, reduce redundancy, and improve productivity across collaborative data science and engineering projects.

What’s New: Pandas Support via Apache Arrow

In this update, Pandas DataFrames and Series are now supported as first-class input and output types for UDFs—enabled by deep integration with Apache Arrow, a highly efficient columnar memory format optimized for analytics workloads.

Benefits of the Arrow Integration:

  • High-performance serialization: Skip costly JSON encoding/decoding.
  • Zero-copy data sharing: Minimize overhead during UDF execution.
  • Scalable: Work with millions of rows in memory with ease.
  • Seamless compatibility with your existing Pandas logic.

Instead of manually converting large datasets to JSON, developers can now natively pass Pandas DataFrames to UDFs, operate on them efficiently, and return processed results—all with minimal latency and memory overhead.
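To make the cost of row-by-row JSON concrete, here is a minimal sketch that compares a JSON payload with a simple column-oriented binary layout. It uses only the Python standard library and does not call Apache Arrow or the Fabric integration; the sample data and variable names are hypothetical, and the `array` buffers only stand in for the idea of a columnar format.

```python
import json
from array import array

# Hypothetical sample data: 10,000 (driver_id, revenue) records.
rows = [{"driver_id": i % 1000, "revenue": float(i)} for i in range(10_000)]

# Row-oriented text encoding: every record repeats the field names.
json_bytes = json.dumps(rows).encode("utf-8")

# Column-oriented binary encoding: one contiguous typed buffer per column.
# This is an illustration of the columnar idea, not the Arrow format itself.
driver_ids = array("q", (row["driver_id"] for row in rows))  # 64-bit signed ints
revenues = array("d", (row["revenue"] for row in rows))      # 64-bit floats
columnar_bytes = driver_ids.tobytes() + revenues.tobytes()

print(f"JSON payload:     {len(json_bytes):,} bytes")
print(f"Columnar payload: {len(columnar_bytes):,} bytes")
```

The columnar buffers store each value in a fixed eight bytes with no repeated keys or text parsing, which is the general property that lets a columnar format avoid JSON encoding and decoding overhead.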

Real-World Example: Revenue Aggregation by Driver

Let’s say you want to aggregate total revenue by driver across a dataset with millions of rows. Now, you can pass a Pandas DataFrame into a shared UDF and perform that operation directly:

Sample Code: Invoking Arrow-Enabled UDFs

PySpark / Python

import pandas as pd

# Get the UDF item by name; its functions are exposed as methods on the returned object
agg_func = notebookutils.udf.getFunctions("AggregateRevenueByDriver")

# Sample input as Pandas DataFrame
df = pd.DataFrame({
    "driver_id": [1, 2, 1],
    "revenue": [100.0, 150.0, 200.0]
})

# Call the item's aggregate function with DataFrame input and receive DataFrame output
result_df = agg_func.aggregate(df)

# Display result
print(result_df)
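The sample above does not show the body of `aggregate`. As a rough sketch, assuming the function simply sums `revenue` per `driver_id`, the Pandas logic could look like the following. The output column name `total_revenue` is an assumption, and the Fabric-specific function registration is omitted so that the snippet runs anywhere Pandas is installed.

```python
import pandas as pd


def aggregate(df: pd.DataFrame) -> pd.DataFrame:
    """Sum revenue per driver (hypothetical body for the UDF shown above)."""
    return (
        df.groupby("driver_id", as_index=False)["revenue"]
        .sum()
        .rename(columns={"revenue": "total_revenue"})  # assumed output column name
    )


# Local check with the same sample data as the notebook call above.
sample = pd.DataFrame({
    "driver_id": [1, 2, 1],
    "revenue": [100.0, 150.0, 200.0],
})
totals = aggregate(sample)
print(totals)
```

Keeping the logic in a plain function like this makes it easy to test locally before publishing it as a shared UDF.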

Scala

val aggFunc = notebookutils.udf.getFunctions("AggregateRevenueByDriver")

// Sample input
val input = Seq(
  (1, 100.0),
  (2, 150.0),
  (1, 200.0)
).toDF("driver_id", "revenue")

// Call UDF and get DataFrame output
val result = aggFunc.aggregate(input)

// Show result
result.show()

R

agg_func <- notebookutils.udf.getFunctions("AggregateRevenueByDriver")

# Sample input
df <- data.frame(
  driver_id = c(1, 2, 1),
  revenue = c(100.0, 150.0, 200.0)
)

# Call the UDF
result <- agg_func$aggregate(df)

# View result
print(result)

Use Case Highlights

With this Arrow-powered enhancement, you can:

  • Run fast, interactive analysis on large-scale datasets.
  • Simplify cross-team collaboration by sharing tested UDFs across notebooks.
  • Accelerate development-to-production workflows for real-time metrics, feature engineering, and aggregation tasks.

Try the new UDF functionality today by using NotebookUtils in your Fabric Notebook. Start by registering a Pandas-compatible UDF, then pass in your DataFrames and enjoy lightning-fast results with Apache Arrow under the hood.

Get Started

For more information, refer to the NotebookUtils for Fabric documentation.
