Microsoft Fabric Updates Blog

Use Fabric User Data Functions with Pandas DataFrames and Series in Notebooks

We’ve made a major enhancement to the Notebook Integration with Fabric User Data Functions (UDFs)—you can now use Pandas DataFrames and Series as input and output types, powered by native integration with Apache Arrow!

This enhancement brings higher performance, improved efficiency, and better scalability to your Fabric Notebooks—enabling seamless function reuse for large-scale data processing in Python, PySpark, Scala, and R.

Recap: Notebook Integration with Fabric UDFs (Preview)

As part of our initial preview, we introduced the ability to:

  • Invoke shared UDFs directly from NotebookUtils.
  • Use IntelliSense/autocomplete to find and call functions more easily.
  • Explore function signatures and metadata using display(myFunction.functionDetails).
  • Call UDFs in Python, PySpark, Scala, and R for streamlined, reusable logic across your notebooks.

This helped teams modularize logic, reduce redundancy, and improve productivity across collaborative data science and engineering projects.

What’s New: Pandas Support via Apache Arrow

With this update, Pandas DataFrames and Series are supported as first-class input and output types for UDFs. This is enabled by deep integration with Apache Arrow, a columnar in-memory format designed for analytics workloads.

Benefits of the Arrow Integration:

  • High-performance serialization: Skip costly JSON encoding/decoding.
  • Zero-copy data sharing: Minimize overhead during UDF execution.
  • Scalability: Work with millions of rows in memory.
  • Compatibility: Keep using your existing Pandas logic without changes.

Instead of manually converting large datasets to JSON, developers can now natively pass Pandas DataFrames to UDFs, operate on them efficiently, and return processed results—all with minimal latency and memory overhead.

Real-World Example: Revenue Aggregation by Driver

Let’s say you want to aggregate total revenue by driver across a dataset with millions of rows. Now, you can pass a Pandas DataFrame into a shared UDF and perform that operation directly:

Sample Code: Invoking Arrow-Enabled UDFs

PySpark / Python

import pandas as pd

# Get the functions exposed by the shared UDF item
# (notebookutils is built into Fabric notebooks, so no import is needed)
agg_func = notebookutils.udf.getFunctions("AggregateRevenueByDriver")

# Sample input as a Pandas DataFrame
df = pd.DataFrame({
    "driver_id": [1, 2, 1],
    "revenue": [100.0, 150.0, 200.0]
})

# Call UDF with DataFrame input and receive DataFrame output
result_df = agg_func.aggregate(df)

# Display result
print(result_df)

Scala

val aggFunc = notebookutils.udf.getFunctions("AggregateRevenueByDriver")

// Sample input (toDF relies on spark.implicits._, which notebooks usually pre-import)
import spark.implicits._
val input = Seq(
  (1, 100.0),
  (2, 150.0),
  (1, 200.0)
).toDF("driver_id", "revenue")

// Call UDF and get DataFrame output
val result = aggFunc.aggregate(input)

// Show result
result.show()

R

agg_func <- notebookutils.udf.getFunctions("AggregateRevenueByDriver")

# Sample input
df <- data.frame(
  driver_id = c(1, 2, 1),
  revenue = c(100.0, 150.0, 200.0)
)

# Call the UDF
result <- agg_func$aggregate(df)

# View result
print(result)
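
The samples above call a shared function named aggregate, but the post doesn't show its body. As a rough idea of the logic such a function might contain, here is a plain pandas sketch you can run locally. The function body and the total_revenue column name are assumptions for illustration, not the actual UDF.

```python
import pandas as pd


def aggregate(df: pd.DataFrame) -> pd.DataFrame:
    """Sum revenue per driver (hypothetical stand-in for the shared UDF body)."""
    return (
        df.groupby("driver_id", as_index=False)["revenue"]
        .sum()
        # "total_revenue" is an assumed output column name.
        .rename(columns={"revenue": "total_revenue"})
    )


# Same sample input as the notebook snippets above.
df = pd.DataFrame({
    "driver_id": [1, 2, 1],
    "revenue": [100.0, 150.0, 200.0],
})

result_df = aggregate(df)
print(result_df)
```

For the sample input, driver 1 totals 300.0 and driver 2 totals 150.0. Prototyping the logic locally like this can be a quick way to check the DataFrame-in, DataFrame-out contract before sharing it as a UDF.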

Use Case Highlights

With this Arrow-powered enhancement, you can:

  • Run fast, interactive analysis on large-scale datasets.
  • Simplify cross-team collaboration by sharing tested UDFs across notebooks.
  • Accelerate development-to-production workflows for real-time metrics, feature engineering, and aggregation tasks.

Try the new UDF functionality today by using NotebookUtils in your Fabric Notebook. Start by registering a Pandas-compatible UDF, then pass in your DataFrames and enjoy lightning-fast results with Apache Arrow under the hood.

Get Started

For more information, refer to the NotebookUtils for Fabric documentation.
