User Data Functions now support async functions and pandas DataFrame, Series types
Fabric user data functions empower developers to process and analyze data at scale directly within Microsoft Fabric by writing custom logic in Python. There are now two new features in the User data functions programming model in Microsoft Fabric:
- Support for writing async functions: Async functions improve responsiveness and efficiency by handling multiple tasks at once, making them ideal for managing high volumes of I/O-bound operations.
- Support for Pandas DataFrames and Series as input/output types: Fabric user data functions (UDFs) let developers define functions that process batches of input rows as Pandas DataFrames and return results as Pandas DataFrames or Series, improving speed and performance in large-scale data analysis.
How to enable these features
Follow these steps to enable the new features in Microsoft Fabric's User data functions:
- Create a new user data function or open an existing one.
- Select Library management.
- Update the fabric-user-data-functions library to version 1.0.0.
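Once the library is updated, your function_app.py follows the standard User data functions programming model: import the fabric.functions module, create a UserDataFunctions object, and decorate each function with @udf.function(). The minimal sketch below (the function name and greeting are illustrative) shows the skeleton that the examples in this post build on.
import fabric.functions as fn

udf = fn.UserDataFunctions()

@udf.function()
def hello_fabric(name: str) -> str:
    # Return a simple greeting for the given name
    return f"Hello, {name}!"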

How to write an async function
Add the async keyword to your function definition, as shown in the example below. This example function reads a CSV file from a Lakehouse using pandas and takes the file name as an input parameter.
import pandas as pd
import fabric.functions as fn

udf = fn.UserDataFunctions()

# Replace the alias "<My Lakehouse alias>" with your connection alias.
@udf.connection(argName="myLakehouse", alias="<My Lakehouse alias>")
@udf.function()
async def read_csv_from_lakehouse(myLakehouse: fn.FabricLakehouseClient, csvFileName: str) -> str:
    # Connect to the Lakehouse
    connection = myLakehouse.connectToFilesAsync()

    # Download the CSV file from the Lakehouse
    csvFile = connection.get_file_client(csvFileName)
    downloadFile = await csvFile.download_file()
    csvData = await downloadFile.readall()

    # Read the CSV data into a pandas DataFrame
    from io import StringIO
    df = pd.read_csv(StringIO(csvData.decode('utf-8')))

    # Format the DataFrame rows as a string
    result = ""
    for index, row in df.iterrows():
        result = result + "[" + (",".join([str(item) for item in row])) + "]"

    # Close the connection
    csvFile.close()
    connection.close()

    return f"CSV file read successfully.{result}"
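After you publish the function, you can test it from the Fabric portal or call it from a notebook. The snippet below is a minimal sketch that assumes a User data functions item named 'my-user-data-function-name' containing the function above and a file named 'data.csv' in the connected Lakehouse, and that async functions are invoked the same way as synchronous ones from the caller's perspective.
# Get the functions from the User data functions item (placeholder name)
myFunctions = notebookutils.udf.getFunctions('my-user-data-function-name')

# Invoke the async function; the async handling happens inside the service,
# so the call from the notebook looks the same as for a synchronous function
result = myFunctions.read_csv_from_lakehouse('data.csv')
print(result)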
How to use pandas DataFrames and Series types
Fabric user data functions now deliver better performance by integrating Apache Arrow for handling Pandas data structures. Previously, Pandas data within UDFs was serialized and deserialized using JSON, which, while flexible, introduced overhead, particularly with large datasets. The new Arrow-optimized approach leverages Apache Arrow's highly efficient columnar memory format and zero-copy mechanisms: Pandas DataFrames and Series are now represented using Arrow both when transmitted on the wire and when stored in memory during UDF execution. This native integration bypasses the costly translation to and from JSON.
Add the pandas library to the User data functions item:

Remember to include import pandas as pd in your function_app.py file before getting started. Let’s dive into an example to show how to use pandas DataFrames and Series as input/output types in Fabric user data functions. We’ll create a function that calculates the total revenue earned by each driver.
import pandas as pd
import fabric.functions as fn

udf = fn.UserDataFunctions()

@udf.function()
def total_revenue_by_driver(df: pd.DataFrame) -> pd.Series:
    """
    Description: Calculate the total revenue earned by each driver. This function sums up all trip fares for each driver to determine their total earnings, useful for driver performance analysis.
    Args:
        df : pd.DataFrame
            Input DataFrame containing trip data with columns:
            - 'driver_id': str or int, unique driver
            - 'trip_fare': float, fare amount for each trip
    Example: Use this example as input to test the function
        {
            "driver_id": ["D001", "D002", "D001", "D003", "D002"],
            "trip_fare": [25.50, 30.00, 22.75, 45.00, 28.50]
        }
    Returns: pd.Series
        Series with driver IDs as index and total revenue as values.
    """
    # Sum trip fares per driver
    result_series = df.groupby("driver_id")["trip_fare"].sum()
    result_series.name = "total_revenue"
    return result_series
Publish the changes and then test the function. You should receive the following output:
{"D001":48.25,"D002":58.5,"D003":45}
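DataFrames can also be used as the return type. The sketch below is illustrative (the function name and the trip_count column are assumptions, not part of the official sample) and assumes the same input columns; it returns a pd.DataFrame with one summary row per driver instead of a Series.
import pandas as pd

@udf.function()
def revenue_summary_by_driver(df: pd.DataFrame) -> pd.DataFrame:
    # Aggregate total revenue and trip count per driver into a DataFrame
    summary = df.groupby("driver_id", as_index=False).agg(
        total_revenue=("trip_fare", "sum"),
        trip_count=("trip_fare", "count"),
    )
    return summary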
Invoke functions from a Notebook
With notebooks, you can effortlessly call data processing functions on datasets containing millions of rows. This approach allows you to efficiently aggregate and analyze large volumes of data, such as calculating total revenue by driver, directly within your notebook. Using the same function as shown above, you can test and validate your results on massive datasets, making notebooks a powerful tool for both development and production scenarios.
import pandas as pd

# Sample trip data
data = {'driver_id': ['D001', 'D002', 'D001', 'D003', 'D002'], 'trip_fare': [25.50, 30.00, 22.75, 45.00, 28.50]}
df = pd.DataFrame(data)

# Get the functions. Replace 'my-user-data-function-name' with the name of your User data functions item.
myFunctions = notebookutils.udf.getFunctions('my-user-data-function-name')

# Invoke the function; it returns a Series object
result = myFunctions.total_revenue_by_driver(df)
print(result)
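Because the invocation returns a pandas Series, you can continue working with it in the notebook using standard pandas operations, for example to rank drivers by revenue or convert the result into a DataFrame.
# Rank drivers by total revenue, highest first
top_drivers = result.sort_values(ascending=False)
print(top_drivers.head())

# Convert the Series into a DataFrame for further analysis
revenue_df = top_drivers.reset_index()
print(revenue_df)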
Conclusion
These features improve efficiency and performance when working with large datasets containing millions of rows: pandas DataFrame and Series support speeds up data exchange, and async functions reduce waiting on I/O operations by handling tasks concurrently. Check out the fabric-user-data-functions library on PyPI for the latest version updates.
To learn more, refer to the Fabric user data functions SDK documentation.
Get started with a free trial today and unlock the full potential of your data with Microsoft Fabric User data functions. Submit your feedback on Fabric Ideas and join the conversation in the Fabric Community.