Transform, Validate and Enrich Data with Python User Data Functions in Your Data Pipelines
In our previous post, we announced the preview of User Data Functions in Microsoft Fabric and showed how it can empower professionals to unlock new potential in their data development workflows.
Today, we’re excited to continue that journey by diving into two significant updates: Python support and Data Pipelines integration. These new features combined will enhance the flexibility and efficiency of your data processing scenarios. Whether you’re a Data Scientist leveraging the latest Python features or a Data Engineer building seamless pipelines, these updates are designed to elevate your work in Microsoft Fabric.
Let’s explore how!
Python 3.11 support
You can now create functions that use the Python 3.11 runtime in your User Data Functions artifacts. This feature allows you to create powerful data applications that leverage the benefits of the Microsoft Fabric platform while using your language of choice.
Getting started is easy: first, create a new User Data Functions artifact and select Python as your programming language before creating your new sample function.
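If you're curious what that sample looks like, here's a minimal sketch of a Python function in a User Data Functions artifact. It follows the shape of the generated sample, using the fabric.functions module from the preview programming model (the function body here is our own):

```python
import fabric.functions as fn

# A User Data Functions artifact exposes its functions through a
# UserDataFunctions() instance and the @udf.function() decorator.
udf = fn.UserDataFunctions()

@udf.function()
def hello_fabric(name: str) -> str:
    # Parameters and return values use standard Python type hints.
    return f"Hello, {name}! Welcome to Python User Data Functions."
```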
You can use the User Data Functions Visual Studio Code integration to create, develop and publish your Python functions. To do this, you will need Python 3.11 installed in your local environment. This feature also lets you use PyPI packages from a supported set in your functions; you can see all the supported packages in the requirements.txt file included in your Python project.
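For example, a function that uses one of those packages, say pandas, might look like the following sketch. This assumes pandas appears in your requirements.txt and that list parameters are passed as JSON records; the function and field handling are illustrative:

```python
import fabric.functions as fn
import pandas as pd  # assumed to be listed in requirements.txt

udf = fn.UserDataFunctions()

@udf.function()
def drop_invalid_rows(rows: list) -> list:
    # Illustrative cleanup step: load the input records into a
    # DataFrame, drop rows with missing values, and return the rest.
    df = pd.DataFrame(rows)
    return df.dropna().to_dict(orient="records")
```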
Python User Data Functions also take advantage of the rich connectivity provided by the Fabric platform, using the Manage Connections feature to reach Data Warehouses, Lakehouses, Azure SQL Databases and more.
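As a sketch of what that looks like in code, the snippet below reads from an Azure SQL Database through a connection alias created with Manage Connections. The alias MySqlDatabase and the dbo.Orders table are hypothetical; the @udf.connection decorator and fn.FabricSqlConnection type follow the preview programming model:

```python
import fabric.functions as fn

udf = fn.UserDataFunctions()

# "MySqlDatabase" is a hypothetical alias defined via Manage Connections.
@udf.connection(argName="sqlDB", alias="MySqlDatabase")
@udf.function()
def count_orders(sqlDB: fn.FabricSqlConnection) -> int:
    # connect() returns a pyodbc-style connection to the data source.
    connection = sqlDB.connect()
    cursor = connection.cursor()
    cursor.execute("SELECT COUNT(*) FROM dbo.Orders")  # hypothetical table
    (count,) = cursor.fetchone()
    cursor.close()
    connection.close()
    return int(count)
```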
Data Pipelines integration
Data pipelines in Fabric provide a simple interface to create and manage large data processing tasks by using Activities. Activities are the fundamental objects that represent each step of a data processing task. Users can combine several interconnected Activities to create large, elaborate data processing solutions.
User Data Functions is now available as an Activity, allowing you to add custom code processing steps to your Data pipelines. You can find it by going to Activities and selecting the Functions activity. After the Functions Activity is inserted into the Data pipeline, you will see the option to use Fabric User Data Functions in the Settings tab.
In the Functions Activity, you can select the following configuration settings:
- The Connection that will be used to run the User Data Function.
- The Workspace that contains the User Data Functions you’d like to use.
- The User Data Functions artifact that contains the individual Functions you’d like to invoke.
- The individual Function that you’d like to run in this pipeline activity.
- The parameters that you’d like to pass to this function.
The parameters can be either static values, as specified in the Settings tab, or dynamic content that comes from another Data pipeline Activity, such as a ForEach activity.
To pass dynamic content as a parameter, use the pipeline expression builder, which lets you feed data from previous activities into your Functions Activity. This is especially useful for building data pipelines that use User Data Functions as a processing step.
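To make that concrete, here's a hedged sketch of a function designed to be called once per item from a ForEach activity. The function and field names are ours, and the expressions in the comments are the kind you'd enter in the expression builder:

```python
import fabric.functions as fn

udf = fn.UserDataFunctions()

@udf.function()
def enrich_customer(customer_id: str) -> dict:
    # In the Functions Activity settings, customer_id can be bound to
    # dynamic content from the expression builder, for example @item()
    # inside a ForEach activity, or
    # @activity('Lookup customers').output.firstRow.id.
    # (Both expressions are illustrative.)
    return {"customer_id": customer_id, "status": "processed"}
```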
And that’s it! Feel free to reach out to us if you have any questions or feedback. You can still apply to participate in the preview of User Data Functions and start using these new features.