Microsoft Fabric Updates Blog

Transform, Validate and Enrich Data with Python User Data Functions in Your Data Pipelines

Pictured: A very colorful Python illustration generated by Copilot

In our previous post, we announced the preview of User Data Functions in Microsoft Fabric and showed how it can empower professionals to unlock new potential in their data development workflows.

Today, we’re excited to continue that journey by diving into two significant updates: Python support and Data Pipelines integration. These new features combined will enhance the flexibility and efficiency of your data processing scenarios. Whether you’re a Data Scientist leveraging the latest Python features or a Data Engineer building seamless pipelines, these updates are designed to elevate your work in Microsoft Fabric.

Let’s explore how!

Python 3.11 support

You can now create functions that use the Python 3.11 runtime in your User Data Functions artifacts. This feature allows you to create powerful data applications that leverage the benefits of the Microsoft Fabric platform while using your language of choice.

Getting started is easy: first, create a new User Data Functions artifact and select Python as your programming language before creating your new sample function.

Pictured: Creating a new User Data Functions artifact via the Functions Hub and selecting Python

You can use the User Data Functions Visual Studio Code integration to create, develop and publish your Python functions. To do this, you will need to install Python 3.11 in your local environment. You can also use PyPI packages from the supported set included with your functions; all the supported packages are listed in the requirements.txt file included in your Python project.
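As a minimal sketch of the kind of logic a Python User Data Function might hold: the Fabric-specific wrapper (the decorator and connection plumbing generated with the sample function) is omitted here, and the function and field names are illustrative; only the plain Python 3.11 body is shown.

```python
# Illustrative transformation logic for a Python User Data Function.
# The surrounding Fabric decorator is omitted; names are hypothetical.
from datetime import datetime, timezone

def enrich_order(order: dict) -> dict:
    """Normalize a raw order record and add derived fields."""
    enriched = dict(order)  # copy so the caller's record is not mutated
    enriched["customer_name"] = order.get("customer_name", "").strip().title()
    enriched["total"] = round(
        order.get("quantity", 0) * order.get("unit_price", 0.0), 2
    )
    enriched["processed_at"] = datetime.now(timezone.utc).isoformat()
    return enriched
```

Because a function like this is ordinary Python, you can develop and unit test it locally in Visual Studio Code before publishing it to your artifact.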

Pictured: Running a Python function by providing the required parameters.

Python User Data Functions are also compatible with the rich connectivity story provided by the Fabric platform, leveraging the connectors to Data Warehouses, Lakehouses, Azure SQL Databases and more, using the Manage Connections feature.

Data Pipelines integration

Data pipelines in Fabric provide a simple interface to create and manage large data processing tasks by using Activities. Activities are the fundamental objects that represent each step of a data processing task. You can chain several interconnected Activities to create large, elaborate data processing solutions.

User Data Functions are now available as an Activity, allowing you to add custom code processing steps to your Data pipelines. You can find them by going to Activities and selecting the Functions activity. After the Functions Activity is inserted in the Data pipeline, you will see the option to use Fabric User Data Functions in the Settings tab.

Pictured: Creating a new Activity and selecting Functions. Then, specifying a Fabric User Data Function.

In the Functions Activity, you can select the following configuration settings:

  1. The Connection that will be used to run the User Data Function.
  2. The Workspace that contains the User Data Functions you’d like to use.
  3. The User Data Functions artifact that contains the individual Functions you’d like to invoke.
  4. The individual Function that you’d like to run in this pipeline activity.
  5. The parameters that you’d like to pass to this function.

The parameters can be either static values, as specified in the Settings tab, or dynamic content that comes from another Data pipeline Activity, such as a ForEach activity.
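For instance, a function invoked from the activity might validate a batch of rows against a threshold supplied as a parameter. This is a hedged sketch: the function and parameter names (rows, max_nulls) are hypothetical, but they illustrate how each function parameter maps to a value you set in the activity's Settings tab, whether static or dynamic.

```python
# Illustrative validation step for a Functions Activity.
# Each parameter (rows, max_nulls) would be supplied from the
# activity's Settings tab, either as a static value or dynamic content.
def validate_rows(rows: list[dict], max_nulls: int) -> dict:
    """Count records with missing values and flag the batch
    if the count exceeds max_nulls."""
    null_count = sum(
        1 for row in rows if any(v is None for v in row.values())
    )
    return {
        "row_count": len(rows),
        "null_count": null_count,
        "is_valid": null_count <= max_nulls,
    }
```

The returned dictionary becomes the activity's output, which downstream activities in the pipeline can inspect.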

Pictured: Setting the configuration values of a Functions Activity in Data Pipelines.

To pass dynamic content as a parameter, use the Pipeline expression builder, which lets you feed data from previous activities into your Functions Activity. This is especially useful for building data pipelines that use User Data Functions as a processing step.
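To make the dynamic-content flow concrete, here is a hypothetical sketch: inside a ForEach activity, a parameter of the Functions Activity could be bound to an expression over the current item, and the function simply receives the resolved value. The expression field name and the function below are made up for illustration.

```python
# Inside a ForEach activity, a Functions Activity parameter could be
# bound to dynamic content in the expression builder, for example:
#     @item().fileName        (hypothetical field name)
# The function itself just receives the resolved value as a parameter:
def build_output_path(file_name: str, target_folder: str = "processed") -> str:
    """Derive a destination path from the item passed in by ForEach."""
    stem = file_name.rsplit(".", 1)[0]  # drop the file extension
    return f"{target_folder}/{stem}.parquet"
```

This way, each iteration of the ForEach activity invokes the same function with a different parameter value.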

And that’s it! Feel free to reach out to us if you have any questions or feedback. You can still apply to participate in the preview of User Data Functions and start using these new features.
