Microsoft Fabric Updates Blog

Transform, Validate and Enrich Data with Python User Data Functions in Your Data Pipelines

Pictured: A very colorful Python illustration generated by Copilot

In our previous post, we announced the preview of User Data Functions in Microsoft Fabric and described how it can empower professionals to unlock new potential in their data development workflows.

Today, we’re excited to continue that journey by diving into two significant updates: Python support and Data Pipelines integration. These new features combined will enhance the flexibility and efficiency of your data processing scenarios. Whether you’re a Data Scientist leveraging the latest Python features or a Data Engineer building seamless pipelines, these updates are designed to elevate your work in Microsoft Fabric.

Let’s explore how!

Python 3.11 support

You can now create functions that use the Python 3.11 runtime in your User Data Functions artifacts. This feature allows you to create powerful data applications that leverage the benefits of the Microsoft Fabric platform while using your language of choice.

Getting started is easy: first, create a new User Data Functions artifact and select Python as your desired programming language before creating your new sample function.

Pictured: Creating a new User Data Functions artifact via the Functions Hub and selecting Python

You can use the User Data Functions Visual Studio Code integration to create, develop, and publish your Python functions. To do this, you will need to install Python 3.11 in your local environment. You can also use PyPI packages from the supported set included with your functions; the full list appears in the requirements.txt file included in your Python project.
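To make the "transform, validate, and enrich" idea concrete, here is a minimal sketch of the kind of logic you might publish as a Python User Data Function. This is plain Python 3.11; the UDF-specific decorator and publishing surface are omitted, and the record shape (`order_id`, `amount`, `country`) is an illustrative assumption, not a Fabric requirement.

```python
from dataclasses import dataclass

# Illustrative record shape for this sketch; real functions would use
# whatever schema your data source provides.
@dataclass
class Order:
    order_id: str
    amount: float
    country: str

def validate_and_enrich(raw: dict) -> dict:
    """Validate required fields, normalize types, and add a derived column."""
    missing = [k for k in ("order_id", "amount", "country") if k not in raw]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    order = Order(
        order_id=str(raw["order_id"]),
        amount=float(raw["amount"]),
        country=str(raw["country"]).upper(),  # normalize country codes
    )
    # Enrichment: flag high-value orders for downstream routing.
    return {
        "order_id": order.order_id,
        "amount": order.amount,
        "country": order.country,
        "is_high_value": order.amount >= 1000.0,
    }
```

A function like this keeps its inputs and outputs as plain Python values, which makes it easy to test locally in Visual Studio Code before publishing.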

Pictured: Running a Python function by providing the required parameters.

Python User Data Functions are also compatible with the rich connectivity story provided by the Fabric platform, leveraging the connectors to Data Warehouses, Lakehouses, Azure SQL Databases and more, using the Manage Connections feature.
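As an illustration of the connectivity pattern (not the actual Fabric connector API), the sketch below uses an in-memory SQLite database as a stand-in for a data source you would configure through Manage Connections: the function receives an open connection, queries it, and returns a plain Python value.

```python
import sqlite3

def count_high_value_orders(conn: sqlite3.Connection, min_amount: float) -> int:
    """Query a connected database and return a scalar result.

    In Fabric, the connection would be supplied via Manage Connections
    (for example, to a Lakehouse or Data Warehouse SQL endpoint);
    sqlite3 is a local stand-in for this sketch.
    """
    cur = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE amount >= ?", (min_amount,)
    )
    return cur.fetchone()[0]

# Local demonstration against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 250.0), ("b", 1200.0), ("c", 4999.9)],
)
print(count_high_value_orders(conn, 1000.0))
```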

Data Pipelines integration

Data pipelines in Fabric provide a simple interface to create and manage large data processing tasks by using Activities. An Activity is the fundamental object that represents a single step of a data processing task. You can chain several interconnected Activities to create large, elaborate data processing solutions.

User Data Functions is now available as an Activity, allowing you to add custom code processing steps to your Data pipelines. You can find it by going to Activities and selecting the Functions activity. After the Functions Activity is inserted into the Data pipeline, you will see the option to use Fabric User Data Functions in the settings tab.

Pictured: Creating a new Activity and selecting Functions, then specifying a Fabric User Data Function.

In the Functions Activity, you can select the following configuration settings:

  1. The Connection that will be used to run the User Data Function.
  2. The Workspace that contains the User Data Functions you’d like to use.
  3. The User Data Functions artifact that contains the individual Functions you’d like to invoke.
  4. The individual Function that you’d like to run in this pipeline activity.
  5. The parameters that you’d like to pass to this function.

The parameters can be either static values, as specified in the Settings tab, or dynamic content that comes from another Data pipeline Activity, such as a For Each activity.

Pictured: Setting the configuration values of a Functions Activity in Data Pipelines.

To pass dynamic content as a parameter, use the Pipeline expression builder, which passes data from previous activities into your Functions Activity. This makes it easy to build data pipelines that use User Data Functions as a processing step.
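The dynamic-content flow can be sketched in plain Python: a For Each activity invokes the same function once per item, passing each item as the function's parameter. In the real pipeline, the Pipeline expression builder supplies that value from a previous activity's output instead of a Python loop; the function and data below are purely illustrative.

```python
def celsius_to_fahrenheit(reading: dict) -> dict:
    """An illustrative function a pipeline might invoke once per item."""
    return {
        "sensor": reading["sensor"],
        "fahrenheit": reading["celsius"] * 9 / 5 + 32,
    }

# Stand-in for a For Each activity iterating over a previous
# activity's output and feeding each item to the Functions Activity.
upstream_output = [
    {"sensor": "s1", "celsius": 0.0},
    {"sensor": "s2", "celsius": 100.0},
]

results = [celsius_to_fahrenheit(item) for item in upstream_output]
print(results)
```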

And that’s it! Feel free to reach out to us if you have any questions or feedback. You can still apply to participate in the preview of User Data Functions and start using these new features.
