Microsoft Fabric Updates Blog

Fabric Changing the game: Using your own library with Microsoft Fabric

Unlock the power of reusing your own library in the Lakehouse with this step-by-step guide. Fabric provides different options to build your analytics solution. If you decide to use a Lakehouse and code your solution in Python, you may want to reuse certain functions and business logic across your code. Once your library is created and imported into Fabric, you can even use the endorsement options to flag your code for other users. It is quite simple to create your own library with Python, and even simpler to reuse it in Fabric. Let’s walk through it step by step:

Creating library – Why?

A library, in the Python context, is a packaged set of reusable code. As mentioned before, that code could be business logic that you reuse across the organization. To reuse such a library in a Fabric environment, we need to package the code and import it.

How?

Let’s set our environment:

  1. First, install Visual Studio Code – Visual Studio Code – Code Editing. Redefined
  2. Install the Python extension – Python – Visual Studio Marketplace
  3. Add a folder to the environment that represents what you are trying to achieve. Mine is named testpackage, as you can see in Fig 1- package_folder:
Fig 1- package_folder

Note: Run the commands below in the terminal in VS Code.

4. Define your interpreter, as Fig 2- Interpreter shows:

Fig 2- Interpreter

5. Create the folder where your code will live. Mine is math_custom, since for the sake of this example we will create a function that adds two numbers. You can create this folder from the terminal in Visual Studio Code, as Fig 3 – Mkdir shows:

Fig 3 – Mkdir

6. Let’s create a function to be reused. We will keep the example simple: a function called add_2 that adds one number to another. To accomplish that, first open a new file (File menu in VS Code -> New File -> Python, as Fig 4-NewPy shows) and save it as add_2.py.

Fig 4-NewPy

Code:


def add_2(x, y):

  return x + y

Run the file add_2.py by clicking Run, as Fig 5-Run shows:

Fig 5-Run

7. Next, let’s create the init file of the package. __init__.py defines what is exposed when you use an import statement against this library. Inside __init__.py, use the following code:

Code:

from .add_2 import add_2  # relative import, so Python 3 resolves the module inside the package
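To see what __init__.py buys you, here is a self-contained sketch that builds a throwaway copy of the package in a temporary folder and imports it. The names mirror this example; the temporary directory is purely illustrative:

```python
import sys
import tempfile
from pathlib import Path

# Scaffold a throwaway math_custom package on disk.
root = Path(tempfile.mkdtemp())
pkg = root / "math_custom"
pkg.mkdir()
(pkg / "add_2.py").write_text("def add_2(x, y):\n    return x + y\n")
(pkg / "__init__.py").write_text("from .add_2 import add_2\n")

# Make the temporary folder importable, then use the package
# exactly as you would after installing it.
sys.path.insert(0, str(root))
from math_custom import add_2  # the function re-exported by __init__.py

print(add_2(1, 2))  # prints 3
```

Because __init__.py re-exports the function, callers import it straight from the package name rather than from the module file.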

8. We also need a setup file to create this package and import it into the library later. setup.py should live outside of the math_custom folder. Therefore, create a new Python file and save it as setup.py outside the math_custom folder. This file contains the package metadata, for example:

Code:

from setuptools import setup, find_packages

setup(

  name='testpackage',

  version='0.4',

  author='Liliam Leme',

  packages=['math_custom',],

)
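Since setup.py already imports find_packages, an equivalent sketch (the name, version, and author are this example’s own) lets setuptools discover the package automatically instead of listing it by hand:

```python
from setuptools import setup, find_packages

# find_packages() walks the project and returns every folder that
# contains an __init__.py, so 'math_custom' is discovered automatically.
setup(
    name='testpackage',
    version='0.4',
    author='Liliam Leme',
    packages=find_packages(),
)
```

Automatic discovery is convenient once the project grows beyond a single source folder.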

9. The structure of your folder should be something like this at this point:

testpackage (main folder)
├── setup.py (setup file)
└── math_custom (source code folder)
    ├── add_2.py (function code)
    └── __init__.py (init file)

10. OK, let’s test our module and function definitions and see how it goes. Create a new file called Test.py and run it (run it from inside the math_custom folder, so that add_2.py is on the import path):

Code:

import add_2

print(add_2.add_2(1, 2))
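A slightly sturdier Test.py could assert the expected results instead of only printing them. The function is inlined below so the sketch runs standalone; in your project, keep the import of add_2 instead:

```python
def add_2(x, y):
    # Stand-in for the import of add_2.py, inlined so this sketch runs anywhere.
    return x + y

# Assertions fail loudly if the function misbehaves.
assert add_2(1, 2) == 3
assert add_2(-5, 5) == 0

print(add_2(1, 2))  # prints 3
```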

Fig 6 – Compile shows the execution of the module, which you can reproduce by clicking Run in Visual Studio Code:

Fig 6 – Compile

11. License/Readme files:

README.md is a file you can create in the source folder math_custom; it contains information about your package, as the example below shows:

   # Example Package

   This is a simple example package. 

As for the license, quoting from Packaging Python Projects — Python Packaging User Guide: “It’s important for every package uploaded to the Python Package Index to include a license. This tells users who install your package the terms under which they can use your package.”

The structure would be something like this at this point:

my_custom_code_library/
├── folder_name_of_my_module/
│   ├── __init__.py
│   ├── function_module.py
│   └── tests/tests.py
├── setup.py
└── README.md

12. The next step is to create the distribution package (ref: Glossary — Python Packaging User Guide). Run the following commands in the terminal in VS Code, as Fig 7-Pip shows:

   -m pip install
Fig 7-Pip

Install wheel; Fig 8 and Fig 9-Wheels show the steps to create the package:

pip install wheel

Then, optionally, install a tool that validates the wheel contents:

pip install check-wheel-contents

Fig 8-Wheels
Fig 9-Wheels

Note: in case you face this issue:

pip : The term ‘pip’ is not recognized as the name of a cmdlet, function, script file, or operable program.

Add the Python path to the list of environment variables by doing the following (ref: https://learn.microsoft.com/en-us/windows/win32/procthread/environment-variables):

Right-click on My Computer -> System from the Control Panel, select Advanced system settings, and click Environment Variables.

Click on Path and edit it to include the location of the Python Scripts folder. It should be the one from your installation; generally speaking, it is at:

C:\Users\<USERNAME>\AppData\Local\Programs\Python\Python<VERSION>\Scripts
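If you are unsure which folder to add, a quick check from within Python (run with the same interpreter you installed) prints the interpreter path and the Scripts folder that needs to be on PATH:

```python
import sys
import sysconfig

# The interpreter you are actually running.
print(sys.executable)

# The Scripts (Windows) / bin (Unix) folder that should be on PATH.
print(sysconfig.get_path("scripts"))
```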

13. Create the distribution

Your package will be “a versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a Release”, as Fig 10-Package shows.

The command line is:

python setup.py bdist_wheel

Fig 10-Package

You can check whether the distribution was created successfully in the VS Code Explorer, as Fig 11-Dist shows:

Fig 11-Dist.

14. Import into Fabric.

The easiest step of all: importing the package into Fabric.

From Fabric, open the workspace settings by first selecting the workspace (my workspace name is NextSynapse12) -> Data Engineering -> Library management.

Fig 12-Fabric shows the step-by-step:

Ref: Workspaces – Microsoft Fabric | Microsoft Learn

Manage Apache Spark libraries – Microsoft Fabric | Microsoft Learn

From the docs:

  • “Upload new custom library: You can upload your custom codes as packages to the Fabric runtime through the portal. The library management module helps you resolve potential conflicts and download dependencies in your custom libraries. To upload a package, select the Upload button under the Custom libraries panel and select a local directory.”
Fig 12-Fabric.

15. Executing

Once your library is imported into the Fabric environment, you can reuse it as follows; Fig 13 – Execution shows the result:

Code:

# custom library
from math_custom import add_2

print(add_2(1, 2))
Fig 13 - Execution

16. Simple as that! If you want to flag this notebook code for reuse across your organization, you can use the endorsement options in Fabric, available in the item settings. See Endorsement overview – Microsoft Fabric | Microsoft Learn:

  • Promotion: Promotion enables users to highlight items that they think are valuable, worthwhile, and ready for others to use. It encourages the collaborative spread of content within the organization.
  • Certification: Certification means that the item meets the organization’s quality standards, can be regarded as reliable and authoritative, and is ready for use across the organization.
Fig 14 – Endorsement

That’s it!

Liliam – UK

References:

https://packaging.python.org/en/latest/tutorials/packaging-projects/

Quick Start — The Hitchhiker’s Guide to Packaging 1.0 documentation (the-hitchhikers-guide-to-packaging.readthedocs.io)

Workspaces – Microsoft Fabric | Microsoft Learn

Manage Apache Spark libraries – Microsoft Fabric | Microsoft Learn

Endorsement overview – Microsoft Fabric | Microsoft Learn
