Microsoft Fabric Updates Blog

Fabric changing the game: Logging your workload using Notebooks.

I was working on an example for a customer about logging execution errors to a file while running multiple notebooks in parallel in a try/except scenario. While thinking about that scenario in a Fabric environment, I realized this work is now so much easier. As I mentioned in other posts, OneLake integration makes everything simple!

Here are some references:

For running multiple notebooks: Microsoft Fabric changing the game: Exporting data and building the Lakehouse | Microsoft Fabric Blog | Microsoft Fabric

For OneLake integration: Fabric Changing the game – OneLake integration | Microsoft Fabric Blog | Microsoft Fabric

Let’s discuss how to write logs to OneLake from notebooks. In my opinion, the simplest way to do that is with the Python logging library. The fact that I can easily use the OneLake API path inside the notebook makes it even easier. Let me show you this step by step.

Step By Step

First, I will create a very generic example using the logging library and a function. My plan is to create a function that divides two numbers and returns the result. If the division succeeds, the function logs the result; if not, it logs the error. The classic error is a division by zero.

Please note this is just an example and one feasible way to do it. There are other possibilities, and you may prefer other approaches. As with any coding, it is up to the developer to decide what they prefer, though I must say I found this one quite easy.

Steps:

1. Libraries: Create the notebook and import the following libraries. The code examples in this post use Python:

   import logging
   import time

2. Customize date and time: It is quite useful to have the date and time when the error happened. I am using the time module and formatting the string accordingly:

Reference: https://docs.python.org/3/howto/logging-cookbook.html#formatting-times-using-utc-gmt-via-configuration




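   # date for the file name; time.gmtime() makes it UTC regardless of the session's local time zone
   datestr = time.strftime("_%Y%m%d_T_%H%M", time.gmtime())

   # date for the log messages (UTC)
   datestr_log = time.strftime("%Y-%m-%d - %H:%M:%S", time.gmtime())

Results are:

   _20230725_T_0859

   2023-07-25 - 08:59:00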
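Note that datestr_log is computed once, when the cell runs, so every message written afterwards carries that same timestamp rather than the moment the message was logged. The cookbook recipe referenced above solves this inside logging itself: a Formatter subclass whose converter is time.gmtime renders %(asctime)s in UTC for each record. Below is a minimal sketch of that recipe; the helper name configure_utc_file_logging, the format strings, and the default INFO level are my own illustrative assumptions, not part of the original example.

```python
import logging
import time


class UTCFormatter(logging.Formatter):
    # Recipe from the logging cookbook: render %(asctime)s with time.gmtime (UTC)
    converter = time.gmtime


def configure_utc_file_logging(log_file, level=logging.INFO):
    """Hypothetical helper: send root-logger records to log_file, timestamped in UTC."""
    handler = logging.FileHandler(log_file, mode="w")
    handler.setFormatter(UTCFormatter(
        fmt="%(asctime)s (UTC) - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d - %H:%M:%S",
    ))
    # handlers= cannot be combined with filename=; force=True replaces earlier handlers
    logging.basicConfig(handlers=[handler], level=level, force=True)


# Example (the path is illustrative):
# configure_utc_file_logging('/lakehouse/default/Files/Files/Log_Generic/divide_2_number.txt')
# logging.info("results are: 5.0")
```

With this in place, each line in the file gets the time at which that specific record was logged, so the timestamp no longer needs to be embedded in the message text.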

3. API path: Let’s parameterize and define the path where the log will be written. An important consideration: you need to use the API path for the storage, and in Fabric you can get it with one click, as Fig. 1 (API path) shows.

Fig. 1 API path

Here is the code:

   

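   import os

   # Lakehouse API path of the log folder (note the trailing slash before the file name)
   log_path = '/lakehouse/default/Files/Files/Log_Generic/'
   file_name = 'divide_2_number.log_' + datestr

   # FileHandler does not create folders, so make sure the folder exists
   os.makedirs(log_path, exist_ok=True)

   logging.basicConfig(filename=log_path + file_name + '.txt',
                       force=True,
                       filemode='w',
                       level=logging.INFO)  # or DEBUG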


Note: force=True. logging.basicConfig configures the root logger only once: if the root logger already has handlers (for example, from a previous run in the same notebook session), later calls are silently ignored. Setting force=True removes and closes any existing handlers on the root logger before the new configuration is applied, which makes it possible to send the logs of each execution to a different path. Ref: logging — Logging facility for Python — Python 3.11.4 documentation

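I am using INFO as the level here, so INFO messages and anything more severe are written to the file. The other levels are listed below. DEBUG is the most verbose option; note, however, that no level adds the traceback automatically, so to record the full traceback of an error, call logging.exception() or pass exc_info=True to the logging call: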

Reference: Logging HOWTO — Python 3.11.4 documentation

Level | When it’s used
DEBUG | Detailed information, typically of interest only when diagnosing problems.
INFO | Confirmation that things are working as expected.
WARNING | An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.
ERROR | Due to a more serious problem, the software has not been able to perform some function.
CRITICAL | A serious error, indicating that the program itself may be unable to continue running.
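A short sketch of how the level threshold and tracebacks behave in practice. It logs to an in-memory stream through a separate logger named levels_demo (a hypothetical name) so it does not touch the root logger configured above. With the level set to INFO, the DEBUG message is dropped, and logger.exception(), which logs at ERROR level, appends the traceback of the exception being handled.

```python
import io
import logging

# Log to an in-memory stream so the effect of the level and of the traceback is easy to see
stream = io.StringIO()
logger = logging.getLogger("levels_demo")  # hypothetical logger name
logger.propagate = False  # keep this demo out of the root logger's file
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
logger.addHandler(handler)

logger.debug("not written: DEBUG is below INFO")
logger.info("written: INFO meets the threshold")

try:
    1 / 0
except ZeroDivisionError:
    # logger.exception logs at ERROR level and appends the current traceback
    logger.exception("division failed")

output = stream.getvalue()
print(output)
```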

4. Function code: The function below receives two parameters and divides them. If the division fails, it logs the error; if it works, it logs the result. Both paths are handled inside the try/except block:

Here is the code:


   #################### Begin of the function ####################

   def divide_2_number(a, b):
       try:
           result = a / b
           message = f"- {datestr_log}(UTC) - results are: {result}\n"
           logging.info(message)
           print(message)
           return result

       except Exception as e:
           error_message = f"- {datestr_log}(UTC) - Exception occurred:\n\n Follow the error:\n\n {e}\n"
           print(error_message)
           logging.critical(error_message)
           # the division failed, so there is no result to return
           return None

   #################### End of the function ####################

   # execution of the function
   divide_2_number(2, 0)

It may take a few minutes for the file to be persisted and become visible in the OneLake folder, but it will show up, so be patient. Fig. 2 (Log) and Fig. 3 (Log File) show the results:

Fig. 2 Log

Fig. 3 Log File

To replicate this example for the parallel notebook execution I mentioned earlier, the code would look like this:

Here is the code:

Note: Be careful with indentation when you copy and paste this code. Python is indentation-sensitive, so you may need to adjust it before running.

   

from concurrent.futures import ThreadPoolExecutor
import logging
import os
import time

# date for the file name (UTC)
datestr = time.strftime("_%Y%m%d_T_%H%M", time.gmtime())

# date for the log messages (UTC)
datestr_log = time.strftime("%Y-%m-%d - %H:%M:%S", time.gmtime())

# Lakehouse API path of the log folder
log_path = '/lakehouse/default/Files/Silver/Logs/'
file_name = 'Parallel_notebooks_info.log_' + datestr

# FileHandler does not create folders, so make sure the folder exists
os.makedirs(log_path, exist_ok=True)

logging.basicConfig(filename=log_path + file_name + '.txt',
                    force=True,
                    filemode='w',
                    level=logging.INFO)  # or DEBUG

# timeout in seconds for each notebook run
timeout = 3600

# notebooks to run in parallel, each with its own parameters
notebooks = [
    {"path": "/Notebook_interactive", "params": {"parameterString": "Production.Product"}},
    {"path": "/Notebook_interactive", "params": {"parameterString": "AAA"}},
    {"path": "/Notebook_interactive", "params": {"parameterString": "Production.WorkOrder"}},
]

#################### Begin of the function ####################

def func_notebook_Error_handle(notebook):
    try:
        # mssparkutils is preloaded in Fabric notebooks, so no import is needed
        mssparkutils.notebook.run(notebook["path"], timeout, notebook["params"])
        message = f"- {datestr_log}(UTC) - Notebook executed '{notebook['path']}', {notebook['params']}\n"
        logging.info(message)

    except Exception as e:
        error_message = f"- {datestr_log}(UTC) - Exception occurred in notebook '{notebook['path']}', {notebook['params']}\n\n Follow the error:\n\n {e}\n"
        logging.critical(error_message)

#################### End of the function ####################

# Create a ThreadPoolExecutor and submit one notebook execution per entry;
# leaving the "with" block waits for all submitted executions to finish
with ThreadPoolExecutor() as executor:
    notebook_tasks = [executor.submit(func_notebook_Error_handle, notebook) for notebook in notebooks]
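The futures returned by executor.submit can also be used to build a summary at the end of the run. The sketch below is one possible variation, not the code above: the hypothetical run_all helper takes the notebook runner as a parameter (in Fabric you would pass mssparkutils.notebook.run, called with the same path, timeout, params arguments used above), lets exceptions surface through future.result(), and logs each outcome plus a final count. Passing the runner in also makes it possible to try the logic outside Fabric with a stand-in function.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(notebooks, run_notebook, timeout=3600, max_workers=None):
    """Hypothetical helper: run notebooks in parallel and return (succeeded, failed) lists.

    run_notebook is called as run_notebook(path, timeout, params), the same argument
    order the post uses for mssparkutils.notebook.run.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the notebook entry it runs
        futures = {
            executor.submit(run_notebook, nb["path"], timeout, nb["params"]): nb
            for nb in notebooks
        }
        for future in as_completed(futures):
            nb = futures[future]
            try:
                future.result()  # re-raises any exception from the notebook run
                succeeded.append(nb)
                logging.info("Notebook executed %s, %s", nb["path"], nb["params"])
            except Exception:
                failed.append(nb)
                # logging.exception logs at ERROR level and includes the traceback
                logging.exception("Exception occurred in notebook %s, %s", nb["path"], nb["params"])
    logging.info("Summary: %d succeeded, %d failed", len(succeeded), len(failed))
    return succeeded, failed


# In a Fabric notebook, after the logging setup above:
# succeeded, failed = run_all(notebooks, mssparkutils.notebook.run, timeout)
```

The design choice here is to keep the try/except around future.result() in the main thread, so the per-notebook function no longer needs its own error handling and the final counts come for free.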




Summary: Using the logging library and the OneLake API path, it takes just a few lines of code to configure log files in your notebooks, giving you a better record of both failed and successful executions.
