Fabric changing the game: Logging your workload using Notebooks.
I was working on an example for a customer about logging execution errors to a file while running multiple notebooks in parallel inside a try/except block. While thinking through that scenario in a Fabric environment, I realized this work is now much easier. As I mentioned in other posts, OneLake integration makes everything simple!
Some references for background:
For running multiple notebooks in parallel: Microsoft Fabric changing the game: Exporting data and building the Lakehouse | Microsoft Fabric Blog | Microsoft Fabric
For OneLake: Fabric Changing the game – OneLake integration | Microsoft Fabric Blog | Microsoft Fabric
Let’s discuss how to write logs to OneLake using notebooks. The simplest way to do that, in my opinion, is the Python logging library. The fact that I can use the OneLake API path directly inside the notebook makes it even easier. Let me show you this step by step.
Step By Step
First, I will create a very generic example using logging and a function. My plan here is to create a function that divides two numbers and returns the result. If the division succeeds, it logs the result; if not, it logs the error. The classic error would be a division by zero.
Please note this is just an example of one feasible way to do it. There are other possibilities, and you may prefer a different approach. As with most coding, it comes down to what the developer prefers, though I must say I found this one quite easy.
Steps:
1. Libraries: Create the notebook and import the following. The code examples in this post use Python:
import logging
import time
2. Customize date and time: It would be quite nice to have the date and time when the error happened. I am using time and formatting the string accordingly:
Follow the reference: https://docs.python.org/3/howto/logging-cookbook.html#formatting-times-using-utc-gmt-via-configuration
#date for the file name
datestr = time.strftime("_%Y%m%d_T_%H%M", time.gmtime())
#date for the log entries (UTC; the format previously repeated the minutes after the seconds)
datestr_log = time.strftime("%Y-%m-%d - %H:%M:%S", time.gmtime())
Results are:
_20230725_T_0859
- 2023-07-25 - 08:59:00(UTC)
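Because datestr_log is computed once, every line written during the session carries the same timestamp. A minimal sketch of an alternative, in the spirit of the logging cookbook entry referenced above, is to let the logging module stamp each record itself in UTC. The logger name `utc_demo` and the in-memory stream are illustrative assumptions so the sketch runs anywhere; in a notebook the handler would more likely be a file handler on the Lakehouse path.

```python
import io
import logging
import time

# Formatter that stamps every record at the moment it is logged
formatter = logging.Formatter(
    fmt="%(asctime)s(UTC) - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d - %H:%M:%S",
)
# Render %(asctime)s in UTC instead of local time
formatter.converter = time.gmtime

# In-memory stream used only for illustration (assumption)
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(formatter)

logger = logging.getLogger("utc_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
logger.addHandler(handler)

logger.info("division finished")
log_line = stream.getvalue()
print(log_line)
```

With this approach the timestamp no longer needs to be embedded by hand in each message.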
3. API path: Let’s parameterize and define the path where the log will be written. An important consideration: you need to use the API path of the storage location, and in Fabric you can copy it with one click, as Fig. 1 API Path shows.
Fig 1 API Path
Follow the code:
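#API path of the Lakehouse folder that holds the logs (the trailing slash keeps the file inside the folder)
log_path = '/lakehouse/default/Files/Files/Log_Generic/'
file_name = 'divide_2_number.log_' + datestr
logging.basicConfig(filename=log_path + file_name + '.txt',
                    force=True,
                    filemode='w',
                    level=logging.INFO)  #or logging.DEBUG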
Note: force=True. By default, basicConfig does nothing if the root logger already has handlers configured, so the log path can effectively be set only once per session. With force=True, any existing handlers attached to the root logger are removed and closed before the new configuration is applied, which makes it possible to point later executions at a different log file. Ref: logging — Logging facility for Python — Python 3.11.4 documentation
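The effect of force=True can be seen in a small sketch. It writes to a temporary folder rather than the Lakehouse, and the file names `first.txt` and `second.txt` are assumptions made only for illustration.

```python
import logging
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
first_log = os.path.join(tmp_dir, "first.txt")    # hypothetical file names
second_log = os.path.join(tmp_dir, "second.txt")

# Initial configuration (force=True also clears handlers a notebook may have preset)
logging.basicConfig(filename=first_log, filemode="w", level=logging.INFO, force=True)
logging.info("message one")

# Without force=True this call is ignored: the root logger already has a handler
logging.basicConfig(filename=second_log, filemode="w", level=logging.INFO)
logging.info("message two")  # still goes to first.txt
second_exists_without_force = os.path.exists(second_log)

# With force=True the old handler is closed and the new file takes over
logging.basicConfig(filename=second_log, filemode="w", level=logging.INFO, force=True)
logging.info("message three")

logging.shutdown()  # flush and close handlers before reading the files

with open(first_log) as f:
    first_text = f.read()
with open(second_log) as f:
    second_text = f.read()
print(first_text)
print(second_text)
```

The first file ends up with the first two messages, and only the third message lands in the second file.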
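I am using INFO here as the threshold for the log file, but other levels are available, as listed below. DEBUG, for example, is the most verbose level and is useful when diagnosing problems. Note that the traceback of an error is recorded when you call logging.exception (or pass exc_info=True) inside the except block, not by the level alone: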
Reference: Logging HOWTO — Python 3.11.4 documentation
Level | When it’s used |
---|---|
DEBUG | Detailed information, typically of interest only when diagnosing problems. |
INFO | Confirmation that things are working as expected. |
WARNING | An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected. |
ERROR | Due to a more serious problem, the software has not been able to perform some functions. |
CRITICAL | A serious error, indicating that the program itself may be unable to continue running. |
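To make the table concrete, here is a minimal sketch of how the level threshold filters records and how a traceback can be captured. The logger name `levels_demo` and the in-memory stream are assumptions for illustration only.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("levels_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)  # records below INFO are dropped
logger.propagate = False
logger.addHandler(handler)

logger.debug("inputs received")    # below the threshold, not written
logger.info("starting division")   # written

try:
    2 / 0
except ZeroDivisionError:
    # logging.exception logs at ERROR level and appends the traceback
    logger.exception("division failed")

output = stream.getvalue()
print(output)
```

Lowering the threshold to logging.DEBUG would also write the first message.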
4. Function code: The function below receives two parameters and divides the first by the second. Whether the division succeeds or fails, the outcome is logged from inside the try/except block:
#################### Begin of the function ####################
def divide_2_number(a, b):
    try:
        result = a / b
        message = f"- {datestr_log}(UTC) - results are: {result}\n"
        logging.info(message)
        print(message)
        return result
    except Exception as e:
        #log the failure; the function returns None in this case
        error_message = f"- {datestr_log}(UTC) - Exception occurred:\n\n Follow the error:\n\n {e}\n"
        print(error_message)
        logging.critical(error_message)
#################### End of the function ####################
#execution of the function
divide_2_number(2, 0)
It can take a few minutes for the file to be persisted and become visible in the OneLake folder, so be patient. Fig. 2 Log and Fig. 3 Log File show the results:
Fig 2 Log
Fig. 3 Log File
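To check what was written without waiting for the explorer to refresh, one option is to flush the handlers and read the file back from the same path. The sketch below is a self-contained variant of the example that uses a temporary folder as a stand-in for the Lakehouse path (an assumption so it can run anywhere) and creates the folder first, since the file handler fails if the folder is missing.

```python
import logging
import os
import tempfile
import time

# Stand-in for the Lakehouse folder (assumption); in Fabric this would be the API path
log_dir = os.path.join(tempfile.mkdtemp(), "Log_Generic")
os.makedirs(log_dir, exist_ok=True)  # the handler cannot create missing folders

datestr = time.strftime("_%Y%m%d_T_%H%M", time.gmtime())
log_file = os.path.join(log_dir, "divide_2_number.log_" + datestr + ".txt")

logging.basicConfig(filename=log_file, filemode="w", level=logging.INFO, force=True)

def divide_2_number(a, b):
    try:
        result = a / b
        logging.info("results are: %s", result)
        return result
    except Exception as e:
        logging.critical("Exception occurred: %s", e)
        return None

ok_result = divide_2_number(4, 2)
failed_result = divide_2_number(2, 0)

# Flush and close the handlers so everything is on disk before reading
logging.shutdown()
with open(log_file) as f:
    log_text = f.read()
print(log_text)
```

Building the path with os.path.join also avoids problems with a missing slash between the folder and the file name.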
To replicate this example for the parallel notebook execution mentioned earlier, the code would look like this:
Note: Please be careful with indentation when you copy and paste this code as Python is quite sensitive, hence you may need to adjust before running.
from concurrent.futures import ThreadPoolExecutor
import logging
import time
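#date for the file name
datestr = time.strftime("_%Y%m%d_T_%H%M", time.gmtime())
#date for the log entries (UTC; the format previously repeated the minutes after the seconds)
datestr_log = time.strftime("%Y-%m-%d - %H:%M:%S", time.gmtime())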
log_path ='/lakehouse/default/Files/Silver/Logs/'
file_name = 'Parallel_notebooks_info.log_' + datestr
logging.basicConfig(filename=log_path + file_name+'.txt',
force = True,
filemode='w',
level=logging.INFO)##or DEBUG
timeout = 3600 #seconds
#Notebooks to run in parallel, each with its own parameters
notebooks = [
{"path": "/Notebook_interactive"
, "params": {"parameterString":"Production.Product"}},
{"path": "/Notebook_interactive"
, "params": {"parameterString":"AAA"}},
{"path": "/Notebook_interactive"
, "params": {"parameterString":"Production.WorkOrder"}}]
#################### Begin of the function ####################
def func_notebook_Error_handle(notebook):
    try:
        mssparkutils.notebook.run(notebook["path"], timeout, notebook["params"])
        message = f"- {datestr_log}(UTC) - Notebook executed '{notebook['path']} , {notebook['params']}'\n"
        logging.info(message)
    except Exception as e:
        error_message = f"- {datestr_log}(UTC) - Exception occurred in notebook '{notebook['path']} , {notebook['params']}'\n\n Follow the error:\n\n {e}\n"
        logging.critical(error_message)
#################### End of the function ####################
# Create a ThreadPoolExecutor and submit one execution per notebook
with ThreadPoolExecutor() as executor:
    notebook_tasks = [executor.submit(func_notebook_Error_handle, notebook) for notebook in notebooks]
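The logging module is designed to be thread-safe, so calling it from the worker threads is fine. If you also want a summary of which runs failed, one possible extension is to return a status from each task and collect the results. In this sketch, `run_notebook_stub` is a hypothetical stand-in for mssparkutils.notebook.run (it simply fails for the "AAA" parameter) so the code can run outside Fabric.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_notebook_stub(path, timeout, params):
    # Hypothetical stand-in for mssparkutils.notebook.run
    if params["parameterString"] == "AAA":
        raise ValueError("table not found: AAA")
    return "done"

notebooks = [
    {"path": "/Notebook_interactive", "params": {"parameterString": "Production.Product"}},
    {"path": "/Notebook_interactive", "params": {"parameterString": "AAA"}},
    {"path": "/Notebook_interactive", "params": {"parameterString": "Production.WorkOrder"}},
]

def run_with_status(notebook):
    # Return (parameter, status, error) instead of only logging
    name = notebook["params"]["parameterString"]
    try:
        run_notebook_stub(notebook["path"], 3600, notebook["params"])
        return (name, "succeeded", None)
    except Exception as e:
        return (name, "failed", str(e))

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(run_with_status, nb) for nb in notebooks]
    # as_completed yields each future as soon as its notebook finishes
    outcomes = [future.result() for future in as_completed(futures)]

failed = sorted(name for name, status, _ in outcomes if status == "failed")
succeeded = sorted(name for name, status, _ in outcomes if status == "succeeded")
print("succeeded:", succeeded)
print("failed:", failed)
```

The same status tuples could be written to the log file at the end of the run as a single summary entry.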
Summary: Using the logging library and the OneLake API path, it takes just a few lines of code to configure log files in your notebooks, making it easier to track both failed and successful executions.