Microsoft Fabric Updates Blog

Fabric Data Pipelines – Advanced Scheduling Techniques (Part 2: Run a Pipeline on a Specific Day)

Author: Kyungchul (Kevin) Lee

Introduction

Welcome back to the blog series on Advanced Scheduling techniques in Microsoft Fabric Data pipelines.

In the first post, we covered how to replicate event-driven scheduling by polling a storage location (Fabric Data Pipelines – Advanced Scheduling Techniques (Part 1)). The good news is that we have recently announced the Public Preview of storage event triggers; see "Data pipelines storage event triggers in Data Factory (Preview)" on Microsoft Learn.

Today we want to cover another common scenario that we have heard from the community: scheduling a pipeline on a specific day of the month, including the first day and the last day of the month.

While we are working to build in native capabilities to handle this scenario, I hope you will find this blog helpful in the interim.

Scenario

I need my pipeline to run on the first day, the 15th, and the last day of the month.

As in the first post, we will use a pipeline that is scheduled to run daily, with embedded logic that checks the current date and decides whether any action is needed.

Solution Overview

To compute the desired dates, we can use dynamic expressions and built-in functions to pass the current timestamp to a variable. We use this variable to compare its date with the first day, 15th, and the last day of the month. If the dates are equal, meaning that today’s date is either the first day, the 15th, or the last day of the month, then we can set up subsequent activities to run, such as invoking an existing pipeline.

Solution Details

To implement the scenario above in a Data Factory pipeline in Microsoft Fabric, we first need to set a string variable that stores the pipeline's current timestamp, converted from UTC to the local time zone.

v_string_now = @convertFromUtc(utcnow(), 'Eastern Standard Time')
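
To see why the timestamp is converted to local time before any day-of-month check, consider a run that fires late in the evening on the last day of the month. The Python sketch below is only an illustration outside of Fabric. It assumes a fixed UTC-5 offset for Eastern time in January (the `EST` constant is a hypothetical stand-in), whereas `convertFromUtc` takes daylight saving time into account.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset, which is only valid for Eastern time in winter.
EST = timezone(timedelta(hours=-5), "EST")

# A run that fires at 03:30 UTC on February 1st...
utc_now = datetime(2025, 2, 1, 3, 30, tzinfo=timezone.utc)

# ...is still the evening of January 31st for a user in the Eastern time zone.
local_now = utc_now.astimezone(EST)

print("Day of month in UTC:  ", utc_now.day)
print("Day of month in local:", local_now.day)
```

Without the conversion, a last-day-of-month check evaluated at this moment would see the 1st instead of the 31st.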

Because we need to run a pipeline on specific dates, we use an ‘If Condition’ activity to check if the current date is one of these specific dates. If today is one of those dates, the expression will evaluate to True, and the activities under the ‘true case’ in the ‘If Condition’ will run.

For the expression used in the ‘If Condition’, we use the ‘equals’ function to compare today’s day of the month with each specified day, and the ‘or’ function to combine the comparisons so that the condition is true when any one of them matches. When the expression evaluates to True, the ‘Invoke pipeline’ activity in the true case runs the pipeline you specify.

Dynamic Expressions for If Condition Activity

First Day of the Month

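@equals(dayOfMonth(variables('v_string_now')), dayOfMonth(startOfMonth(variables('v_string_now'))))

Compare the current day of the month to the day of the calculated start of the month (which is always 1). Note that startOfMonth returns a timestamp, so it must be wrapped in dayOfMonth before it can be compared with an integer day.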

Last Day of the Month

@equals(
    dayOfMonth(variables('v_string_now')),
    dayOfMonth(
        addDays(
            addToTime(
                startOfMonth(variables('v_string_now')),
                1,
                'Month'
            ),
            -1
        )
    )
)

Compare the current day of the month to the day of (start of next month – 1 day)
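
The same "start of next month minus one day" idea can be sketched in Python to check that it handles month lengths and leap years. This is an illustration only, not part of the pipeline; the helper names `start_of_month`, `add_one_month`, `last_day_of_month`, and `is_last_day` are hypothetical and chosen to mirror the expression functions.

```python
from datetime import date, timedelta


def start_of_month(d: date) -> date:
    """Mirror of startOfMonth: snap a date to the 1st of its month."""
    return d.replace(day=1)


def add_one_month(first: date) -> date:
    """Mirror of addToTime(..., 1, 'Month') for a date that is already the 1st."""
    if first.month == 12:
        return first.replace(year=first.year + 1, month=1)
    return first.replace(month=first.month + 1)


def last_day_of_month(d: date) -> int:
    """Day number of the last day of d's month: start of next month minus one day."""
    return (add_one_month(start_of_month(d)) - timedelta(days=1)).day


def is_last_day(d: date) -> bool:
    """True when d falls on the last day of its month."""
    return d.day == last_day_of_month(d)


for sample in (date(2024, 2, 29), date(2025, 2, 28), date(2024, 4, 29)):
    print(sample, is_last_day(sample))
```

Starting from the first of the month means the month is always added to a day that exists in every month, so there is no need to decide what "one month after January 31st" should be.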

Specific Day of the Month

@equals(dayOfMonth(variables('v_string_now')), <#>)

Compare the current day of the month to a specified integer representing a specific day of the month, such as 15 for the 15th of the month

Putting it all together

If a specific pipeline needs to run on any of those dates, then we can use one ‘If condition’ activity to achieve this.

@or(
    or(
        equals(
            dayOfMonth(variables('v_string_now')),
            dayOfMonth(startOfMonth(variables('v_string_now')))
        ),
        equals(
            dayOfMonth(variables('v_string_now')),
            dayOfMonth(
                addDays(
                    addToTime(
                        startOfMonth(variables('v_string_now')),
                        1,
                        'Month'
                    ),
                    -1
                )
            )
        )
    ),
    equals(dayOfMonth(variables('v_string_now')), <#>)
)

Note that the placeholder <#> in the expression above can be replaced with a pipeline parameter, so the specific day of the month can be changed without editing the expression.
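
As a quick sanity check of the combined condition, the Python sketch below evaluates the same three comparisons for every day of a sample month. It is an illustration only; the function name `should_run` and the `specific_day` argument (standing in for the pipeline parameter) are hypothetical.

```python
from datetime import date, timedelta


def should_run(today: date, specific_day: int = 15) -> bool:
    """Mirror of the combined If Condition: first day, last day, or a specific day."""
    first_of_month = today.replace(day=1)
    # Jump past the end of the month (no month is longer than 31 days),
    # then snap back to the 1st to get the start of the next month.
    first_of_next = (first_of_month + timedelta(days=32)).replace(day=1)
    last_day = (first_of_next - timedelta(days=1)).day
    return today.day == 1 or today.day == last_day or today.day == specific_day


# Simulate the daily schedule across February 2025 (28 days).
run_days = [d for d in range(1, 29) if should_run(date(2025, 2, d))]
print(run_days)  # [1, 15, 28]
```

On every other day the daily run evaluates the condition, finds it false, and finishes without invoking anything.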

If you have more complex scenarios that require running different pipelines on different dates, you can incorporate additional “if condition” activities and a “switch” activity. By building your logic with dynamic expressions, you can execute subsequent activities based on cases.
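
One way to think about the "switch" variant is as a lookup from a case label to the pipeline that should run. The Python sketch below illustrates that routing logic only; the pipeline names in `PIPELINES` and the helper names are hypothetical, and the last-day test uses a shortcut (tomorrow is the 1st) rather than the start-of-next-month expression shown earlier.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical pipeline names, one per case label.
PIPELINES = {
    "first": "pl_month_open",
    "specific": "pl_mid_month",
    "last": "pl_month_close",
}


def day_case(today: date, specific_day: int = 15) -> str:
    """Return the case label a Switch-style expression could produce for today."""
    if today.day == 1:
        return "first"
    if (today + timedelta(days=1)).day == 1:  # tomorrow is the 1st, so today is the last day
        return "last"
    if today.day == specific_day:
        return "specific"
    return "default"


def pipeline_for(today: date) -> Optional[str]:
    """Name of the pipeline to invoke today, or None when nothing should run."""
    return PIPELINES.get(day_case(today))


for sample in (date(2025, 1, 1), date(2025, 1, 15), date(2025, 1, 20), date(2025, 1, 31)):
    print(sample, pipeline_for(sample))
```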

There have been concerns about the cost of running a daily pipeline that only performs a simple calculation. To address this, the example above consumes only approximately 100 Capacity Unit seconds (CU(s)) per run, which effectively costs less than a penny per run. However, please be aware that any activities that run after the expression is evaluated will add to the total capacity units consumed.
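
The "less than a penny" estimate can be reproduced with simple arithmetic. In the Python sketch below, the 100 CU seconds per run comes from the paragraph above, but the price per CU-hour is an assumed placeholder and not a figure from this post; substitute the rate for your own region and capacity.

```python
CU_SECONDS_PER_RUN = 100        # figure quoted in this post
ASSUMED_USD_PER_CU_HOUR = 0.18  # assumption for illustration only; check your own pricing


def run_cost_usd(cu_seconds: float, usd_per_cu_hour: float) -> float:
    """Convert CU seconds to CU hours and multiply by the hourly rate."""
    return cu_seconds / 3600 * usd_per_cu_hour


per_run = run_cost_usd(CU_SECONDS_PER_RUN, ASSUMED_USD_PER_CU_HOUR)
per_month = per_run * 31  # one evaluation per day in a 31-day month

print(f"Per run:   ${per_run:.4f}")
print(f"Per month: ${per_month:.4f}")
```

Under this assumed rate, a full month of daily checks stays well under one dollar before any downstream activities are counted.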

Solution Summary

In this blog post, we explored how to use a variable and an ‘If Condition’ activity to run a pipeline on specific dates. The pipeline is scheduled to run daily, evaluates the ‘If Condition’ expression, and runs the subsequent activities only if the expression evaluates to true. Thank you for reading. If you have other scheduling scenarios, please let us know in the comments below.

To see the complete code view of the demo pipeline, please visit the Fabric Toolbox on GitHub.

Have any questions or feedback? Leave a comment below!