Microsoft Fabric Updates Blog

Using Microsoft Fabric’s Lakehouse Data and prompt flow in Azure Machine Learning Service to create RAG applications

Microsoft Fabric’s Lakehouse helps us manage enterprise-level data environments in a unified way. As we move to AI, we cannot do without this enterprise data. In my previous blog, I showed how to build RAG applications on data in the Microsoft Fabric environment. In this post, I will show how to build a RAG application with prompt flow in a more specialized machine learning environment, Azure Machine Learning Service, combined with Microsoft Fabric’s Lakehouse data.

Azure Machine Learning Service is a machine learning platform that I enjoy using. It covers the whole machine learning process, including data, training, testing, deployment, and monitoring. With a short script, we can quickly bring Microsoft Fabric Lakehouse data into Azure Machine Learning Service.

1. Get the ABFS path of the Lakehouse in Microsoft Fabric.

Select your Lakehouse in Microsoft Fabric, then click Files -> Properties.

Copy the ABFS path:

abfss://<One Lake workspace name>@msit-onelake.dfs.fabric.microsoft.com/<Lakehouse ID>/Files
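The ABFS path carries the three values that the datastore definition in the next step needs: the OneLake workspace name, the endpoint, and the Lakehouse ID. As a minimal sketch, the path can be split with plain string operations. The helper name `parse_abfs_path` and the sample values are my own assumptions for illustration and are not part of any SDK.

```python
def parse_abfs_path(abfs_path: str) -> dict:
    """Split abfss://<workspace>@<endpoint>/<lakehouse>/<subpath> into its parts."""
    scheme, sep, rest = abfs_path.partition("://")
    if scheme != "abfss" or not sep:
        raise ValueError("expected a path starting with abfss://")

    # Everything before the first "/" is "<workspace>@<endpoint>"
    authority, _, path = rest.partition("/")
    workspace, at, endpoint = authority.rpartition("@")
    if not at or not workspace or not endpoint:
        raise ValueError("expected <workspace>@<endpoint> after abfss://")

    # The first path segment is the Lakehouse ID, the rest is the sub-path
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError("expected a Lakehouse ID after the endpoint")

    return {
        "workspace": workspace,
        "endpoint": endpoint,
        "lakehouse": segments[0],
        "subpath": "/".join(segments[1:]),
    }


# Hypothetical sample values, only to show the shape of the result
parts = parse_abfs_path(
    "abfss://my-workspace@msit-onelake.dfs.fabric.microsoft.com/my-lakehouse-id/Files"
)
print(parts)
```

The `workspace`, `endpoint`, and `lakehouse` values correspond to `one_lake_workspace_name`, `endpoint`, and the artifact `name` used when creating the datastore.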

2. Create a new notebook on your local machine. Run the following code to import the Lakehouse data into Azure Machine Learning Service.

! pip install azure-ai-ml -U 
! pip install mltable azureml-dataprep[pandas] -U 
! pip install azureml-fsspec -U

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact

subscription_id = "Your Azure Subscription ID" 
resource_group = "Your Azure Machine Learning Service Workspace Resource Group" 
workspace = "Your Azure Machine Learning Service Workspace Name"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

# Replace the quoted placeholders with the values from your ABFS path
artifact = OneLakeArtifact(
    name="<Lakehouse ID>",
    type="lake_house"
)
store = OneLakeDatastore(
    name="onelake_lh_for_azureml",
    description="Credential-less OneLake datastore.",
    endpoint="msit-onelake.dfs.fabric.microsoft.com",
    artifact=artifact,
    one_lake_workspace_name="<One Lake workspace name>",
)

ml_client.create_or_update(store)

3. Test whether the data was imported successfully.

from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azureml.fsspec import AzureMachineLearningFileSystem

uri = 'azureml://subscriptions/<Your Azure Subscription ID>/resourcegroups/<Your Azure Machine Learning Service Resource Group>/workspaces/<Your Azure Machine Learning Service Workspace Name>/datastores/onelake_lh_for_azureml'

# create the filesystem

fs = AzureMachineLearningFileSystem(uri)

fs.ls()

# the with statement closes the file automatically
with fs.open('Files/csv/sales.csv') as f:
    data = f.readlines()
    print(data[0:5])
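The datastore URIs in this step follow a fixed pattern, so they can be assembled from their parts instead of being typed by hand. The sketch below covers the long form (with subscription, resource group, and workspace) and the short form (`azureml://datastores/...`). The helper names `datastore_uri` and `short_datastore_uri` are hypothetical, and the sample IDs are placeholders.

```python
def datastore_uri(subscription_id: str, resource_group: str, workspace: str,
                  datastore: str, path: str = "") -> str:
    """Build the long-form azureml:// URI for a datastore, optionally with a path."""
    base = (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
    )
    path = path.strip("/")
    return f"{base}/paths/{path}" if path else base


def short_datastore_uri(datastore: str, path: str) -> str:
    """Build the short-form URI that is resolved against the current workspace."""
    return f"azureml://datastores/{datastore}/paths/{path.strip('/')}"


# Placeholder values, replace them with your own
print(datastore_uri("<subscription-id>", "<resource-group>", "<workspace>",
                    "onelake_lh_for_azureml"))
print(short_datastore_uri("onelake_lh_for_azureml", "Files/csv"))
```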

You can also select Data in Azure Machine Learning Service to check whether the data was imported successfully. The following code registers the CSV folder as a data asset and reads it back.

from azure.ai.ml.entities import Data
import pandas as pd
import mltable

csv_path = 'azureml://datastores/onelake_lh_for_azureml/paths/Files/csv'
my_csv_data = Data(
        path=csv_path,
        type=AssetTypes.URI_FOLDER,
        description="demo",
        name="csv_data_source",
        version="1.0.0"
)

ml_client.data.create_or_update(my_csv_data)

csv_data = ml_client.data.get("csv_data_source", version="1.0.0")

path = {
  'folder': csv_data.path
}

tbl = mltable.from_delimited_files(paths=[path])

df = pd.read_csv(csv_data.path + '/sales.csv')

df

Of course, you can also check the data in the Azure Machine Learning Service workspace to confirm that it has been synchronized correctly.

In the previous post we used Semantic Kernel. In this blog, we use prompt flow to build the application. Prompt flow is a development tool designed to streamline the entire development cycle of AI applications powered by Large Language Models (LLMs). As the momentum for LLM-based AI applications continues to grow across the globe, prompt flow provides a comprehensive solution that simplifies the process of prototyping, experimenting, iterating, and deploying your AI applications. If you’re looking for a versatile and intuitive development tool that will streamline your LLM-based AI application development, then prompt flow is the perfect solution for you.

The biggest strength of prompt flow is that it helps integrate prompt engineering into the rest of your project. In particular, it helps stabilize LLM output: you can compare prompts, choose the best one, and combine it with the LLM so they work together effectively.

Prompt flow applications can be developed in Azure Machine Learning Service, on the command line, or in Visual Studio Code. I recommend developing in Visual Studio Code. First, you need to install the prompt flow for VS Code extension.

After the installation succeeds, click the prompt flow extension in the left sidebar and select Installation Dependencies. Once the environment is configured, you can create and build a prompt flow application.

Prompt flow supports different connections, such as Azure OpenAI Service, Azure Cognitive Search, and Azure Content Safety, and it also supports Custom Connections. You can configure them according to your needs.

Custom Connection is used often. It lets you store connection settings, mainly in the form of key-value pairs.

Use prompt flow to quickly build a flow for enterprise data. The following are implementations for structured data and unstructured data, as well as a simple example of the Chat flow process. All of this data comes from our Azure Machine Learning Service (imported from Microsoft Fabric Lakehouse).

This is a RAG application for unstructured data and structured data built with prompt flow.
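As a rough illustration of what the retrieval and prompt-building nodes in such a flow do for structured data, here is a toy sketch in plain Python. It is not the flow from my samples: the keyword-overlap scoring, the function names, and the sample CSV rows are all assumptions made for this example, and a real flow would typically use an index or embeddings for retrieval and pass the prompt to an LLM node.

```python
import csv
import io


def load_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def score(row: dict, question: str) -> int:
    """Count how many words of the question match a cell value exactly."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    cells = {str(v).lower() for v in row.values()}
    return len(words & cells)


def retrieve(rows: list[dict], question: str, top_k: int = 2) -> list[dict]:
    """Return the top_k rows with a non-zero score, best match first."""
    ranked = sorted(rows, key=lambda r: score(r, question), reverse=True)
    return [r for r in ranked[:top_k] if score(r, question) > 0]


def build_prompt(question: str, context_rows: list[dict]) -> str:
    """Put the retrieved rows in front of the question as grounding context."""
    context = "\n".join(
        ", ".join(f"{k}={v}" for k, v in row.items()) for row in context_rows
    )
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up sample data standing in for a file such as sales.csv
SAMPLE_CSV = """product,region,amount
bike,europe,120
helmet,asia,35
bike,asia,90
"""

question = "How much did bike sales make in Europe?"
rows = load_rows(SAMPLE_CSV)
hits = retrieve(rows, question)
print(build_prompt(question, hits))
```

The prompt produced this way is what would be passed to the LLM node, which keeps the answer grounded in the retrieved rows rather than in the model's general knowledge.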

You can download the samples from my GitHub repo.
