Microsoft Fabric Updates Blog

Using Microsoft Fabric’s Lakehouse Data and prompt flow in Azure Machine Learning Service to create RAG applications

Microsoft Fabric’s Lakehouse helps us manage enterprise-level data environments in a unified way. Any move toward AI depends on this enterprise data. In my previous blog, I showed how to build RAG applications based on data in the Microsoft Fabric environment. In this post, I will show how to build a RAG application with prompt flow in a more specialized machine learning environment: Azure Machine Learning Service, combined with Microsoft Fabric’s Lakehouse data.

Azure Machine Learning Service is a machine learning platform that I enjoy using. It covers the machine learning process end to end: data, training, testing, deployment, and monitoring. A short script is enough to bring Microsoft Fabric Lakehouse data into Azure Machine Learning Service.

1. Get the ABFS path of the Lakehouse in Microsoft Fabric.

Choose your Microsoft Fabric Lakehouse, then click Files -> Properties.

Copy the ABFS path:

abfss://<One Lake workspace name>@msit-onelake.dfs.fabric.microsoft.com/<Lakehouse ID>/Files
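The parts of this path correspond to the values used in the next step: the text before the "@" is the OneLake workspace name, the host is the endpoint, and the first path segment is the Lakehouse ID. As a minimal sketch, the path can be split with the Python standard library. The helper name `parse_abfs_path` and the sample path are hypothetical and are not part of any Azure SDK.

```python
from urllib.parse import urlparse

# Hypothetical sample path; replace it with the ABFS path copied from Fabric.
SAMPLE_PATH = (
    "abfss://my-workspace@msit-onelake.dfs.fabric.microsoft.com/"
    "00000000-0000-0000-0000-000000000000/Files"
)


def parse_abfs_path(abfs_path: str) -> dict:
    """Split a OneLake ABFS path into workspace, endpoint, Lakehouse ID, folder.

    Illustrative helper only; it is not part of any Azure SDK.
    """
    parsed = urlparse(abfs_path)
    if parsed.scheme != "abfss":
        raise ValueError(f"Expected an abfss:// path, got: {abfs_path!r}")

    # The authority part has the form <workspace>@<endpoint>.
    workspace, separator, endpoint = parsed.netloc.partition("@")
    if not separator or not workspace or not endpoint:
        raise ValueError("Expected <workspace>@<endpoint> after abfss://")

    # The first path segment is the Lakehouse ID; the rest is the folder.
    segments = [segment for segment in parsed.path.split("/") if segment]
    if not segments:
        raise ValueError("The path does not contain a Lakehouse ID")

    return {
        "workspace": workspace,
        "endpoint": endpoint,
        "lakehouse_id": segments[0],
        "folder": "/".join(segments[1:]),
    }


parts = parse_abfs_path(SAMPLE_PATH)
print(parts)
```

Parsing the path once and reusing the resulting values avoids copy-and-paste mistakes when the same identifiers appear in several places.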

2. Create a new notebook on your local machine. Execute the following code to connect the Lakehouse data to Azure Machine Learning Service as a datastore.

! pip install azure-ai-ml -U 
! pip install mltable azureml-dataprep[pandas] -U 
! pip install azureml-fsspec -U

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact

subscription_id = "Your Azure Subscription ID" 
resource_group = "Your Azure Machine Learning Service Workspace Resource Group" 
workspace = "Your Azure Machine Learning Service Workspace Name"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

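# The Lakehouse ID is the first path segment of the ABFS path copied in step 1.
artifact = OneLakeArtifact(
    name="<Lakehouse ID>",
    type="lake_house"
)
store = OneLakeDatastore(
    name="onelake_lh_for_azureml",
    description="Credential-less OneLake datastore.",
    endpoint="msit-onelake.dfs.fabric.microsoft.com",
    artifact=artifact,
    # The workspace name is the part of the ABFS path before the "@".
    one_lake_workspace_name="<One Lake workspace name>",
)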

ml_client.create_or_update(store)

3. Test whether the data was imported successfully.

from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azureml.fsspec import AzureMachineLearningFileSystem

uri = 'azureml://subscriptions/<Your Azure Subscription ID>/resourcegroups/<Your Azure Machine Learning Service Resource Group>/workspaces/<Your Azure Machine Learning Service Workspace Name>/datastores/onelake_lh_for_azureml'

# Create the filesystem on top of the datastore
fs = AzureMachineLearningFileSystem(uri)

# List the contents of the datastore root
fs.ls()

# Read the first five lines of a CSV file stored in the Lakehouse.
# The with block closes the file, so no explicit close() is needed.
with fs.open('Files/csv/sales.csv') as f:
    data = f.readlines()
    print(data[0:5])
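The long datastore URI is easy to mistype, for example by leaving a stray space inside one of the identifiers. The following is a minimal sketch of a helper that assembles it from its parts. The function name `datastore_uri` and the sample values are hypothetical; the URI layout follows the long-form `azureml://` pattern used in the code above, with an optional `/paths/...` suffix.

```python
def datastore_uri(subscription_id: str, resource_group: str,
                  workspace: str, datastore: str, path: str = "") -> str:
    """Build a long-form azureml:// datastore URI from its components.

    Illustrative helper only; it is not part of the Azure ML SDK.
    """
    fields = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace": workspace,
        "datastore": datastore,
    }
    cleaned = {}
    for label, value in fields.items():
        # Strip accidental whitespace and reject empty identifiers.
        cleaned[label] = value.strip()
        if not cleaned[label]:
            raise ValueError(f"{label} must not be empty")

    uri = (
        f"azureml://subscriptions/{cleaned['subscription_id']}"
        f"/resourcegroups/{cleaned['resource_group']}"
        f"/workspaces/{cleaned['workspace']}"
        f"/datastores/{cleaned['datastore']}"
    )
    # Optionally point at a folder or file inside the datastore.
    relative_path = path.strip().strip("/")
    if relative_path:
        uri += f"/paths/{relative_path}"
    return uri


# Hypothetical values for illustration.
print(datastore_uri("my-subscription-id", "my-resource-group",
                    "my-workspace", "onelake_lh_for_azureml"))
```

The returned string can then be passed to `AzureMachineLearningFileSystem` in place of the hand-written `uri`.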

You can also register the folder as a data asset, so that it appears under Data in Azure Machine Learning Service, and check that the relevant data was imported successfully.

from azure.ai.ml.entities import Data
import pandas as pd
import mltable

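# Short-form URI of the csv folder inside the OneLake datastore
csv_path = 'azureml://datastores/onelake_lh_for_azureml/paths/Files/csv'
my_csv_data = Data(
    path=csv_path,
    type=AssetTypes.URI_FOLDER,
    description="demo",
    name="csv_data_source",
    version="1.0.0"
)

# Register the folder as a data asset in the workspace
ml_client.data.create_or_update(my_csv_data)

# Fetch the registered asset to get its resolved path
csv_data = ml_client.data.get("csv_data_source", version="1.0.0")

path = {
    'folder': csv_data.path
}

tbl = mltable.from_delimited_files(paths=[path])

# Load one file from the folder into a pandas DataFrame
df = pd.read_csv(csv_data.path + '/sales.csv')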

df

Of course, you can also check the data in the Azure Machine Learning Service workspace to confirm that it is in sync.

In the previous post we used Semantic Kernel. In this post, we use prompt flow to build the application. Prompt flow is a development tool designed to streamline the entire development cycle of AI applications powered by Large Language Models (LLMs). It simplifies prototyping, experimenting with, iterating on, and deploying LLM-based AI applications.

The biggest strength of prompt flow is that it helps integrate prompt work into the rest of the project. In particular, when it comes to stabilizing LLM output, it lets you choose the best prompt and combine it with the LLM so that the two work well together.

Prompt flow applications can be developed in Azure Machine Learning Service, on the command line, or in Visual Studio Code. I recommend developing in Visual Studio Code. First, install the prompt flow extension for VS Code.

After the installation succeeds, click the prompt flow extension in the left sidebar and select the option to install dependencies. Once the environment is configured, you can create and build a prompt flow application.

Prompt flow supports different connections, such as Azure OpenAI Service, Azure Cognitive Search, and Azure Content Safety, as well as Custom Connections. You can configure them according to your needs.

Custom Connections are used often. They hold connection settings, mainly in the form of key-value pairs.

Use prompt flow to quickly build a flow over enterprise data. The following are implementations for structured and unstructured data, as well as a simple example of a chat flow. All of this data comes from Azure Machine Learning Service (imported from Microsoft Fabric Lakehouse).

This is a RAG application for unstructured and structured data, built with prompt flow.
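To make the idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern that a RAG node over structured data could follow. It is not the flow from the samples: keyword overlap stands in for a real search or vector index, the CSV rows and all function names are hypothetical, and the call to the LLM is left out.

```python
import csv
import io
import re

# Hypothetical rows standing in for a CSV file read from the Lakehouse.
SAMPLE_CSV = """product,region,amount
Bike,North,1200
Helmet,South,300
Bike,South,900
"""


def load_rows(csv_text: str) -> list:
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, rows: list, top_k: int = 2) -> list:
    """Return up to top_k rows sharing the most tokens with the question.

    Keyword overlap is a stand-in for a real retriever such as a vector index.
    """
    question_tokens = tokenize(question)
    scored = []
    for index, row in enumerate(rows):
        overlap = len(question_tokens & tokenize(" ".join(row.values())))
        if overlap:
            scored.append((overlap, index, row))
    # Highest overlap first; ties keep the original row order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [row for _, _, row in scored[:top_k]]


def build_prompt(question: str, rows: list) -> str:
    """Ground the question in the retrieved rows before it goes to an LLM."""
    context = "\n".join(
        ", ".join(f"{key}={value}" for key, value in row.items())
        for row in rows
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


rows = load_rows(SAMPLE_CSV)
question = "How many Bike sales in the South region?"
print(build_prompt(question, retrieve(question, rows)))
```

In a prompt flow application, the retrieval step and the prompt-building step would typically be separate nodes, with the resulting prompt passed on to an LLM node.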

You can download the samples from my GitHub repo.
