Prebuilt Azure AI services in Fabric
During the recent Ignite 2023 event, we announced the public preview of prebuilt AI services in Fabric. This integration with Azure AI services, formerly known as Azure Cognitive Services, allows for easy enhancement of data with prebuilt AI models without any prerequisites.
Using AI services in Fabric has never been easier! In the past, you had to go through the hassle of provisioning an AI resource in the Azure portal, managing the access key, or setting up a linked service before you could use them. But now, with prebuilt AI services, you can skip those steps altogether and access specific Azure AI services in Fabric right out of the box. And the best part? You'll get integrated billing against Fabric capacity, making it easy to keep track of your usage.
Currently, prebuilt AI services are in public preview and include support for Azure OpenAI Service, Azure AI Language, and Azure AI Translator. We are in the process of adding more AI services. If there are specific AI services you would like to see in Fabric, we encourage you to submit your ideas or upvote existing ones on Fabric Ideas to influence our priorities.
It is important to note that Azure OpenAI Service is not supported on trial SKUs; only paid SKUs (F64 or higher, or P1 or higher) are supported. We are enabling Azure OpenAI in stages, with access for all users expected by March 2024.
In the next section, we will take a look at how you can use the prebuilt AI services to analyze hotel reviews and create a Power BI report that can help you find the best hotel for your needs.
Analyze hotel reviews with AI in Fabric
Imagine you’re planning a trip to Yellowstone National Park and need to book a hotel that meets your preferences. With hundreds of reviews to sift through, it can be overwhelming to find the most relevant information and filter hotels accordingly. However, with Fabric, you can easily translate, extract, and classify hotel reviews with zero setup effort using prebuilt AI services. Then, with Power BI, you can create a visual report that allows you to filter hotels by categories and view their ratings and comments.
Let me walk you through the process. You can also access the notebook and run it in your Fabric workspace.
Load the Data
In Fabric, it’s very easy to get started with Notebooks and to work with data. Just make sure to attach a Lakehouse to your notebook, and you’re good to go!
To download the hotel review data from a public blob and store it in the Lakehouse, simply use the following code.
IS_CUSTOM_DATA = False  # if True, the dataset has to be uploaded manually

if not IS_CUSTOM_DATA:
    # Download data files into the lakehouse if they do not already exist
    import os
    import requests

    remote_url = "https://synapseaisolutionsa.blob.core.windows.net/public/hotel_reviews"
    file_list = ["hotel_reviews_demo.csv"]
    download_path = "/lakehouse/default/Files/hotel_reviews/raw"

    if not os.path.exists("/lakehouse/default"):
        raise FileNotFoundError(
            "Default lakehouse not found, please add a lakehouse and restart the session."
        )
    os.makedirs(download_path, exist_ok=True)
    for fname in file_list:
        local_file = os.path.join(download_path, fname)
        if not os.path.exists(local_file):
            r = requests.get(f"{remote_url}/{fname}", timeout=30)
            r.raise_for_status()  # fail loudly instead of saving an error page
            with open(local_file, "wb") as f:
                f.write(r.content)
    print("Downloaded demo data files into lakehouse.")
We will be using SynapseML to analyze the hotel reviews. SynapseML is an open-source library that makes it easy to create large-scale machine learning pipelines. Fabric has the latest SynapseML package preinstalled and integrated with prebuilt AI models, making it a breeze to build smart, scalable systems for various domains.
import synapse.ml.core
from synapse.ml.services import *
from pyspark.sql.functions import col, flatten, udf, lower, trim
from pyspark.sql.types import StringType
Text Translation using Azure AI Translator
Azure AI Translator is an AI service that lets you translate documents and text in real time. It supports more than 100 languages and can handle various scenarios such as translation for call centers, multilingual conversational agents, or in-app communication.
You can use the prebuilt AI translator in Fabric to translate text, transliterate text from one script to another, detect language, and more. For a comprehensive list of translator functions supported in Fabric, please visit the AI services documentation.
After examining the hotel reviews data, we noticed that the reviews are written in multiple languages. To solve this, we turn to Azure AI Translator to translate them into English.
df = spark.read.format("csv")\
    .option("header", "true")\
    .load("Files/hotel_reviews/raw/hotel_reviews_demo.csv")
display(df)
Translating text in SynapseML takes just a `Translate` transformer and a single call to its `transform()` method. If you're already familiar with SynapseML, you'll be happy to know that in Fabric you no longer need to set `setSubscriptionKey()`, `setLocation()`, or `setLinkedService()`. Simply focus on the core logic and you're ready to go.
translate = (Translate()
    .setTextCol("reviews_text")
    .setToLanguage("en")
    .setOutputCol("translation")
    .setConcurrency(5))

df_en = translate.transform(df)\
    .withColumn("translation_result", flatten(col("translation.translations")))\
    .withColumn("translation", col("translation_result.text")[0])\
    .cache()
df_en = df_en.select(df_en.columns[:6] + ["translation"])
display(df_en.tail(5))
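To see why the code flattens `translation.translations` and then takes element `[0]`, it helps to look at the shape of the result. The plain-Python sketch below assumes each row's `translation` value mirrors the Translator v3 REST response: a list with one entry per input text, each holding a `translations` list with one entry per target language. The sample values and the `first_translation_text` helper are illustrative, not taken from the dataset or from SynapseML.

```python
# Assumed shape of one row's "translation" value, modeled on the
# Translator v3 REST response (one entry per input text).
translation = [
    {
        "detectedLanguage": {"language": "es", "score": 1.0},
        "translations": [
            {"text": "The hotel overall is OK.", "to": "en"},
        ],
    }
]

def first_translation_text(result):
    # Like flatten(col("translation.translations")): concatenate the
    # per-input "translations" lists into a single list.
    flattened = [t for entry in result for t in entry["translations"]]
    # Like col("translation_result.text")[0]: take the first text.
    # This sketch returns None when there is no translation at all.
    texts = [t["text"] for t in flattened]
    return texts[0] if texts else None

print(first_translation_text(translation))  # The hotel overall is OK.
```

Because only one target language (`en`) is requested, the flattened list holds a single translation per review, so taking the first element loses nothing.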
For example, here is a review in Spanish:
El hotel en general esta OK. Un poco de apatia en el personal de recepción, y otras ramas de personal.
And here is the translation:
The hotel overall is OK. A bit of apathy in the front desk staff, and other branches of staff.
Key Phrase Extraction using Azure Text Analytics
Azure AI Language is a cloud-based service that enables you to understand and analyze text with Natural Language Processing (NLP) features. By using the prebuilt AI language service in Fabric, you can analyze sentiment, extract key phrases, and redact sensitive information from the input text. For a detailed list of language functions that Fabric supports, please refer to the AI services documentation.
Now, I will use Azure Text Analytics to extract key phrases from the hotel reviews. Again, you don't have to provide any subscription key or authentication credentials; it's all seamless!
model = (AnalyzeText()
    .setTextCol("translation")
    .setKind("KeyPhraseExtraction")
    .setOutputCol("response"))

df_en_key = model.transform(df_en)\
    .withColumn("documents", col("response.documents"))\
    .withColumn("keyPhrases", col("documents.keyPhrases"))\
    .cache()
df_en_key = df_en_key.select(df_en_key.columns[:7] + ["keyPhrases"])
display(df_en_key.tail(5))
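The chained `withColumn` calls above walk a nested struct with dotted paths. The plain-Python sketch below mimics that traversal on a hypothetical `response` value. It assumes `documents` holds one struct per row rather than a list, which is inferred from the `keyPhrases` output appearing as a flat list; `get_path` is an illustrative helper, not a SynapseML function.

```python
# Hypothetical example of one row's "response" value; only the fields on
# the path response.documents.keyPhrases are shown.
response = {
    "documents": {
        "keyPhrases": ["front desk staff", "other branches"],
    },
}

def get_path(record, dotted_path):
    # Walk a dotted field path, similar to col("documents.keyPhrases")
    # on nested structs. Returns None if a level is missing or null.
    value = record
    for field in dotted_path.split("."):
        if not isinstance(value, dict) or field not in value:
            return None
        value = value[field]
    return value

print(get_path(response, "documents.keyPhrases"))
# ['front desk staff', 'other branches']
```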
For example, here are some key phrases from the translated review:
[front desk staff, other branches, good service, DELICIOUS food, etc.]
Classification using Azure OpenAI
Azure OpenAI Service provides REST API access to OpenAI's powerful language models, including the GPT-4, GPT-3.5-Turbo, and Embeddings model series. These models can be adapted to your specific needs, such as content generation, summarization, and natural language to code translation.
In Fabric, you can access the prebuilt Azure OpenAI service through the REST API, the Python SDK, or SynapseML. To learn more about the Azure OpenAI models that Fabric supports, please refer to the AI services documentation.
Here, I will use Azure OpenAI to classify the reviews into four predefined categories: [Service, Location, Facilities, and Sanitation].
# Build a classification prompt for each translated review
process_column = udf(
    lambda x: (
        "Classify the following hotel review into 1 of the following categories: "
        f"[Service, Location, Facilities, Sanitation]. Hotel review: {x}, Classified category: "
    ),
    StringType(),
)
df_en_key_prompt = df_en_key.withColumn("prompt", process_column(df_en_key["translation"])).cache()
display(df_en_key_prompt.tail(5))
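The prompt UDF is ordinary string formatting, so the template can be tried outside Spark before spending tokens on it. The sketch below is a plain-Python version of a hotel-review classification prompt; the `build_prompt` name, the `CATEGORIES` list, and the exact wording are illustrative rather than the notebook's definitive prompt.

```python
CATEGORIES = ["Service", "Location", "Facilities", "Sanitation"]

def build_prompt(review):
    # Embed the review in a fixed instruction that lists the allowed
    # categories and ends with a cue for the model to complete.
    return (
        "Classify the following hotel review into 1 of the following categories: "
        f"[{', '.join(CATEGORIES)}]. Hotel review: {review}, Classified category: "
    )

print(build_prompt("The staff at the front desk were friendly."))
```

Inside Spark, the same column could also be built with the built-in `concat` and `lit` functions from `pyspark.sql.functions` instead of a Python UDF, which keeps the work in the JVM rather than sending each row to a Python worker.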
deployment_name = "text-davinci-003"
completion = (
    OpenAICompletion()
    .setDeploymentName(deployment_name)
    .setMaxTokens(200)
    .setPromptCol("prompt")
    .setErrorCol("error")
    .setOutputCol("classification")
)

completed_df = completion.transform(df_en_key_prompt)\
    .withColumn("class", trim(lower(col("classification.choices.text")[0])))\
    .cache()
df_final = completed_df.select(completed_df.columns[:8] + ["class"])
display(df_final.tail(5))
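The `class` expression takes the first choice's text, lowercases it, and trims it. One detail matters for the data cleaning later on: Spark SQL's `trim` removes only space characters, not newlines or punctuation. The sketch below reproduces that expression in plain Python on hypothetical completion results shaped like the `classification` column (a `choices` list whose first entry has a `text` field); the sample texts are invented to show the effect.

```python
# Hypothetical completion results shaped like the "classification" column.
completions = [
    {"choices": [{"text": " Service"}]},
    {"choices": [{"text": "\nSanitation"}]},
    {"choices": [{"text": "Facilities."}]},
]

def spark_like_trim(s):
    # Spark SQL's trim() strips only the space character, so newlines and
    # punctuation survive; str.strip(" ") reproduces that behavior.
    return s.strip(" ")

def extract_class(completion):
    # Plain-Python counterpart of
    # trim(lower(col("classification.choices.text")[0]))
    return spark_like_trim(completion["choices"][0]["text"].lower())

print([extract_class(c) for c in completions])
# ['service', '\nsanitation', 'facilities.']
```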
For example, for the translated Spanish review, OpenAI buckets it as “Service”.
Build a Power BI Report
Finally, you can write the results to a Lakehouse table and use Power BI Direct Lake mode to build a visual report that shows the ratings and comments of the hotels by category.
Write the data into a delta table on your Lakehouse:
delta_table_path = "Tables/hotel_review" #fill in your delta table path
df_final.write.format("delta").mode("overwrite").option('overwriteSchema', 'true').save(delta_table_path)
I then use this table to create a semantic model.
- On the left, select OneLake data hub.
- Select the Lakehouse you attached to your notebook.
- On the top right, select Open.
- On the top, select New semantic model.
- Select hotel_review for your new semantic model, then select Confirm.
- Your semantic model is created. At the top, select New report.
- Select or drag fields from the data and visualizations panes onto the report canvas to build your report.
To create the report shown at the beginning of this section, use the following visualizations and data:
- Map chart with:
  - Latitude: latitude.
  - Longitude: longitude.
- Slicer with:
  - Field: reviews_rating.
- Slicer with:
  - Field: class.
- Table with:
  - Columns: city, name, translation.
Data Cleaning
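The class slicer in the Power BI report may display unexpected review categories, such as variants with trailing punctuation, leading newline characters, or labels outside the four predefined ones. These come from the free-form text that a large language model returns; labels that fall outside the requested set are a form of what is known as “hallucination”.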
We can add the following cell to the notebook to clean up the data:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def map_class_values(mapping):
    # Return a UDF that replaces known label variants with their canonical
    # form and leaves any other value unchanged. (Named so that it does not
    # shadow the `translate` transformer defined earlier.)
    def map_value(value):
        return mapping.get(value) or value
    return udf(map_value, StringType())

mapping = {
    'customer service': 'service',
    'service.': 'service',
    'facilities.': 'facilities',
    '\nsanitation': 'sanitation',
    '\nlocation': 'location',
}
df_final = df_final.withColumn("class", map_class_values(mapping)("class"))
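A fixed mapping only fixes variants someone has already spotted. A more general approach, sketched below in plain Python, is to strip all surrounding whitespace and punctuation, map known synonyms, and send anything still outside the four predefined categories to a catch-all label. The `normalize_label` name, the synonym table, and the `"other"` bucket are assumptions for illustration; the function could be wrapped in a `udf` in the same way as the mapping function in the cell above.

```python
import string

ALLOWED = {"service", "location", "facilities", "sanitation"}
# Hypothetical synonym table; extend it as new variants show up.
SYNONYMS = {"customer service": "service", "cleanliness": "sanitation"}

def normalize_label(raw):
    # Missing labels go straight to the catch-all bucket.
    if raw is None:
        return "other"
    # Lowercase, then strip whitespace (including newlines) and punctuation.
    label = raw.lower().strip(string.whitespace + string.punctuation)
    label = SYNONYMS.get(label, label)
    # Anything outside the allowed set goes to "other".
    return label if label in ALLOWED else "other"

print([normalize_label(x) for x in ["\nSanitation", "service.", "Customer Service", "Price"]])
# ['sanitation', 'service', 'service', 'other']
```

Routing unknown labels to one bucket keeps the slicer limited to five values, and the `other` reviews are easy to inspect when deciding which synonyms to add.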
And then rewrite it to the lakehouse:
df_final.write.format("delta").mode("overwrite").option('overwriteSchema', 'true').save(delta_table_path)
Return to the Power BI report and select the refresh button. Thanks to the Direct Lake connection, the report picks up the updated table without the need to reimport any data.
By filtering hotels based on their ratings and categories, you can quickly find the ideal accommodation for your trip. Simply select the desired filters, such as location and rating, to see a list of hotels that meet your criteria. You can also read comments and reviews from previous guests to help inform your decision.
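Combining the rating and category filters is just a conjunction of two conditions. The plain-Python sketch below shows that logic on a few hypothetical rows; the hotel names, cities, and the `filter_hotels` helper are made up for illustration. It also assumes `reviews_rating` is numeric: the notebook reads the CSV without schema inference, so in the actual table that column is read as a string and would need a cast before a range filter behaves as expected.

```python
# Hypothetical rows using column names from the report (name, city,
# reviews_rating, class); ratings are assumed to be numeric here.
rows = [
    {"name": "Lodge A", "city": "City 1", "reviews_rating": 5, "class": "service"},
    {"name": "Lodge B", "city": "City 1", "reviews_rating": 3, "class": "sanitation"},
    {"name": "Inn C", "city": "City 2", "reviews_rating": 4, "class": "location"},
]

def filter_hotels(rows, min_rating, categories):
    # Keep rows at or above the minimum rating whose class is one of the
    # selected categories.
    return [
        r for r in rows
        if r["reviews_rating"] >= min_rating and r["class"] in categories
    ]

print([r["name"] for r in filter_hotels(rows, 4, {"service", "location"})])
# ['Lodge A', 'Inn C']
```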
With Fabric’s prebuilt AI Services doing the heavy lifting, you can save time and avoid the hassle of spending hours reading reviews.
To run the example yourself, simply download the notebook and execute it from your Fabric workspace.
Conclusion
Accessing Azure AI services has never been easier with prebuilt AI services available right out of the box and integrated billing against Fabric capacity. If you’re interested in learning more about how to access AI services in Fabric, check out the AI services documentation for more information.
We are constantly adding more AI services to the prebuilt AI services library, and we value your input. If you have ideas or suggestions for new AI services, we encourage you to submit them or upvote existing ones on Fabric Ideas. We look forward to hearing from you!