Microsoft Fabric Updates Blog

Evaluate your Fabric Data Agents programmatically with the Python SDK (Preview)

We’re excited to announce that native support for evaluating Data Agents through the Fabric SDK is now available in Preview. You can now run structured evaluations of your agent’s responses using Python — directly from notebooks or your own automation pipelines.

Whether you’re validating accuracy before deploying to production, tuning prompts for better performance, or benchmarking improvements over time, the new APIs in the Fabric Data Agents SDK will help you test and iterate with confidence.

This blog post will walk you through how to:

  • Create a Fabric Data Agent using the SDK.
  • Connect your Lakehouse data source and select tables.
  • Define a ground truth dataset.
  • Run an evaluation and capture results.
  • Customize how answers are judged using your own prompt.
  • View detailed metrics and logs for debugging.

Prerequisites

Before running evaluations with the SDK, ensure you have a Lakehouse set up and populated with sample data. We recommend using the AdventureWorks dataset for testing purposes. You can follow the official Microsoft Fabric guide to create a Lakehouse and load the required tables. This step is essential to ensure your Data Agent has access to a structured schema for answering questions.

For setup instructions, refer to: Create a Lakehouse with AdventureWorksLH

Step 1: Install the Fabric SDK

To evaluate agents programmatically, first install the fabric-data-agent-sdk in your notebook:

%pip install -U fabric-data-agent-sdk

Step 2: Create or connect to a data agent

Use the SDK to create a new agent or connect to an existing one:

from fabric.dataagent.client import create_data_agent

data_agent_name = "ProductSalesDataAgent"
data_agent = create_data_agent(data_agent_name)

Step 3: Add your Lakehouse and select tables

Once your agent is created, add the Lakehouse as a data source and select the relevant tables:

# Add Lakehouse (optional, if not already added)
data_agent.add_datasource("AdventureWorksLH", type="lakehouse")

datasource = data_agent.get_datasources()[0]

# Select tables from dbo schema
tables = [
    "dimcustomer", "dimdate", "dimgeography", "dimproduct",
    "dimproductcategory", "dimpromotion", "dimreseller",
    "dimsalesterritory", "factinternetsales", "factresellersales"
]
for table in tables:
    datasource.select("dbo", table)

# Publish the data agent
data_agent.publish()

Step 4: Define evaluation questions and expected answers

Create a set of questions with the correct (expected) answers to evaluate how well your agent performs:

import pandas as pd

df = pd.DataFrame(columns=["question", "expected_answer"], data=[
    ["What were our total sales in 2014?", "45,694.7"],
    ["What is the most sold product?", "Mountain-200 Black, 42"],
    ["What are the most expensive items that have never been sold?", "Road-450 Red, 60"]
])
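Because `evaluate_data_agent()` expects the `question` and `expected_answer` columns, it can be worth sanity-checking the ground-truth set before spending an evaluation run on it. The helper below is an illustrative sketch, not part of the SDK:

```python
import pandas as pd

def validate_ground_truth(df: pd.DataFrame) -> None:
    """Raise if the ground-truth set is malformed (illustrative helper)."""
    missing = {"question", "expected_answer"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if df["question"].str.strip().eq("").any():
        raise ValueError("found an empty question")
    if df["question"].duplicated().any():
        raise ValueError("found duplicate questions")

# Run against the DataFrame defined above:
# validate_ground_truth(df)
```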

Step 5: Run the evaluation

Call evaluate_data_agent() to run the test set against the agent. You can also specify where to store the output tables and what stage of the agent to test (e.g., sandbox):

from fabric.dataagent.evaluation import evaluate_data_agent

table_name = "demo_evaluation_output"
evaluation_id = evaluate_data_agent(
    df,
    data_agent_name,
    table_name=table_name,
    data_agent_stage="sandbox"
)

print(f"Evaluation ID: {evaluation_id}")

Step 6 (Optional): Use a custom critic prompt

Want more control over how correctness is judged? Pass in your own critic_prompt with {query}, {expected_answer}, and {actual_answer} placeholders:

critic_prompt = """
Given the following query, expected answer, and actual answer, please determine if the actual answer is equivalent to the expected answer. If they are equivalent, respond with 'yes'.

Query: {query}

Expected Answer:
{expected_answer}

Actual Answer:
{actual_answer}

Is the actual answer equivalent to the expected answer?
"""

evaluation_id = evaluate_data_agent(
    df,
    data_agent_name,
    critic_prompt=critic_prompt,
    table_name=table_name,
    data_agent_stage="sandbox"
)

Step 7: View summary and detailed results

Use the built-in SDK functions to retrieve the results of your evaluation:

View summary

from fabric.dataagent.evaluation import get_evaluation_summary

eval_summary_df = get_evaluation_summary(table_name)

eval_summary_df
DataFrame of the evaluation summary, showing the number of true, false, and unclear responses.
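If you want a single accuracy number, you can derive it from those counts. The column names used below (`true`, `false`, `unclear`) are assumptions based on the summary output; check `eval_summary_df.columns` for the actual schema:

```python
def accuracy(counts: dict) -> float:
    """Share of responses the critic judged correct.

    The "true"/"false"/"unclear" keys are assumed names; adjust them to
    match the actual columns of the get_evaluation_summary() output.
    """
    total = counts["true"] + counts["false"] + counts["unclear"]
    return counts["true"] / total if total else 0.0

# Example with the first summary row (assumed schema):
# accuracy(eval_summary_df.iloc[0].to_dict())
```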

View detailed logs

from fabric.dataagent.evaluation import get_evaluation_details

eval_details_df = get_evaluation_details(
    evaluation_id,
    table_name,
    get_all_rows=True,
    verbose=True
)
Dataframe output showing the details for a given evaluation ID. This shows the question, expected answer, judgement, and actual answer.
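When debugging, the rows that matter are usually the ones the critic did not judge equivalent. A small filter like the one below works once you know which column of the details output holds the critic's verdict; the column name passed in is an assumption, so confirm it against `eval_details_df.columns`:

```python
import pandas as pd

def failed_rows(details: pd.DataFrame, judgement_col: str) -> pd.DataFrame:
    """Return rows the critic did not mark 'yes' (i.e. not equivalent)."""
    verdict = details[judgement_col].astype(str).str.strip().str.lower()
    return details[verdict != "yes"]

# Example (the judgement column name here is an assumption):
# failed_rows(eval_details_df, "evaluation_judgement")
```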

Try it out today

You can start evaluating your Data Agents in just a few steps: install the SDK, connect your Lakehouse, define a ground-truth dataset, and run your first evaluation. The Fabric Data Agent SDK provides everything you need to build, test, and iterate on your Data Agents.
