Microsoft Fabric Updates Blog

Create Embeddings in Fabric Eventhouse with built-in Small Language Models (SLMs)

What if generating embeddings in Eventhouse didn’t require an external endpoint, callout policies, throttling management, or per‑request costs?

That’s exactly what slm_embeddings_fl() delivers: a new user-defined function (UDF) that generates text embeddings using local Small Language Models (SLMs) from within the Kusto Python sandbox, returning vectors that you can immediately use for semantic search, similarity analysis, and broader NLP workflows across Fabric Eventhouse and Azure Data Explorer. This function currently supports jina-v2-small and e5-small-v2 models.

And the timing couldn’t be better. Vector similarity search in Eventhouse has seen rapid adoption, making “Eventhouse as a vector store” a genuinely practical architecture for RAG, semantic exploration, and agent memory patterns.

Embeddings without callouts: a workflow unlock

Until now, generating embedding vectors in a KQL query typically meant calling an external Azure OpenAI endpoint via the ai_embeddings() plugin. That’s powerful, but it introduces operational overhead and cost:

  • You must provision an Azure OpenAI resource and deploy an embedding model.
  • You’ll likely hit throttling at scale and need to handle batching, retries and timeouts.
  • Using AOAI models is not free.

slm_embeddings_fl() flips that script. The model runs locally via the Eventhouse python() plugin, so embedding becomes a natural part of your KQL transformation pipeline, which is particularly appealing for privacy-sensitive workflows, rapid prototyping, and high-volume embedding generation.

What is slm_embeddings_fl()?

slm_embeddings_fl() is a tabular UDF you invoke on any table-like expression. You tell it which column contains text, which column should receive the embedding vectors, and (optionally) batch/model configuration:

T | invoke slm_embeddings_fl(text_col, embeddings_col [, batch_size ] [, model_name ] [, prefix ])
  • batch_size defaults to 32
  • model_name defaults to ‘jina-v2-small’
  • prefix defaults to ‘query:’ (relevant only for the e5 model)
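As a minimal sketch of an invocation relying on the defaults (the table and column names here are illustrative, and the output column is pre-created as dynamic(null), mirroring the examples later in this post):

```kusto
// Embed one row of text with the default model (jina-v2-small) and batch size (32)
datatable(text:string) ["Eventhouse stores telemetry at scale."]
| extend text_embeddings = dynamic(null)                // pre-create the output column
| invoke slm_embeddings_fl('text', 'text_embeddings')   // defaults: batch_size=32, model_name='jina-v2-small'
```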

The python() plugin must be enabled on the Eventhouse; the UDF uses inline Python executed per node, so it scales naturally with your cluster for larger embedding jobs.

The two embedding models

  • e5-small-v2: retrieval-optimized “query/passage” embeddings

E5 was trained with a simple but important convention: for retrieval-style tasks, prefix inputs with “query:” for the search term and “passage:” for the text corpus (otherwise quality can degrade). This is why slm_embeddings_fl() exposes a prefix parameter.
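A sketch of how the two prefixes map onto the two sides of a retrieval task (hypothetical column names, assuming the UDF is installed):

```kusto
// Corpus side: embed documents with the 'passage:' prefix
datatable(text:string) ["Kusto is a fast analytics engine."]
| extend emb = dynamic(null)
| invoke slm_embeddings_fl('text', 'emb', model_name='e5-small-v2', prefix='passage:');
// Search side: embed the user's search term with the 'query:' prefix (the default)
print text="fast analytics database"
| extend emb = dynamic(null)
| invoke slm_embeddings_fl('text', 'emb', model_name='e5-small-v2', prefix='query:')
```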

  • jina-v2-small: long-context embeddings

Jina supports long inputs (up to 8192 tokens), making it very compelling for long documents where chunking overhead is painful.

End-to-end: semantic search in pure KQL

Prerequisite: install the slm_embeddings_fl() UDF in your KQL database as explained in its documentation.

Embed your documents:

.set stored_query_result slm_e5_test_tbl <|
datatable(text:string) [
    "Machine learning models can process natural language efficiently.",
    "Python is a versatile programming language for data science.",
    "Azure Data Explorer provides fast analytics on large datasets.",
    "Embeddings convert text into numerical vector representations.",
    "Neural networks learn patterns from training data."
]
| extend text_embeddings=dynamic(null)
| invoke slm_embeddings_fl('text', 'text_embeddings', model_name='e5-small-v2', prefix='passage:') // prefix is optional, default is 'query:'

Embed a query, compute cosine similarity using KQL native series_cosine_similarity() function and retrieve the top matches:

let item = "Embeddings vectors are used for semantic search.";
let embedding = toscalar(print query=item | extend embedding=dynamic(null)
| invoke slm_embeddings_fl(text_col='query', embeddings_col='embedding', model_name='e5-small-v2', prefix='query:')
| project embedding);
stored_query_result('slm_e5_test_tbl')
| extend item, embedding
| extend similarity=series_cosine_similarity(embedding, text_embeddings, 1.0, 1.0)
| project item, text, similarity
| top 2 by similarity
item | text | similarity
Embeddings vectors are used for semantic search. | Embeddings convert text into numerical vector representations. | 0.85286472533815
Embeddings vectors are used for semantic search. | Machine learning models can process natural language efficiently. | 0.768244175222851

Scenarios unlocked for Fabric Eventhouse

  1. Instant semantic search over logs, tickets, traces, and text columns — Because embedding is now “just another KQL transform,” you can add semantic retrieval capabilities to nearly any dataset: error messages, system logs, incident descriptions, support tickets, app feedback, etc. Pair it with series_cosine_similarity() and you have a compact semantic search implementation inside your Eventhouse.
  2. Low-friction RAG retrieval store and agent memory — Eventhouse’s vector similarity performance makes it realistic to use it as a retrieval store at scale in RAG pipelines, especially when you follow recommended practices (Vector16 encoding and shard distribution tuning; see Optimizing Vector Similarity Search on Azure Data Explorer).
  3. High-volume embedding generation without endpoint throttling — The ai_embeddings plugin documentation calls out throttling risks and recommends controlling request sizes, timeouts, and retries. Local SLM embeddings shift the constraint from remote rate limits to your cluster resources, which are easier to plan for at high volumes.
  4. Long-document semantics with fewer chunks (Jina advantage) — If you’ve ever chunked large documents into dozens of 512-token blocks just to embed them, you know it can lead to more vectors, more storage, more compute, and slower retrieval. Jina v2 Small’s long-context capability (8192 tokens) can reduce chunk proliferation. That can translate directly into a smaller vector table and faster similarity search.
  5. Real-Time Vector Ingestion with Update Policy — The “killer app” for local embedding is combining it with Eventhouse’s Update Policy. You can configure a policy to automatically calculate embeddings as data is ingested. Thus, your data is automatically indexed and ready for semantic search as soon as it’s ingested.
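The update-policy scenario above can be sketched as follows. The RawDocs/EmbeddedDocs table names and the EmbedOnIngest function are hypothetical, and this assumes the UDF and the python() plugin are available to update-policy queries:

```kusto
// Function that computes embeddings for newly ingested rows
.create-or-alter function EmbedOnIngest() {
    RawDocs
    | extend text_embeddings = dynamic(null)
    | invoke slm_embeddings_fl('text', 'text_embeddings', model_name='e5-small-v2', prefix='passage:')
}

// Update policy: on each ingestion into RawDocs, run EmbedOnIngest()
// and land the enriched rows in EmbeddedDocs
.alter table EmbeddedDocs policy update
@'[{"IsEnabled": true, "Source": "RawDocs", "Query": "EmbedOnIngest()", "IsTransactional": false}]'
```

With this in place, every newly ingested row arrives in EmbeddedDocs with its vector already computed, ready for series_cosine_similarity() queries.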

slm_embeddings_fl() vs. ai_embeddings

Choose the ai_embeddings plugin when you want:

  • Azure OpenAI managed embeddings with top-quality LLM embedding models
  • Centralized model deployment and governance

But expect:

  • dependency on external connectivity and callout policies
  • identity configuration (impersonation or managed identity)
  • throttling management (batching/retries/timeouts)

Choose slm_embeddings_fl() when you want:

  • no callouts (simplicity, privacy, compliance)
  • minimal and predictable cost (no cost per embedding)
  • high throughput embedding jobs without AOAI rate constraints

The two options are complementary tools that let you choose the best operational and quality point per your scenario.

Summary

The addition of slm_embeddings_fl() makes semantic intelligence in Fabric Eventhouse dramatically simpler and more scalable. Whether you’re building RAG pipelines, powering semantic search, or enriching operational data with vector intelligence, local SLM‑based embeddings let you move faster with fewer dependencies and lower cost. Combined with native vector search and Eventhouse’s ingestion pipeline, this marks a significant step toward making AI‑powered analytics first‑class in the Eventhouse ecosystem.
