Create Embeddings in Fabric Eventhouse with built-in Small Language Models (SLMs)
What if generating embeddings in Eventhouse didn’t require an external endpoint, callout policies, throttling management, or per‑request costs?
That’s exactly what slm_embeddings_fl() delivers: a new user-defined function (UDF) that generates text embeddings using local Small Language Models (SLMs) from within the Kusto Python sandbox, returning vectors that you can immediately use for semantic search, similarity analysis, and broader NLP workflows across Fabric Eventhouse and Azure Data Explorer. This function currently supports jina-v2-small and e5-small-v2 models.
And the timing couldn’t be better. Vector similarity search in Eventhouse has been getting rapid adoption, making “Eventhouse as a vector store” a genuinely practical architecture for RAG, semantic exploration, and agent memory patterns.
Embeddings without callouts is a workflow unlock
Until now, generating embedding vectors in a KQL query typically meant calling an external Azure OpenAI endpoint via the ai_embeddings() plugin. That’s powerful, but it introduces operational overhead and cost:
- You must provision an Azure OpenAI resource and deploy an embedding model.
- You’ll likely hit throttling at scale and need to handle batching, retries and timeouts.
- Azure OpenAI (AOAI) models incur a per-request cost.
slm_embeddings_fl() flips that script. The model runs locally inside the Eventhouse python() plugin, so embedding becomes a natural part of your KQL transformation pipeline, which is particularly appealing for privacy-sensitive workflows, rapid prototyping, and high-volume embedding generation.
What is slm_embeddings_fl()?
slm_embeddings_fl() is a tabular UDF you invoke on any table-like expression. You tell it which column contains text, which column should receive the embedding vectors, and (optionally) batch/model configuration:
T | invoke slm_embeddings_fl(text_col, embeddings_col [, batch_size ] [, model_name ] [, prefix ])
- batch_size defaults to 32
- model_name defaults to ‘jina-v2-small’
- prefix defaults to ‘query:’ (relevant only for the e5 model)
The python() plugin must be enabled on the Eventhouse; the UDF uses inline Python executed per node, so it scales naturally with your cluster for larger embedding jobs.
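For instance, with all defaults, a call can be as simple as the sketch below (the table name Articles and its text column are illustrative, not part of the function's documentation):

```kql
// Assumes the slm_embeddings_fl() UDF is installed and the python() plugin is enabled.
Articles
| extend text_embeddings = dynamic(null)               // placeholder column that receives the vectors
| invoke slm_embeddings_fl('text', 'text_embeddings')  // defaults: jina-v2-small, batch_size 32
```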
The two embedding models
- e5-small-v2: retrieval-optimized “query/passage” embeddings
E5 was trained with a simple but important convention: for retrieval-style tasks, prefix inputs with “query:” for the search term and “passage:” for the text corpus (otherwise quality can degrade). This is why slm_embeddings_fl() exposes a prefix parameter.
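In practice that means embedding the two sides of a retrieval task with different prefixes. A sketch, with hypothetical table and column names:

```kql
// Corpus side: embed documents with the 'passage:' prefix.
Docs
| extend vec = dynamic(null)
| invoke slm_embeddings_fl('text', 'vec', model_name='e5-small-v2', prefix='passage:')

// Query side: embed the search term with the 'query:' prefix (the default).
print q = "how do embeddings work?"
| extend vec = dynamic(null)
| invoke slm_embeddings_fl('q', 'vec', model_name='e5-small-v2', prefix='query:')
```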
- jina-v2-small: long-context embeddings
Jina supports long inputs (up to 8192 tokens), making it very compelling for long documents where chunking overhead is painful.
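When embedding long inputs, you may want to lower the batch size to keep per-batch memory in check; the value below is an illustrative choice, not a documented recommendation, and LongDocs/body are hypothetical names:

```kql
// jina-v2-small accepts inputs up to 8192 tokens, so long documents need fewer chunks.
LongDocs
| extend body_embeddings = dynamic(null)
| invoke slm_embeddings_fl('body', 'body_embeddings', batch_size=8, model_name='jina-v2-small')
```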
End-to-end: semantic search in pure KQL
Prerequisite: install slm_embeddings_fl() UDF in your KQL database as explained in its doc.
Embed your documents:
.set stored_query_result slm_e5_test_tbl <|
datatable(text:string) [
"Machine learning models can process natural language efficiently.",
"Python is a versatile programming language for data science.",
"Azure Data Explorer provides fast analytics on large datasets.",
"Embeddings convert text into numerical vector representations.",
"Neural networks learn patterns from training data."
]
| extend text_embeddings=dynamic(null)
| invoke slm_embeddings_fl('text', 'text_embeddings', model_name='e5-small-v2', prefix='passage:') // prefix is optional, default is 'query:'
Embed a query, compute cosine similarity using KQL native series_cosine_similarity() function and retrieve the top matches:
let item = "Embeddings vectors are used for semantic search.";
let embedding = toscalar(print query=item | extend embedding=dynamic(null)
| invoke slm_embeddings_fl(text_col='query', embeddings_col='embedding', model_name='e5-small-v2', prefix='query:')
| project embedding);
stored_query_result('slm_e5_test_tbl')
| extend item, embedding
| extend similarity=series_cosine_similarity(embedding, text_embeddings, 1.0, 1.0)
| project item, text, similarity
| top 2 by similarity
| item | text | similarity |
| --- | --- | --- |
| Embeddings vectors are used for semantic search. | Embeddings convert text into numerical vector representations. | 0.85286472533815 |
| Embeddings vectors are used for semantic search. | Machine learning models can process natural language efficiently. | 0.768244175222851 |
Scenarios unlocked for Fabric Eventhouse
- Instant semantic search over logs, tickets, traces, and text columns — Because embedding is now “just another KQL transform,” you can add semantic retrieval capabilities to nearly any dataset: error messages, system logs, incident descriptions, support tickets, app feedback, etc. Pair it with series_cosine_similarity() and you have a compact semantic search implementation inside your Eventhouse.
- Low-friction RAG retrieval store and agent memory — Eventhouse’s vector similarity performance makes it realistic to use it as a retrieval store at scale in RAG pipelines, especially when you follow recommended practices such as Vector16 encoding and shard-distribution tuning (see Optimizing Vector Similarity Search on Azure Data Explorer).
- High-volume embedding generation without endpoint throttling — The ai_embeddings plugin documentation calls out throttling risks and recommends controlling request sizes, timeouts, and retries. Local SLM embeddings shift the constraint from remote rate limits to your cluster resources, which is easier to plan for high volumes.
- Long-document semantics with fewer chunks (Jina advantage) — If you’ve ever chunked large documents into dozens of 512-token blocks just to embed them, you know it can lead to more vectors, more storage, more compute, and slower retrieval. Jina v2 Small’s long-context capability (8192 tokens) can reduce chunk proliferation. That can translate directly into a smaller vector table and faster similarity search.
- Real-Time Vector Ingestion with Update Policy — The “killer app” for local embedding is combining it with Eventhouse’s Update Policy. You can configure a policy to automatically calculate embeddings as data is ingested. Thus, your data is automatically indexed and ready for semantic search as soon as it’s ingested.
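A sketch of that pattern follows. The table and function names are hypothetical, and it assumes the python() plugin and the slm_embeddings_fl() UDF are available to update-policy queries in your environment:

```kql
// Landing table for raw text, and a target table that also stores the vectors.
.create table RawDocs (text:string)
.create table EmbeddedDocs (text:string, text_embeddings:dynamic)

// Function the update policy runs against each ingested batch.
.create function EmbedIngestedDocs() {
    RawDocs
    | extend text_embeddings = dynamic(null)
    | invoke slm_embeddings_fl('text', 'text_embeddings')
}

// Attach the policy: every batch ingested into RawDocs lands in EmbeddedDocs with vectors attached.
.alter table EmbeddedDocs policy update
@'[{"IsEnabled": true, "Source": "RawDocs", "Query": "EmbedIngestedDocs()", "IsTransactional": false}]'
```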
slm_embeddings_fl() vs. ai_embeddings
Choose ai_embeddings plugin when you want:
- Azure OpenAI managed embeddings with top-quality embedding models
- Centralized model deployment and governance
But expect:
- dependency on external connectivity and callout policies
- identity configuration (impersonation or managed identity)
- throttling management (batching/retries/timeouts)
Choose slm_embeddings_fl() when you want:
- no callouts (simplicity, privacy, compliance)
- minimal and predictable cost (no cost per embedding)
- high throughput embedding jobs without AOAI rate constraints
The two options are complementary tools that let you choose the best operational and quality point per your scenario.
Summary
The addition of slm_embeddings_fl() makes semantic intelligence in Fabric Eventhouse dramatically simpler and more scalable. Whether you’re building RAG pipelines, powering semantic search, or enriching operational data with vector intelligence, local SLM‑based embeddings let you move faster with fewer dependencies and lower cost. Combined with native vector search and Eventhouse’s ingestion pipeline, this marks a significant step toward making AI‑powered analytics first‑class in the Eventhouse ecosystem.