Serve real-time predictions seamlessly with ML model endpoints
Fabric offers a wide variety of data-science capabilities, from automated machine learning with FLAML to batch inferencing with the SynapseML PREDICT function. We’re pleased to announce that ML models can now serve real-time predictions from secure, scalable, and easy-to-use online endpoints. In addition to generating batch predictions in Spark, you can use endpoints to bring the predictive power of your ML models to other Fabric solutions and custom applications.
ML model endpoints expand the reach of your data-science solutions while drastically simplifying the deployment process; let’s take a closer look.
Real-time serving with a single call or click
In Fabric, endpoints are available as built-in properties of most ML models, requiring no setup to kick off fully managed deployments. Each model version gets its own dedicated endpoint, and a customizable default endpoint serves predictions from a version of your choosing, which you can change at any time. You can activate endpoints with a single call to our REST API or a single click in the Fabric interface; we’ll handle the rest.
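To make that concrete, here is a minimal Python sketch of activating an endpoint over REST. The route, payload, and placeholder IDs below are assumptions for illustration only; consult the API reference for the documented contract and authentication flow.

```python
import requests

# NOTE: the route and IDs below are hypothetical placeholders, not the
# documented Fabric API surface; check the API reference before use.
FABRIC_API = "https://api.fabric.microsoft.com/v1"
WORKSPACE_ID = "<workspace-id>"    # your Fabric workspace
MODEL_ID = "<ml-model-id>"         # the ML model item
TOKEN = "<azure-ad-access-token>"  # e.g. acquired via azure-identity

# Activate the model's default endpoint with a single call.
resp = requests.post(
    f"{FABRIC_API}/workspaces/{WORKSPACE_ID}/mlModels/{MODEL_ID}/endpoint/activate",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.status_code)  # activation accepted; Fabric handles the rest
```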

Auto-scaling enabled out of the box
Behind the scenes, Fabric manages the container infrastructure that hosts your model, dynamically adjusting the resources allocated to each endpoint based on incoming traffic. During periods without traffic, we’ll automatically scale resource usage down to zero, saving you Fabric capacity. You can customize this scaling behavior, along with other endpoint settings, programmatically with our API or directly from the Fabric interface by navigating to your model’s settings.
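As a sketch of what programmatic customization might look like, the snippet below patches hypothetical auto-scale settings. The route and the setting names (scaleToZero, minInstances, maxInstances) are illustrative placeholders rather than the documented schema.

```python
import requests

# Hypothetical sketch: route and setting names are illustrative
# placeholders, not the documented Fabric API schema.
FABRIC_API = "https://api.fabric.microsoft.com/v1"
WORKSPACE_ID = "<workspace-id>"
MODEL_ID = "<ml-model-id>"
TOKEN = "<azure-ad-access-token>"

resp = requests.patch(
    f"{FABRIC_API}/workspaces/{WORKSPACE_ID}/mlModels/{MODEL_ID}/endpoint/settings",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "scaleToZero": True,  # release resources during idle periods
        "minInstances": 0,    # floor for provisioned containers
        "maxInstances": 4,    # ceiling under heavy traffic
    },
)
resp.raise_for_status()
```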

Sample predictions for sanity testing
Before serving predictions to other Fabric experiences or custom applications, you can preview sample predictions without leaving the product. A low-code interface enables instant testing, letting you key in requests with form fields or a JSON editor and examine responses in real time.
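Once an endpoint is live, scoring it from a custom application looks roughly like the sketch below. The scoring URI and the request and response shapes are assumptions for illustration; copy the actual URI and schema from your model’s endpoint page in Fabric.

```python
import requests

# Assumptions for illustration: the scoring URI and the payload and
# response shapes are placeholders; use the values shown on your
# model's endpoint page.
SCORING_URI = "https://<your-endpoint-host>/score"  # placeholder
TOKEN = "<azure-ad-access-token>"

payload = {
    "inputs": [
        # One record per row, matching the model's scalar input schema.
        {"age": 42, "income": 61000.0, "tenure_months": 18},
    ]
}
resp = requests.post(
    SCORING_URI,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"predictions": [0.87]} in this sketch
```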

Next steps
Learn more about ML model endpoints in our Serve real-time predictions with ML model endpoints (Preview) documentation or the API reference. Before getting started, note a few prerequisites:
- Your administrator must enable the tenant switch for ML model endpoints in the Fabric admin portal before you can use the feature.
- To support real-time endpoints, your ML model must be registered with a scalar-based schema (see the sketch after this list) and must not depend on private or internal packages.
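For reference, the sketch below logs a scikit-learn model with MLflow using a signature inferred from scalar-typed columns, which is the kind of schema the second prerequisite describes. The column names, data, and registered model name are made up for illustration.

```python
import pandas as pd
import mlflow
from mlflow.models import infer_signature
from sklearn.ensemble import RandomForestClassifier

# Toy data with scalar-typed columns only (no tensors or objects);
# names and values are made up for illustration.
X = pd.DataFrame({"age": [42, 35], "income": [61000.0, 48000.0]})
y = [1, 0]

model = RandomForestClassifier(random_state=0).fit(X, y)

# A signature inferred from scalar columns yields a scalar-based schema.
signature = infer_signature(X, model.predict(X))

with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        signature=signature,
        registered_model_name="my-endpoint-model",  # placeholder name
    )
```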
We can’t wait for you to try out ML model endpoints in Fabric. Let us know what you think by submitting feedback on Fabric Ideas or joining the conversation on the Fabric Community.