Shipping genAI with confidence using evaluations in Microsoft Foundry and .NET Aspire

Room 13 • Tue 27 Oct • 13:15–14:15 • AI & Agents • Intermediate

LLM apps and agents are non-deterministic. Small changes to prompts, tools, or models can shift quality, safety, or behaviour in ways you often only notice after shipping. In this session we build a practical evaluation workflow using Microsoft Foundry Evaluations. We start in Foundry by evaluating an agent, a model, or a dataset using built-in and custom evaluators. We focus on the evaluator types you actually need in production, such as agent behaviour, quality, and safety, and show how to interpret results as a scorecard you can reuse for regression testing. We then briefly compare this to evaluation approaches teams may already use in MLflow. Next we add a developer loop: Foundry tracing is built on OpenTelemetry and sends telemetry to Azure Monitor Application Insights, giving you visibility into what really happened during an agent run. We also export OpenTelemetry data to a .NET Aspire dashboard so you can inspect traces locally while iterating and testing. You will leave with a concrete way to combine evaluation gates with tracing so you can ship agents with more confidence than manual spot checks and playground demos can give you.

About the speakers

Arne De Proft

Talk to me about GenAI apps and cloud-native applications on Azure!

Laura Verghote

Laura is a Solution Engineer for AI and Applications at Microsoft. She helps organizations turn AI use cases into real, production‑ready solutions on Azure. Before joining Microsoft, she led the AI technology strategy for Public Sector Industries in Europe at AWS. Drawing on her background in cloud architecture, technical training and AI, Laura frequently takes the stage at tech events worldwide, where she breaks down modern AI with clarity and enthusiasm.