
Evaluate & Improve AI Agents with Confidence

Hosted by Mahesh Yadav

372 students


Go deeper with a course

Master Agentic AI PM with MAANG product leader
Mahesh Yadav

What you'll learn

Assess LLM Suitability for Your Agentic AI

Benchmark your AI’s performance, adaptability, and decision-making quality.

Design a Manual Evaluation Framework for AI Agents

Implement a structured review process for agentic AI performance.

Automate AI Evaluation with Observability & LLM Judges

Use LLMs as autonomous “judges” to scale AI performance assessments.
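For a flavor of what the LLM-as-judge pattern looks like in practice, here is a minimal sketch in Python. It is not the course's material: the OpenAI Python client, the "gpt-4o-mini" model name, and the two-line rubric are illustrative assumptions standing in for whatever the session actually uses.

```python
# Minimal LLM-as-judge sketch. Assumptions (not from this page): the
# OpenAI client, model name, and rubric below are illustrative only.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

RUBRIC = """Score the agent's answer from 1 (poor) to 5 (excellent) on:
- Correctness: is the answer factually right for the task?
- Helpfulness: does it actually resolve the user's request?
Reply with a single integer, nothing else."""

def judge(task: str, answer: str) -> int:
    """Ask a separate LLM to grade an agent's answer against the rubric."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # make grading as repeatable as possible
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Task: {task}\nAgent answer: {answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip())

# Score one agent transcript; in practice you would loop over an eval set.
print(judge("Summarize our refund policy.",
            "Refunds are accepted within 30 days of purchase."))
```

In a real pipeline you would run a judge like this over a labeled evaluation set, log the scores to your observability stack, and spot-check a sample of judgments by hand, which is the combination of manual and automated evaluation this session covers.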

Why this topic matters

AI agents are only as good as their decision-making, and without proper evaluation they often fail in real-world applications. Large language models can behave unpredictably, so a structured evaluation framework is essential for ensuring reliability, adaptability, and performance. This session shows you how to build one.

You'll learn from

Mahesh Yadav

Ex-GenAI Product Lead at MAANG-Level Firms | AI PM Coach

Mahesh has 20 years of experience building products on AI teams at Meta, Microsoft, and AWS. He has worked across every layer of the AI stack, from AI chips to LLMs, and has a deep understanding of how companies use AI agents to ship value to customers. His work on AI has been featured at the Nvidia GTC conference, at Microsoft Build, and on Meta's blogs.

His mentorship has helped many students build real-world products and careers in the agentic AI PM space.

Previously at

Meta
Amazon Web Services
Microsoft

Watch the recording for free
