10 Days · Cohort-based Course
Build trust and proof for your AI product through pre-release testing, in-production monitoring, and analytics.
This course is popular
3 people enrolled last week.
Course overview
Users are bombarded with AI-powered products every day, but most can’t tell which ones actually work.
In a market flooded with AI marketing claims, trust and proof are what set great products apart.
To stand out, you need data to back up your product’s credibility, not just a clever demo.
That is where AI Evaluations and Analytics come in.
47% of organizations have already seen AI go wrong (McKinsey, 2025). Even leaders like OpenAI and Anthropic are self-insuring because traditional coverage cannot keep up.
AI systems are unpredictable. The real moat today is how well you evaluate and iteratively improve them.
👉 About This Course
You don’t need another coding course.
This is a methodology and framework course for product and data leaders shaping the next generation of AI products.
Learn a battle-tested framework to evaluate, monitor, analyze, and improve your AI products — from testing your first prototype to monitoring and iterating in production.
Make AI decisions backed by evidence, not intuition.
You’ll apply the AI Evals & Analytics Playbook — a proven system that combines human and automated tools for both scalability and reliability.
Through real-world examples, exercises, and demos, you’ll learn how to:
✅ Design evaluation workflows that align with your product across different development stages
✅ Set up pre-release evaluations and assessments to enable faster, data-driven iteration
✅ Build post-release monitoring and analytics that keep your AI systems reliable and accountable
By the end, you’ll walk away with an actionable, organization-ready evaluations and analytics plan for your own AI product.
🗓️ Course Plan
This accelerated program condenses four weeks of content into two for a fast, immersive learning experience.
Lesson 1: AI Evaluations - Why & What
- Getting buy-in on AI Evals
- Build your first AI Evals team
- Differentiate Eval, Analytics, and XAI
- Integrate AI evaluation into your product lifecycle
Exercise: Section 1 of your playbook
Lesson 2: How to Evaluate AI Products
- AI Evals & Analytics Framework Overview
- Human evaluation vs. Automated evaluation
- Use LLM-as-a-judge
- Design rubrics and metrics
Exercise: Section 2 of your playbook
Lesson 3: Experiment Design and AI Eval Tools
- Evaluator model selection
- Curate or generate test sets
- Experiment design
- Overview of popular eval tools
Exercise: Sections 2 & 3 of your playbook
Lesson 4: Monitoring & Analytics
- Handle user data
- Product monitoring practices for AI products
- Understand leading & lagging indicators
- Conduct post-launch review and analytics
Exercise: Sections 3 & 4 of your playbook
🎁 Founding Cohort Benefits (Oct 27 - Nov 9)
Join our first cohort and help shape the course. Receive personalized guidance, in-depth discussions on your evaluation challenges, and access to our professional community of product leaders, data practitioners, and AI builders.
01
Product Leaders
Building AI products and needing a clear playbook for evaluation frameworks, success metrics, and shipping with confidence.
02
Data Leaders
Redefining team scope and structure in the AI era, aligning evaluation, analytics, and accountability.
03
Seasoned Data Scientists
Leveling up with AI product skills: AI evals, LLM-as-a-judge, and AI-specific performance metric design.
Walk Away with an Execution-ready Playbook
4 sessions (2 hours each) over 2 weeks. Build your first AI Evals and Analytics playbook for your current product use case.
Build Your First AI Evals Team
Create clear ownership: who writes rubrics, validates metrics, and holds veto power. Keep Legal, Trust & Safety, and SMEs aligned without evaluation becoming a bottleneck.
Execute Comprehensive Pre-launch Testing
Run sniff tests, build quantitative evaluation pipelines, and design experiments that prove your AI beats the baseline. Know when to use human labels vs. LLM-as-a-judge.
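To give a flavor of what LLM-as-a-judge looks like in practice, here is a minimal Python sketch. The rubric, model name, and 1-5 scale are illustrative assumptions, not the course's exact implementation; it assumes the OpenAI Python SDK with an API key configured.

```python
# Minimal LLM-as-a-judge sketch (illustrative only).
# The rubric, model name, and 1-5 scale are placeholder assumptions.
from openai import OpenAI

client = OpenAI()

RUBRIC = """Rate the answer from 1 (poor) to 5 (excellent) for:
- Factual accuracy with respect to the question
- Completeness of the answer
- Clarity and concision
Reply with a single integer only."""

def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Ask a judge model to score one (question, answer) pair against the rubric."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep scoring as stable as possible
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Question:\n{question}\n\nAnswer:\n{answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip())

# Example: score a batch of outputs, then spot-check a sample against human labels.
# scores = [judge(q, a) for q, a in test_set]
```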
Design Experiments for AI products
Handle stochastic outputs and subjective quality with proper experiment design. Choose the right methodology, set sample sizes, and define guardrails that protect business metrics.
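As a concrete illustration of handling stochastic outputs, the sketch below compares a candidate against a baseline using paired scores on the same test cases, a bootstrap confidence interval on the mean difference, and a simple latency guardrail. The bootstrap approach, sample counts, and 3-second budget are assumptions for illustration, not the course's prescribed methodology.

```python
# Illustrative experiment-design sketch: candidate vs. baseline with stochastic outputs.
# Scores are per-case quality scores (e.g., from a rubric or an LLM judge), paired on
# the same test cases; thresholds and bootstrap settings are placeholder assumptions.
import random
import statistics

def bootstrap_ci(diffs, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean paired score difference."""
    means = []
    for _ in range(n_boot):
        resample = [random.choice(diffs) for _ in diffs]
        means.append(statistics.mean(resample))
    means.sort()
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot)]
    return lo, hi

def evaluate(candidate_scores, baseline_scores, candidate_p95_latency, latency_budget_s=3.0):
    """Ship only if the quality CI excludes zero AND the latency guardrail holds."""
    diffs = [c - b for c, b in zip(candidate_scores, baseline_scores)]
    lo, hi = bootstrap_ci(diffs)
    return {
        "mean_diff": statistics.mean(diffs),
        "ci": (lo, hi),
        "quality_win": lo > 0,                       # whole CI above zero: real improvement
        "guardrail_ok": candidate_p95_latency <= latency_budget_s,
    }
```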
Monitor Product and Catch Issues Early
Set up leading indicators (retry rates, confidence scores) and lagging metrics (CSAT, cost). Build escalation procedures and run structured post-launch reviews at 15, 30, and 60 days.
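As a small illustration of a leading indicator, the sketch below tracks a rolling retry rate over recent requests and flags when it crosses a threshold. The 500-request window and 15% threshold are placeholder assumptions; in production this logic would usually live in your analytics or observability stack.

```python
# Illustrative leading-indicator monitor: rolling retry rate with an alert threshold.
# Window size and threshold are assumptions chosen for the example.
from collections import deque

class RetryRateMonitor:
    def __init__(self, window_size: int = 500, threshold: float = 0.15):
        self.window = deque(maxlen=window_size)  # 1 = user retried, 0 = did not
        self.threshold = threshold

    def record(self, was_retry: bool) -> None:
        self.window.append(1 if was_retry else 0)

    def retry_rate(self) -> float:
        return sum(self.window) / len(self.window) if self.window else 0.0

    def should_alert(self) -> bool:
        # Alert only once the window is full, to avoid noisy early readings.
        return len(self.window) == self.window.maxlen and self.retry_rate() > self.threshold

# Example: feed request events in and escalate when the rate spikes.
# monitor = RetryRateMonitor()
# monitor.record(was_retry=True)
# if monitor.should_alert(): escalate()
```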
Live sessions
Learn directly from Stella Liu & Amy Chen in a real-time, interactive format.
Your First AI Evals and Analytics Playbook
Create your first AI Evals playbook and apply it to your current projects.
Glossary Sheet
Master the terminology with clear definitions and practical examples for every key concept in AI Evals.
Lifetime access
Go back to course content and recordings whenever you need to.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
4 live sessions • 5 lessons • 5 projects
Live session dates: Jan 17, Jan 18, Jan 24, Jan 25
Head of Applied Science, AI Evals
Stella Liu is an AI Evaluation practitioner and researcher, specializing in frameworks for large language models and AI-powered products.
Since 2023, she has led real-world AI evaluation projects in EdTech, where she established the first AI product evaluation framework for Higher Education and continues to advance research on the safe and responsible use of AI. Her work combines academic rigor with hands-on product experience, bringing proven evaluation methods into both enterprise and educational contexts.
Earlier in her career, Stella worked at Shopify and Carvana, where she built large-scale data-driven automation systems that powered product innovation and operational efficiency at scale.
Stella also writes an AI Evals newsletter on Substack: https://datasciencexai.substack.com/
Amy Chen is co-founder of AI Evals & Analytics and an AI partner helping companies with AI engineering, product development, and go-to-market strategy. With over 10 years of experience spanning data science, product management, ML engineering, and GTM, she brings versatile expertise to startups at every stage.
She is a Top 1% Mentor in AI/ML Engineering on ADPList, where she has mentored over 300 data scientists and analysts. She posts regularly about AI and data science on LinkedIn and has over 9.5k followers.
Understand What AI Evals Are and Why They Matter Now
Learn the Core AI Evals Framework
Get Started with Your Evals Playbook
Get the free recording
Why Product Leadership Needs AI Evals Now
Build Your First AI Evals Team and Culture
Design Your AI Evals Playbook
Get the free recording
A Preview of Our AI Evals Playbook and Framework
Product-focused. Immediately executable AI Evals playbook for your product.
Get this free resource
4-5 hours per week
Live Interactive Lectures + Workshops
Intensive sessions with frameworks, tactics, and exercises.
Join an upcoming cohort
Cohort 2
$2,250