AI Evals and Analytics Playbook

New · 10 Days · Cohort-based Course

Build trust and proof for your AI product through pre-release testing, in-production monitoring, and analytics.

This course is popular: 3 people enrolled last week.

Previously at

Shopify
Carvana
Harvard University
University of Washington
UCLA

Course overview

Evals & Analytics: The New Core Moat of Great AI Products

Users are bombarded with AI-powered products every day, but most can’t tell which ones actually work.

In a market flooded with AI marketing claims, trust and proof are what set great products apart.


To stand out, you need data to back up your product’s credibility, not just a clever demo.

That is where AI Evaluations and Analytics come in.


47% of organizations have already experienced AI going wrong (McKinsey, 2025). Even leaders like OpenAI and Anthropic are self-insuring because traditional coverage cannot keep up.

AI systems are unpredictable. The real moat today is how well you evaluate and iteratively improve them.


👉 About This Course


You don’t need another coding course.

This is a methodology and framework course for product and data leaders shaping the next generation of AI products.


Learn a battle-tested framework to evaluate, monitor, analyze, and improve your AI products — from testing your first prototype to monitoring and iterating in production.

Make AI decisions backed by evidence, not intuition.

You’ll apply the AI Evals & Analytics Playbook — a proven system that combines human and automated tools for both scalability and reliability.


Through real-world examples, exercises, and demos, you’ll learn how to:

✅ Design evaluation workflows that align with your product across different development stages

✅ Set up pre-release evaluations and assessments to enable faster, data-driven iteration

✅ Build post-release monitoring and analytics that keep your AI systems reliable and accountable


By the end, you’ll walk away with an actionable, organization-ready evaluations and analytics plan for your own AI product.


🗓️ Course Plan


This accelerated program condenses four weeks of content into two for a fast, immersive learning experience.


Lesson 1: AI Evaluations - Why & What

- Get buy-in on AI Evals

- Build your first AI Evals team

- Differentiate Evals, Analytics, and XAI (explainable AI)

- Integrate AI evaluation into your product lifecycle

Exercise: Section 1 of your playbook


Lesson 2: How to Evaluate AI Products

- AI Evals & Analytics Framework Overview

- Human evaluation vs. Automated evaluation

- Use LLM-as-a-judge

- Design rubrics and metrics

Exercise: Section 2 of your playbook


Lesson 3: Experiment Design and AI Eval Tools

- Evaluator model selection

- Curate or generate test sets

- Experiment design

- Overview of popular eval tools

Exercise: Sections 2 & 3 of your playbook


Lesson 4: Monitoring & Analytics

- Handle user data

- Product monitoring practices for AI products

- Understand leading & lagging indicators

- Conduct post-launch review and analytics

Exercise: Sections 3 & 4 of your playbook



🎁 Founding Cohort Benefits (Oct 27 - Nov 9)

Join our first cohort and help shape the course. Receive personalized guidance, in-depth discussions on your evaluation challenges, and access to our professional community of product leaders, data practitioners, and AI builders.

Who is this course for

01

Product Leaders

Building AI products and need a clear playbook for evaluation frameworks, success metrics, and shipping with confidence.

02

Data Leaders

Redefining team scope and structure in the AI era, aligning evaluation, analytics, and accountability.

03

Seasoned Data Scientists

Leveling up with AI product skills: learn AI evals, apply LLM-as-a-judge, and design AI-specific performance metrics.

What you’ll get out of this course

Walk Away with an Execution-Ready Playbook

4 sessions (2 hours each) over 2 weeks. Build your first AI Evals and Analytics playbook for your current product use case.

Build Your First AI Evals Team

Create clear ownership: who writes rubrics, validates metrics, and holds veto power. Keep Legal, Trust & Safety, and SMEs aligned without evaluation becoming a bottleneck.

Execute Comprehensive Pre-launch Testing

Run sniff tests, build quantitative evaluation pipelines, and design experiments that prove your AI beats the baseline. Know when to use human labels vs. LLM-as-a-judge.
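
To make the automated half of that concrete, here is a minimal, illustrative LLM-as-a-judge sketch in Python. It is a sketch under assumptions, not the course's prescribed pipeline: the rubric wording, the call_llm stub, and the low-score threshold are placeholders you would replace with your own.

# Illustrative LLM-as-a-judge sketch; rubric, call_llm stub, and thresholds are placeholders.
import json
import re
from statistics import mean
from typing import Callable

RUBRIC = """You are grading an AI assistant's answer.
Score it from 1 (unusable) to 5 (excellent) for factual accuracy and helpfulness.
Reply with JSON only: {"score": <int>, "reason": "<one sentence>"}"""

def judge(question: str, answer: str, call_llm: Callable[[str], str]) -> dict:
    """Ask a judge model to score one answer against the rubric."""
    prompt = f"{RUBRIC}\n\nQuestion: {question}\nAnswer: {answer}"
    raw = call_llm(prompt)
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # tolerate extra prose around the JSON
    return json.loads(match.group(0)) if match else {"score": None, "reason": "unparseable"}

def run_eval(test_set: list[dict], call_llm: Callable[[str], str]) -> float:
    """Average judge score over a test set; surface low scorers for human review."""
    scores = []
    for case in test_set:
        verdict = judge(case["question"], case["answer"], call_llm)
        if verdict["score"] is None:
            continue
        scores.append(verdict["score"])
        if verdict["score"] <= 2:
            print(f"Needs human review: {case['question']!r} -> {verdict['reason']}")
    return mean(scores) if scores else float("nan")

if __name__ == "__main__":
    # Stub judge so the sketch runs without an API key; swap in a real model call.
    fake_llm = lambda prompt: '{"score": 4, "reason": "Mostly accurate and on-topic."}'
    demo = [{"question": "What does this course cover?", "answer": "AI evals and analytics."}]
    print("Mean judge score:", run_eval(demo, fake_llm))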

Design Experiments for AI Products

Handle stochastic outputs and subjective quality with proper experiment design. Choose the right methodology, set sample sizes, and define guardrails that protect business metrics.
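
As a rough illustration of the sample-size question, here is a back-of-envelope calculation for comparing two variants on a pass rate, using a standard two-proportion z-test approximation. The 70% baseline, 5-point minimum detectable effect, and significance/power settings are illustrative assumptions, not recommendations from the course.

# Approximate per-arm sample size for an A/B test on a pass rate (illustrative numbers).
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_arm(p_baseline: float, min_effect: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n to detect min_effect over p_baseline with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_variant = p_baseline + min_effect
    p_bar = (p_baseline + p_variant) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_baseline * (1 - p_baseline)
                                 + p_variant * (1 - p_variant))) ** 2
    return ceil(numerator / min_effect ** 2)

if __name__ == "__main__":
    # Roughly how many graded prompts per arm to detect a 70% -> 75% improvement.
    print(sample_size_per_arm(p_baseline=0.70, min_effect=0.05))  # ~1,251 per arm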

Monitor Product and Catch Issues Early

Set up leading indicators (retry rates, confidence scores) and lagging metrics (CSAT, cost). Build escalation procedures and run structured post-launch reviews at 15, 30, and 60 days.
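
For a feel of what a leading indicator looks like in code, here is an illustrative retry-rate monitor computed over a rolling window. The event schema, the 500-event window, and the 15% alert threshold are placeholder assumptions to adapt to your own product, not values prescribed by the course.

# Illustrative leading-indicator monitor: retry rate over a rolling window with an alert threshold.
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    session_id: str
    retried: bool  # the user regenerated or asked the assistant to try again

class RetryRateMonitor:
    def __init__(self, window: int = 500, alert_threshold: float = 0.15):
        self.events = deque(maxlen=window)  # rolling window of recent events
        self.alert_threshold = alert_threshold

    def record(self, event: Event) -> None:
        self.events.append(event)

    def retry_rate(self) -> float:
        if not self.events:
            return 0.0
        return sum(e.retried for e in self.events) / len(self.events)

    def should_alert(self) -> bool:
        # Escalate only once the window has enough volume to be meaningful.
        return len(self.events) >= 100 and self.retry_rate() > self.alert_threshold

if __name__ == "__main__":
    monitor = RetryRateMonitor()
    for i in range(200):
        monitor.record(Event(session_id=str(i), retried=(i % 5 == 0)))  # ~20% retry rate
    print(f"Retry rate: {monitor.retry_rate():.0%}, alert: {monitor.should_alert()}")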

What’s included

Live sessions

Learn directly from Stella Liu & Amy Chen in a real-time, interactive format.

Your First AI Evals and Analytics Playbook

Create your first AI Evals playbook and apply it to your current projects.

Glossary Sheet

Master the terminology with clear definitions and practical examples for every key concept in AI Evals.

Lifetime access

Go back to course content and recordings whenever you need to.

Community of peers

Stay accountable and share insights with like-minded professionals.

Certificate of completion

Share your new skills with your employer or on LinkedIn.

Maven Guarantee

This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.

Course syllabus

4 live sessions • 5 lessons • 5 projects

Week 1

Jan 16—Jan 18

    Getting Ready for Session 1

    2 items

    Session 1 - The AI Evaluation Framework

    Sat 1/17, 7:00 PM—9:00 PM (UTC)

    Getting Ready for Session 2

    2 items

    Session 2 - How to Evaluate AI Products

    Sun 1/18, 7:00 PM—9:00 PM (UTC)

Week 2

Jan 19—Jan 25

    Getting Ready for Session 3

    2 items

    Session 3 - Experiment Design and AI Evaluation Tools

    Sat 1/24, 7:00 PM—9:00 PM (UTC)

    Getting Ready for Session 4

    2 items

    Session 4 - Product Monitoring & Analytics

    Sun 1/25, 7:00 PM—9:00 PM (UTC)

Post-course

    Submit your Playbook

    1 item

Bonus

    AI Evaluation and Analytics Glossary

    1:1 Playbook Feedback

    2-Year All-Access Membership to the DSxAI Newsletter

    1 item

Meet your instructors

Stella Liu

Head of Applied Science, AI Evals

Stella Liu is an AI Evaluation practitioner and researcher, specializing in frameworks for large language models and AI-powered products.


Since 2023, she has led real-world AI evaluation projects in EdTech, where she established the first AI product evaluation framework for Higher Education and continues to advance research on the safe and responsible use of AI. Her work combines academic rigor with hands-on product experience, bringing proven evaluation methods into both enterprise and educational contexts.


Earlier in her career, Stella worked at Shopify and Carvana, where she built large-scale data-driven automation systems that powered product innovation and operational efficiency at scale.


Stella also writes an AI Evals newsletter on Substack: https://datasciencexai.substack.com/


Follow Stella on LinkedIn

Shopify
Carvana
Harvard University
Arizona State University
Amy Chen

Amy Chen is co-founder of AI Evals & Analytics and an AI partner helping companies with AI engineering, product development, and go-to-market strategy. With over 10 years of experience spanning data science, product management, ML engineering, and GTM, she brings versatile expertise to startups at every stage.


She is a Top 1% Mentor in AI/ML Engineering on ADPList, where she has mentored over 300 data scientists and analysts. She posts regularly about AI and data science on LinkedIn, where she has over 9.5k followers.


Follow Amy on LinkedIn

System1
Uptake
Seasalt.ai
UCLA
University of Washington
Build Your AI Evals & Analytics Playbook

Understand What AI Evals Are and Why They Matter Now

They're part of your PRD. AI Evals help you ship products confidently and safely.

Learn Core AI Evals Framework

Learn how to define metrics, test model quality, and align stakeholders from Legal to Trust & Safety.

Get Started with Your Evals Playbook

Understand how to start your first evals plan and begin building with Evals best practices.

Get the free recording

Bring AI Evals to Product Leadership

Why Product Leadership Needs AI Evals Now

Most teams can ship AI fast—but few know how to evaluate it. Learn why AI Evals is now core to product strategy.

Build Your First AI Evals Team and Culture

Learn how to bring leadership, product, engineering, and operations together through actionable AI Evals practices.

Design Your AI Evals Playbook

Learn a practical framework to integrate evals into your product quality, performance, safety, and UX.

Get the free recording

Free resource

A Preview of Our AI Evals Playbook and Framework

A product-focused, immediately executable AI Evals playbook for your product.

Get this free resource

Course schedule

4-5 hours per week

  • Live Interactive Lectures + Workshops

    Intensive sessions with frameworks, tactics, and exercises.


Join an upcoming cohort

AI Evals and Analytics Playbook

Cohort 2

$2,250 USD

Dates

Jan 16—26, 2026

Payment Deadline

Jan 11, 2026

Get reimbursed