3 Weeks · Cohort-based Course
Gain the key skills for designing effective agents and optimizing their performance. Dive deep into evaluations, tools, MCP, and RL.
This course is popular
50+ people enrolled last week.
Course overview
Modern teams are under pressure to ship LLM agents that actually work in production, but it's difficult to cut through the noise and determine which approaches deliver. There's an ever-growing number of "agent frameworks" promising great results, yet their abstractions are often opaque and hard to optimize. Blog posts and one-off repos explain pieces of the puzzle, but AI is moving faster than ever.
Many engineers struggle to:
- Choose the right agent pattern for their use case
- Incorporate reliable tool use into agentic workflows
- Evaluate where and why agents fail
- Deploy agents that balance intelligence, cost, and latency
- Understand when and how to improve agent performance with finetuning and RL
We keep hearing that 2025 is the Year of Agents. Everyone's talking about MCP, A2A, and GRPO, but no one seems to agree on when you should use them. Agentic interactions are becoming table-stakes consumer features, and investors are eager to see that you're keeping up with the times.
Popular agent products like Deep Research, Devin, and Manus are built by companies who don’t want to share their tricks. Open-source alternatives often underperform or are complex to understand and adapt. Textbooks don’t exist yet, and sifting through every new paper is basically a full-time job. The latest API models can make for powerful agents, but costs get out of control quickly. Few people outside of the big AI labs have hands-on expertise in optimizing LLM agents using reinforcement learning. Will and Kyle happen to be two of them.
---
What to expect:
Beyond core principles, this course emphasizes hands-on practice for building production-ready agents, including:
- How to integrate MCP tools for popular services like Notion, Linear, and Slack into your agent applications
- How to build your own MCP servers for custom APIs and data (see the sketch after this list)
- How to scaffold and prompt agents for complex tool workflows
- How to evaluate and interactively refine agents with human-in-the-loop prompting
- How to use rule-based and LLM-based evaluations as reward signals for RL or synthetic data filtering
- How to use GRPO to train agents that outperform models like o3 at a fraction of the cost
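
To give a flavor of the MCP server work, here is a minimal sketch using the official `mcp` Python SDK's FastMCP helper; the server name and the `search_notes` tool are hypothetical placeholders for your own API or data:

```python
# Minimal MCP server sketch using the official `mcp` Python SDK.
# "notes-server" and `search_notes` are illustrative placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes-server")

@mcp.tool()
def search_notes(query: str) -> list[str]:
    """Search a (hypothetical) notes store and return matching snippets."""
    # Replace with a real lookup against your API or database.
    return [f"stub result for: {query}"]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, so MCP clients can spawn it
```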
The course will have 2x weekly lectures for 3 weeks, and we will have additional sessions for office hours (see schedule below). Lecture videos will be available to watch asynchronously, and we'll also have a Discord chat for offline discussions.
Lectures will incorporate live coding/prompting with tools like Cursor, Claude Code, and Jupyter notebooks. Familiarity with Python, high-level AI/ML concepts, and LLM APIs is assumed.
You will also receive:
- $100 in Prime Intellect compute credits
- $100 in OpenPipe finetuning credits
- 1 year of Weights & Biases Pro ($600 value)
---
Course schedule:
Lecture 1 (6/17)
Agent Patterns and Principles
- ReAct, MemGPT, Agentic RAG, Multi-Agent (A2A)
- Hands-on demos with HF smolagents + other frameworks (see the sketch below)
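
As a taste of those demos, a minimal smolagents agent can be spun up in a few lines. The model choice and prompt below are illustrative, and depending on your smolagents version the model class may be named InferenceClientModel instead of HfApiModel:

```python
# Minimal ReAct-style agent sketch with HF smolagents; assumes HF API
# credentials are configured in your environment.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # web search as the agent's only tool
    model=HfApiModel(),              # defaults to a hosted open model
)

agent.run("Summarize the latest developments in agentic RL.")
```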
Lecture 2 (6/19)
Model Context Protocol: When and Why
- Client/Server architectures for tool calls
- Approaches to auth
- Hands-on agentic MCP flow demos with Claude Desktop, Claude Code, and more (see the client sketch below)
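
For a sense of what an MCP client does under the hood, here is a minimal sketch using the official `mcp` Python SDK to spawn a local server over stdio and call one of its tools; the server script name and tool arguments are illustrative:

```python
# Minimal MCP client sketch (official `mcp` Python SDK); "notes_server.py"
# is the hypothetical server from the sketch earlier on this page.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="python", args=["notes_server.py"])

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()           # MCP handshake
            tools = await session.list_tools()   # discover available tools
            print([t.name for t in tools.tools])
            result = await session.call_tool("search_notes", {"query": "agents"})
            print(result.content)

asyncio.run(main())
```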
Lecture 3 (6/24)
Evals for Agents
- Extending eval techniques to agentic workflows
- Rule-based vs LLM-as-judge
- Filtering rollouts for synthetic data collection
- Brief demo of SFT on filtered rollouts (see the filtering sketch below)
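
To make the rollout-filtering idea concrete, here is a hedged sketch; the `Rollout` shape, `llm_judge`, and the threshold are hypothetical stand-ins for your own rollout collection and judging code, not course-provided APIs:

```python
# Hedged sketch: filter agent rollouts with a rule-based check plus an
# LLM judge, keeping only passing trajectories as SFT training data.
from dataclasses import dataclass

@dataclass
class Rollout:
    tool_calls: list
    final_answer: str | None
    transcript: str

def rule_based_pass(r: Rollout) -> bool:
    # Example rule: the agent produced an answer and used at least one tool.
    return r.final_answer is not None and len(r.tool_calls) > 0

def keep_for_sft(rollouts: list[Rollout], llm_judge, threshold: float = 0.8):
    kept = []
    for r in rollouts:
        if not rule_based_pass(r):
            continue
        if llm_judge(r.transcript) >= threshold:  # e.g. 0-1 score from a strong model
            kept.append(r)
    return kept
```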
Lecture 4 (6/26)
Reinforcement Learning for Busy Engineers
- Crash course in RL fundamentals without the math
- GRPO vs DPO vs PPO
- Demo of GRPO for training a reasoning model (via HF TRL; see the sketch below)
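
For reference, a GRPO run in HF TRL looks roughly like the sketch below, following TRL's GRPOTrainer quickstart; the toy length-based reward is illustrative, and a real run needs a task-specific reward:

```python
# GRPO sketch with HF TRL; the reward prefers ~20-character completions,
# which is a toy signal just to exercise the training loop.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # GRPO compares rewards within each group of sampled completions.
    return [-abs(20 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="qwen2-grpo", logging_steps=10),
    train_dataset=dataset,
)
trainer.train()
```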
Lecture 5 (7/1)
Formulating Business Problems as RL Tasks
- How to think about reward/rubric design for real-world tasks
- Environment = Tasks + Tools + Verifiers
- Walkthrough of problem formulation for email search (via ART; see the sketch below)
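
As a sketch of the "Environment = Tasks + Tools + Verifiers" framing for the email-search example, where all class and field names are hypothetical illustrations (not the ART API):

```python
# Hedged sketch: an RL environment as tasks + tools + a verifier, where the
# verifier doubles as the reward function.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    query: str               # e.g. "find the March invoice from Acme"
    expected_email_id: str   # ground-truth label for the verifier

@dataclass
class EmailSearchEnv:
    tasks: list[Task]
    tools: dict[str, Callable] = field(default_factory=dict)  # e.g. {"search_inbox": ...}

    def verify(self, task: Task, returned_id: str) -> float:
        # Reward 1.0 for retrieving the right email, else 0.0.
        # Real rubrics often add partial credit (right sender, right thread).
        return 1.0 if returned_id == task.expected_email_id else 0.0
```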
Lecture 6 (7/3)
Training Agents with GRPO
- Deep dive into RL experimentation for agent workflows (via ART)
- Broader ecosystem: other RL trainers + integrations with existing agent/tool frameworks (smolagents, MCP)
1. A Senior SWE turned AI Engineer at a Series D SaaS company who's eager to replace brittle pipelines with highly-optimized agents
2. A Founder + CTO of a Series A startup who wants to offer a best-in-class agentic AI experience to discerning customers
3. A Technical Director at a Fortune 500 company responsible for evaluating the best approaches and vendors for agentic AI solutions
Understand key concepts and patterns underlying modern LLM agents, and how to choose the right approach for your use case
Build portable, reliable tools for your agents and data using Model Context Protocol (MCP)
Implement your own research agents, incorporating custom format instructions and data access
Learn the fundamentals of Reinforcement Learning (RL) and how it applies to agents
Formulate your agentic tasks as RL problems, with evaluation metrics that enable learning from reward feedback
Use RL algorithms like Group-Relative Policy Optimization (GRPO) to train agents that outperform frontier models on your tasks
A holistic understanding of modern principles and techniques for designing production-ready agents and optimizing them with RL
Live sessions
Learn directly from Will Brown & Kyle Corbitt in a real-time, interactive format.
Lifetime access
Go back to course content and recordings whenever you need to.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Credits and giveaways
$100 in Prime Intellect GPU credits, $100 in OpenPipe finetuning credits, 1 year of W&B Pro, and more.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
Production-Ready Agent Engineering: From MCP to RL
9 live sessions • 6 lessons
Jun 17: Lesson 1
Jun 19: Lesson 2
Jun 20: Office hours
Jun 24: Lesson 3
Jun 26: Lesson 4
Jun 27: Office hours
Jul 1: Lesson 5
Jul 2: Office hours
Jul 3: Lesson 6
Will is a Research Lead at Prime Intellect, working on advancing the frontier of open-source agentic RL. He was previously a Machine Learning Researcher at Morgan Stanley and an Applied Scientist at AWS, and completed a PhD in Computer Science at Columbia University focused on multi-agent learning.
Kyle is the CTO of OpenPipe, the RL post-training company. Through OpenPipe, he has helped dozens of companies of all sizes train custom models optimized for their tasks. He has previous ML experience at Y Combinator and Google.
Join an upcoming cohort
Cohort 1: $1,000
Dates: June 17 - July 3
Time commitment: 4-6 hours per week
Live sessions: Tuesdays & Thursdays, 5:00pm - 6:30pm ET
2x weekly lectures and at least 1x weekly office hours with instructors
Weekly projects: 2 hours per week
Take-home exercises for more hands-on exposure to the week's topics