3 Weeks · Cohort-based Course
Gain the key skills for designing effective agents and optimizing their performance. Dive deep into evaluations, tools, MCP, and RL.
This course is popular
50+ people enrolled last week.
Course overview
Modern teams are under pressure to ship LLM agents that actually work in production, but it's difficult to cut through the noise and determine which approaches deliver. There's an ever-growing number of "agent frameworks" promising great results, yet their abstractions are often opaque and hard to optimize. Blog posts and one-off repos explain pieces of the puzzle, but AI is moving faster than ever.
Many engineers struggle to:
- Choose the right agent pattern for their use case
- Incorporate reliable tool use into agentic workflows
- Evaluate where and why agents fail
- Deploy agents that balance intelligence, cost, and latency
- Understand when and how to improve agent performance with finetuning and RL
We keep hearing that 2025 is the Year of Agents. Everyone's talking about MCP, A2A, and GRPO, but no one seems to agree on when you should use them. Agentic interactions are becoming table-stakes consumer features, and investors are eager to see that you're keeping up with the times.
Popular agent products like Deep Research, Devin, and Manus are built by companies who don’t want to share their tricks. Open-source alternatives often underperform or are complex to understand and adapt. Textbooks don’t exist yet, and sifting through every new paper is basically a full-time job. The latest API models can make for powerful agents, but costs get out of control quickly. Few people outside of the big AI labs have hands-on expertise in optimizing LLM agents using reinforcement learning. Will and Kyle happen to be two of them.
---
What to expect:
Beyond core principles, this course emphasizes hands-on practice for building production-ready agents, including:
- How to integrate MCP tools for popular services like Notion, Linear, and Slack into your agent applications
- How to build your own MCP servers for custom APIs and data (see the sketch after this list)
- How to scaffold and prompt agents for complex tool workflows
- How to evaluate and interactively refine agents with human-in-the-loop prompting
- How to use rule-based and LLM-based evaluations as reward signals for RL or synthetic data filtering
- How to use GRPO to train agents that outperform models like o3 at a fraction of the cost
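
To give a flavor of the MCP server work, here is a minimal sketch using the official `mcp` Python SDK's FastMCP helper; the server name and the `search_notes` tool are hypothetical placeholders for your own API or data:

```python
# Minimal MCP server sketch using the official `mcp` Python SDK.
# "notes-server" and `search_notes` are illustrative placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes-server")

@mcp.tool()
def search_notes(query: str) -> list[str]:
    """Search a (hypothetical) notes store and return matching snippets."""
    # Replace with a real lookup against your API or database.
    return [f"stub result for: {query}"]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, so MCP clients can spawn it
```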
The course will have 2x weekly lectures for 3 weeks, and we will have additional sessions for office hours (see schedule below). Lecture videos will be available to watch asynchronously, and we'll also have a Discord chat for offline discussions.
Lectures will incorporate live coding/prompting with tools like Cursor, Claude Code, and Jupyter notebooks. Familiarity with Python, high-level AI/ML concepts, and LLM APIs is assumed.
You will also receive:
- $100 in Prime Intellect compute credits
- $100 in OpenPipe finetuning credits
- 1 year of Weights & Biases Pro ($600 value)
---
Course schedule:
Lecture 1 (6/17)
Agent Patterns and Principles
- ReAct, MemGPT, Agentic RAG, Multi-Agent (A2A)
- Hands-on demos with HF smolagents + other frameworks (see the sketch below)
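
As a taste of those demos, a minimal smolagents agent can be spun up in a few lines. The model choice and prompt below are illustrative, and depending on your smolagents version the model class may be named InferenceClientModel instead of HfApiModel:

```python
# Minimal ReAct-style agent sketch with HF smolagents; assumes HF API
# credentials are configured in your environment.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # web search as the agent's only tool
    model=HfApiModel(),              # defaults to a hosted open model
)

agent.run("Summarize the latest developments in agentic RL.")
```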
Lecture 2 (6/19)
Model Context Protocol: When and Why
- Client/Server architectures for tool calls
- Approaches to auth
- Hands-on agentic MCP flow demos with Claude Desktop, Claude Code, and more (see the client sketch below)
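
For a sense of what an MCP client does under the hood, here is a minimal sketch using the official `mcp` Python SDK to spawn a local server over stdio and call one of its tools; the server script name and tool arguments are illustrative:

```python
# Minimal MCP client sketch (official `mcp` Python SDK); "notes_server.py"
# is the hypothetical server from the sketch earlier on this page.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="python", args=["notes_server.py"])

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()           # MCP handshake
            tools = await session.list_tools()   # discover available tools
            print([t.name for t in tools.tools])
            result = await session.call_tool("search_notes", {"query": "agents"})
            print(result.content)

asyncio.run(main())
```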
Lecture 3 (6/24)
Evals for Agents
- Extending eval techniques to agentic workflows
- Rule-based vs LLM-as-judge
- Filtering rollouts for synthetic data collection
- Brief demo of SFT on filtered rollouts (see the filtering sketch below)
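
To make the rollout-filtering idea concrete, here is a hedged sketch; the `Rollout` shape, `llm_judge`, and the threshold are hypothetical stand-ins for your own rollout collection and judging code, not course-provided APIs:

```python
# Hedged sketch: filter agent rollouts with a rule-based check plus an
# LLM judge, keeping only passing trajectories as SFT training data.
from dataclasses import dataclass

@dataclass
class Rollout:
    tool_calls: list
    final_answer: str | None
    transcript: str

def rule_based_pass(r: Rollout) -> bool:
    # Example rule: the agent produced an answer and used at least one tool.
    return r.final_answer is not None and len(r.tool_calls) > 0

def keep_for_sft(rollouts: list[Rollout], llm_judge, threshold: float = 0.8):
    kept = []
    for r in rollouts:
        if not rule_based_pass(r):
            continue
        if llm_judge(r.transcript) >= threshold:  # e.g. 0-1 score from a strong model
            kept.append(r)
    return kept
```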
Lecture 4 (6/26)
Reinforcement Learning for Busy Engineers
- Crash course in RL fundamentals without the math
- GRPO vs DPO vs PPO
- Demo of GRPO for training a reasoning model (via HF TRL; see the sketch below)
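
For reference, a GRPO run in HF TRL looks roughly like the sketch below, following TRL's GRPOTrainer quickstart; the toy length-based reward is illustrative, and a real run needs a task-specific reward:

```python
# GRPO sketch with HF TRL; the reward prefers ~20-character completions,
# which is a toy signal just to exercise the training loop.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # GRPO compares rewards within each group of sampled completions.
    return [-abs(20 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="qwen2-grpo", logging_steps=10),
    train_dataset=dataset,
)
trainer.train()
```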
Lecture 5 (7/1)
Formulating Business Problems as RL Tasks
- How to think about reward/rubric design for real-world tasks
- Environment = Tasks + Tools + Verifiers
- Walkthrough of problem formulation for email search (via ART; see the sketch below)
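
As a sketch of the "Environment = Tasks + Tools + Verifiers" framing for the email-search example, where all class and field names are hypothetical illustrations (not the ART API):

```python
# Hedged sketch: an RL environment as tasks + tools + a verifier, where the
# verifier doubles as the reward function.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    query: str               # e.g. "find the March invoice from Acme"
    expected_email_id: str   # ground-truth label for the verifier

@dataclass
class EmailSearchEnv:
    tasks: list[Task]
    tools: dict[str, Callable] = field(default_factory=dict)  # e.g. {"search_inbox": ...}

    def verify(self, task: Task, returned_id: str) -> float:
        # Reward 1.0 for retrieving the right email, else 0.0.
        # Real rubrics often add partial credit (right sender, right thread).
        return 1.0 if returned_id == task.expected_email_id else 0.0
```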
Lecture 6 (7/3)
Training Agents with GRPO
- Deep dive into RL experimentation for agent workflows (via ART)
- Broader ecosystem: other RL trainers + integrations with existing agent/tool frameworks (smolagents, MCP)
1. A Senior SWE turned AI Engineer at a Series D SaaS company who's eager to replace brittle pipelines with highly-optimized agents
2. A Founder + CTO of a Series A startup who wants to offer a best-in-class agentic AI experience to discerning customers
3. A Technical Director at a Fortune 500 company responsible for evaluating the best approaches and vendors for agentic AI solutions
Understand key concepts and patterns underlying modern LLM agents, and how to choose the right approach for your use case
Build portable, reliable tools for your agents and data using Model Context Protocol (MCP)
Implement your own research agents, incorporating custom format instructions and data access
Learn the fundamentals of Reinforcement Learning (RL) and how it applies to agents
Formulate your agentic tasks as RL problems, with evaluation metrics that enable learning from reward feedback
Use RL algorithms like Group-Relative Policy Optimization (GRPO) to train agents that outperform frontier models on your tasks
A holistic understanding of modern principles and techniques for designing production-ready agents and optimizing them with RL
Live sessions
Learn directly from Will Brown & Kyle Corbitt in a real-time, interactive format.
Lifetime access
Go back to course content and recordings whenever you need to.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Credits and giveaways
$100 in Prime Intellect GPU credits, $100 in OpenPipe finetuning credits, 1 year of W&B Pro, and more.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
Production-Ready Agent Engineering: From MCP to RL
9 live sessions • 6 lessons
Jun 17: Lesson 1
Jun 19: Lesson 2
Jun 20: Office hours
Jun 24: Lesson 3
Jun 26: Lesson 4
Jun 27: Office hours
Jul 1: Lesson 5
Jul 2: Office hours
Jul 3: Lesson 6
Will is a Research Lead at Prime Intellect, working on advancing the frontier of open-source agentic RL. He was previously a Machine Learning Researcher at Morgan Stanley and an Applied Scientist at AWS, and completed a PhD in Computer Science at Columbia University focused on multi-agent learning.
Kyle is the CTO of OpenPipe, the RL post-training company. Through OpenPipe, he has helped dozens of companies of all sizes train custom models optimized for their tasks. He has previous ML experience at Y Combinator and Google.
Join an upcoming cohort
Cohort 1: $1,000
Dates: June 17 - July 3
Time commitment: 4-6 hours per week
Live sessions: Tuesdays & Thursdays, 5:00pm - 6:30pm ET
2x weekly lectures and at least 1x weekly office hours with instructors
Weekly projects: 2 hours per week
Take-home exercises for more hands-on exposure to the week's topics