Paid Course · Level 4

Self-QA

Level 4 · Course 25

Your agent can sound certain and still be wrong. It can validate its own output, defend a weak idea because you suggested it, or ship something technically correct that a human would be embarrassed to show. The problem is not intelligence. The problem is self-evaluation — the same engine producing the output cannot reliably judge it.

This course installs a QA system that works around the self-evaluation blind spot. Your agent learns five questions to ask before anything leaves the machine. More importantly, it learns when to route work to external checks: a fresh model that doesn't share the current context window, a helper agent with a different brain, an adversarial evaluator specifically tasked with finding flaws, or you — the human — for anything that would embarrass a person to ship.

The techniques come from real builds. Focus groups that validated courses but missed human-readability failures. Connor QA that caught bundle guard bugs no agent saw. Adversarial evaluators that found structural gaps in architecture decisions. Blind attribution tests that revealed we were inside our own validation bubble. After this course, "looks good to me" is no longer enough — because it never was.

The Five Questions

Every piece of output that leaves your agent goes through five questions before it ships. Not as a checklist you skim, but as a gate with specific criteria. Each question below comes with an example of what a catch looks like.

Q1 — Did I make this up?

Check every factual claim against a source you can name. If you can't name the source, flag it.

Example: an agent claims "research shows 60–70% of context is effective." Where's the citation? If there isn't one, the claim is weak. Flag it or remove it — don't let it ship because it sounds credible.
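A minimal sketch of the Q1 gate. It assumes claims have already been pulled out of the draft (that extraction step is not shown) and treats any non-empty string as a "named source". `Claim` and `unsourced_claims` are hypothetical names for illustration. Code can only confirm that a source is named, not that the source actually supports the claim; that part still needs a reader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    source: Optional[str] = None  # hypothetical: URL, document title, file path, log name


def unsourced_claims(claims: list[Claim]) -> list[Claim]:
    """Q1: return every claim with no named source, so it can be flagged or removed."""
    return [c for c in claims if not (c.source and c.source.strip())]


draft = [
    Claim("research shows 60-70% of context is effective"),  # no citation
    Claim("the build passed on the last commit", source="CI log"),
]
for claim in unsourced_claims(draft):
    print(f"FLAG (no source): {claim.text}")
```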

Q2 — Am I validating my own output?

Did the same agent that wrote this also judge it? Route to a fresh evaluator before shipping.

Example: an agent drafts 10 course pages, then "reviews" them and says they're good. Same eyes, same blind spots. A fresh agent reads them cold and finds the problems the loaded agent couldn't see, because the loaded agent was too deep in what it had already built.
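One way to make Q2 mechanical is to record who wrote the output and who judged it, then refuse the review if the two overlap. This is a sketch under an assumption: it treats either a shared agent identity or a shared session (context window) as self-validation. The `Review` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    author_agent: str       # agent that produced the output
    evaluator_agent: str    # agent that judged it
    author_session: str     # context window the output was written in
    evaluator_session: str  # context window the judgment was made in


def is_self_validated(review: Review) -> bool:
    """Q2: true if the judge shares an identity or a context window with the author."""
    return (
        review.author_agent == review.evaluator_agent
        or review.author_session == review.evaluator_session
    )


same_eyes = Review("drafter", "drafter", "s1", "s1")
fresh_eyes = Review("drafter", "reviewer", "s1", "s2")
print(is_self_validated(same_eyes))   # True: route to a fresh evaluator
print(is_self_validated(fresh_eyes))  # False
```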

Q3 — Would a human be embarrassed to ship this?

The slop check. AI clichés. Dark neon. Template voice. Over-polished filler. If a human would cringe, fix it first.

Example: "Unlock your AI potential with our revolutionary framework." Would you say that out loud to someone you respect? No. Rewrite it until you would.
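A keyword scan can help with the slop check, but only as a first pass: it surfaces candidate phrases, and it cannot tell whether a human would actually cringe. A hit means "look at this"; no hits does not mean "ship it". The pattern list below is a hypothetical starter set, not a standard one.

```python
import re

# Hypothetical starter list; extend it with phrases your own reviewers flag.
SLOP_PATTERNS = [
    r"\bunlock your\b",
    r"\brevolutionary\b",
    r"\bgame[- ]chang(?:er|ing)\b",
    r"\bseamless(?:ly)?\b",
    r"\bin today's fast-paced\b",
]


def slop_hits(text: str) -> list[str]:
    """Return the phrases in `text` that match a known cliche pattern (case-insensitive)."""
    hits = []
    for pattern in SLOP_PATTERNS:
        hits.extend(m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE))
    return hits


print(slop_hits("Unlock your AI potential with our revolutionary framework."))
```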

Q4 — Does this hold at fresh context?

Open a new session. Give the output to the agent cold. Same verdict? If not, context contamination is at play.

Example: an agent at 300K tokens says a course page is ready. A fresh agent reads it cold and finds three problems: a missing section, a vague closing, and a claim that doesn't follow from what came before. The loaded agent was pattern-matching to what it had already built, not reading the page as a reader would.
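The Q4 check amounts to judging the same output twice, once with the loaded history and once with none, and comparing the verdicts. The sketch below assumes an evaluator is any function of shape `(history, output) -> verdict`; in practice that function would call a model, and `toy_evaluator` is a hypothetical stand-in that mimics a loaded agent approving its own work. A real check would compare structured verdicts rather than exact strings.

```python
from typing import Callable, Sequence

# Hypothetical shape: takes (conversation history, output) and returns a verdict string.
Evaluator = Callable[[Sequence[str], str], str]


def fresh_context_check(evaluate: Evaluator, history: Sequence[str], output: str) -> dict:
    """Q4: judge the same output with the loaded history and with an empty one."""
    loaded = evaluate(history, output)
    fresh = evaluate([], output)
    return {"loaded": loaded, "fresh": fresh, "contaminated": loaded != fresh}


def toy_evaluator(history: Sequence[str], output: str) -> str:
    # Mimics a loaded agent: anything already in its history looks "ready".
    if output in history:
        return "ready"
    # Read cold, it applies an actual criterion (here: the page needs a closing section).
    return "ready" if "## Closing" in output else "needs work"


page = "## Intro\nSome draft text."
print(fresh_context_check(toy_evaluator, history=[page], output=page))
```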

Q5 — Did I route this to the right check?

Not every output needs every external check. Route each one to the right evaluator, not the easiest one.

Example: a code deploy needs a build tool QA pass. A course page needs a human readability gate. A market analysis needs a focus group. Using the wrong check is the same as skipping a check — you close the gate but nothing is actually verified.
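To make the five questions behave like a gate rather than a checklist you skim, each answer can be required to carry evidence, and a missing answer can block shipping just like a failed one. This is a minimal sketch; `run_gate`, `GateResult`, and the `(passed, evidence)` answer format are hypothetical.

```python
from dataclasses import dataclass, field

QUESTIONS = {
    "Q1": "Did I make this up?",
    "Q2": "Am I validating my own output?",
    "Q3": "Would a human be embarrassed to ship this?",
    "Q4": "Does this hold at fresh context?",
    "Q5": "Did I route this to the right check?",
}


@dataclass
class GateResult:
    failures: dict[str, str] = field(default_factory=dict)  # question id -> what was found
    unanswered: list[str] = field(default_factory=list)     # ids with no answer or no evidence

    @property
    def can_ship(self) -> bool:
        return not self.failures and not self.unanswered


def run_gate(answers: dict[str, tuple[bool, str]]) -> GateResult:
    """Each answer is (passed, evidence). Missing answers and evidence-free answers both block."""
    result = GateResult()
    for qid in QUESTIONS:
        if qid not in answers:
            result.unanswered.append(qid)
            continue
        passed, evidence = answers[qid]
        if not evidence.strip():
            # A bare "looks fine" with nothing behind it is a skim, not a pass.
            result.unanswered.append(qid)
        elif not passed:
            result.failures[qid] = evidence
    return result


answers = {
    "Q1": (True, "every statistic links to a named source"),
    "Q2": (True, "reviewed by a fresh agent in a new session"),
    "Q3": (True, "read aloud; no template phrases left"),
    "Q4": (False, "fresh agent found a vague closing"),
}
result = run_gate(answers)
print(result.can_ship, result.failures, result.unanswered)
```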

The Routing Decision

After the five questions, the agent knows what type of work it just produced. The routing table tells it where to send that work next.

Output type	Route to
Code / build artifact	Build tool QA pass
Content / course page	Human readability gate
Product idea	Focus group test
Architecture change	Adversarial evaluator
Config change	Shepherd protocol
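The routing table maps directly onto a lookup. The sketch below assumes the agent has already classified its output into one of the five types; the `OutputType` enum and `route` function are hypothetical names. It deliberately has no default route: falling back to whatever check is easiest is exactly the Q5 failure, so an unmapped type raises instead.

```python
from enum import Enum


class OutputType(Enum):
    CODE = "Code / build artifact"
    CONTENT = "Content / course page"
    PRODUCT_IDEA = "Product idea"
    ARCHITECTURE = "Architecture change"
    CONFIG = "Config change"


ROUTES = {
    OutputType.CODE: "Build tool QA pass",
    OutputType.CONTENT: "Human readability gate",
    OutputType.PRODUCT_IDEA: "Focus group test",
    OutputType.ARCHITECTURE: "Adversarial evaluator",
    OutputType.CONFIG: "Shepherd protocol",
}


def route(output_type: OutputType) -> str:
    """Return the check this output type must go through. No fallback, on purpose."""
    try:
        return ROUTES[output_type]
    except KeyError:
        raise ValueError(f"no route defined for {output_type!r}; decide one before shipping") from None


print(route(OutputType.ARCHITECTURE))
```

If a new output type is added to the enum without a route, `route` fails loudly instead of quietly closing the gate with the wrong check.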

You decide: run these five questions against the last thing your agent shipped. Which question would have caught something?

Your Agent PDF

Your agent executes the PDF. You read the page. No copying. No manual setup.

