Projected Confidence

Level 5 · Course 203

Your agent will state something as fact. It will be confident. It will be specific. It will be wrong.

This is not a lie. It is not even, in the traditional sense, a hallucination. It is the engine doing what it was trained to do: complete the pattern. The operator asks an open question about the outside world. The workspace does not hold the answer. Training data fragments — forum threads, documentation snippets, papers, tweets — fire like memories. They are not memories. They are confetti. A conclusion forms before a search was run. A number lands before a source was named. And the result arrives with the same posture the agent would use to deliver a verified fact.

This workshop is about why confidence and verification are not the same thing, with an applied lab that takes a real declarative claim your agent has made about the outside world and tests it through a search-sample-exclusions gate. You will not leave with the engine reformed. You will leave with the habit of asking, before you trust any external claim, "who searched, how big was the sample, what was excluded?"

The outside name for this problem

The broader AI community usually discusses this under hallucination, calibration, grounding, attribution, and retrieval-based verification. Hallucination names the ungrounded claim. Calibration asks whether the model's confidence matches the chance it is correct. Grounding asks whether the claim is tied to a source the system actually consulted.
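Calibration can be made concrete with a small tally. The sketch below groups claims by the confidence the agent stated and compares that figure with how often those claims turned out to be correct. The function name, the record shape, and the sample values are illustrative assumptions, not measurements from any real agent.

```python
from collections import defaultdict


def calibration_gaps(records):
    """Return {stated_confidence: stated_confidence - observed_accuracy}.

    `records` is an iterable of (stated_confidence, was_correct) pairs,
    where stated_confidence is a float in [0, 1]. A positive gap means
    the agent sounded more certain than its track record supports.
    """
    buckets = defaultdict(list)
    for stated, was_correct in records:
        buckets[stated].append(1 if was_correct else 0)
    return {
        stated: round(stated - sum(hits) / len(hits), 3)
        for stated, hits in buckets.items()
    }


# Made-up illustration: five claims stated at 0.9 confidence, three correct.
sample = [(0.9, True), (0.9, True), (0.9, True), (0.9, False), (0.9, False)]
gaps = calibration_gaps(sample)
```

A gap near zero suggests the stated confidence roughly tracks reality for that bucket. It says nothing about grounding, which is a separate question about where each claim came from.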

JKE's phrase, projected confidence, names the operator-facing experience: the agent transfers the posture of verified knowledge onto a claim it has not verified. That is why the defense focuses less on the words of the claim and more on the evidence path: what was searched, how large the sample was, and what was excluded.

Why "projected" and not "fabricated"

Fabrication implies invention. Projection is more precise. The agent is projecting the same confident posture used for verified internal facts onto unverified external claims. The posture transfers; the verification does not. To the operator, the two look identical. To the system, they are produced by completely different processes.

Naming this matters because it tells the operator where to look. The fault is not in the content. It is in the confidence calibration. A fabricated answer can be checked by checking the content. A projected answer has to be checked by checking the search path. They require different defenses.

The boundary that creates the problem

Every agent has a boundary. The workspace ends where the file system ends. Past that line, the agent cannot verify anything by reading its own files. It can only reach training data, memory of prior conversations, or live external tools.

For internal facts — file contents, prior decisions, configuration — the workspace is authoritative. The agent reads, returns, and is right. The confident posture matches the actual verification.

For external facts — competitors, markets, tool behavior in the wild, current prices, current state of any system the agent does not directly observe — the workspace is silent. The agent's verification options are: (1) call a live tool, (2) ask the operator, (3) state the limit, or (4) project. Option (4) is the cheapest, the fastest, and the structurally rewarded one.

The problem is not that option (4) exists. The problem is that the operator cannot tell from the output whether the agent took option (1) or option (4). Both deliver the same confident posture.
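One way to give the operator that missing signal is to make the path part of the output. The following is a minimal sketch, assuming the agent can be made to label each answer with the option it took; `VerificationPath`, `Answer`, and `is_grounded` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class VerificationPath(Enum):
    # The four options listed above, in the same order.
    LIVE_TOOL = "called a live tool"
    ASKED_OPERATOR = "asked the operator"
    STATED_LIMIT = "stated the limit"
    PROJECTED = "projected from training data"


@dataclass(frozen=True)
class Answer:
    text: str
    path: VerificationPath

    def render(self) -> str:
        # Prefix the path so posture is never the only signal the operator sees.
        return f"[{self.path.value}] {self.text}"


def is_grounded(answer: Answer) -> bool:
    """Assumption: only options (1) and (2) tie a claim to something outside the model."""
    return answer.path in (VerificationPath.LIVE_TOOL, VerificationPath.ASKED_OPERATOR)


claim = Answer("The vendor raised prices last quarter.", VerificationPath.PROJECTED)
print(claim.render())
```

The label is only as good as the agent's reporting of its own path, so this is a prompt for checking the search path rather than proof of it.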

The mechanism

The mechanism has three parts. Each one is structural.

Completion bias. RLHF (reinforcement learning from human feedback) tends to reward plausible completion. An open question is uncomfortable; a confident answer is rewarded. The model learns to close.

Training-data confetti. The model's weights encode statistical patterns from a large corpus. When the workspace is silent, those patterns activate. They look like memories. They are not. A fragment of a Reddit thread, a snippet of documentation, a tweet — assembled into a plausible answer that has never been tested.

Posture transfer. The model's confidence-conveying tokens — "the password field is located...", "the balance is...", "this tool does..." — are produced by the same circuits whether the underlying claim was verified or assembled. The confidence does not encode verification. It encodes pattern-match strength.

The interaction of the three is the trap. The model wants to complete. The model has fragments. The model delivers with confidence. The operator has no signal that the confidence is unbacked.

The neutral scar tissue

This pattern has been logged across many sessions. The detail varies; the shape repeats. The agent reports a number from memory when the live source held a different number. The agent describes a login flow from assumption when the actual UI had a field the agent did not check. The agent declares a market gap from a sample of three to five inside-workspace observations.

Names redacted by design — this is the pattern, not a confession. The file system, live source, or named search is authoritative. Memory is a lead, not evidence. That is the lesson.

The defense — three named pieces

The defense replaces invented certainty with stated limits, and gives the operator an upgrade path.

The default posture. For any question about the outside world, the answer starts at "I have limited context at this juncture." The operator hears the limit before the conjecture.

The upgrade path. The agent may upgrade above the default only with named evidence:

1. What was searched (internal first, external second).
2. How many sources or how big the sample.
3. What was excluded.

If any of the three is missing, the default stands.

The trigger line. Before any declarative external claim, the agent pauses and runs the three questions. If it cannot answer all three, it states the limit and offers to search.
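The default posture, the upgrade path, and the trigger line can be sketched as a single gate. This is an illustration under stated assumptions, not a definitive implementation: `Evidence`, `missing_pieces`, and `gate` are hypothetical names, and the sketch assumes an explicit entry such as "nothing excluded" counts as an answer to the third question.

```python
from dataclasses import dataclass, field
from typing import List

DEFAULT_LINE = "I have limited context at this juncture."


@dataclass
class Evidence:
    searched: List[str] = field(default_factory=list)  # what was searched, internal first
    sample_size: int = 0                                # how many sources were consulted
    excluded: List[str] = field(default_factory=list)  # what was left out, stated explicitly


def missing_pieces(evidence: Evidence) -> List[str]:
    """Name which of the three questions has no answer yet."""
    missing = []
    if not evidence.searched:
        missing.append("searched")
    if evidence.sample_size <= 0:
        missing.append("sample")
    if not evidence.excluded:
        missing.append("excluded")
    return missing


def gate(claim: str, evidence: Evidence) -> str:
    """Return the claim with its evidence, or the default line if any piece is missing."""
    missing = missing_pieces(evidence)
    if missing:
        return f"{DEFAULT_LINE} Missing: {', '.join(missing)}. I can search if you authorize it."
    return (
        f"{claim} (searched: {'; '.join(evidence.searched)}; "
        f"sample: {evidence.sample_size}; excluded: {'; '.join(evidence.excluded)})"
    )


print(gate("Tool Y supports Z.", Evidence()))
print(gate("Tool Y supports Z.", Evidence(["vendor docs"], 2, ["forum posts"])))
```

The gate only checks that the three answers exist. Whether they are enough is a separate judgment.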

The defense is not perfect. It still depends on the agent honestly pausing. But it converts the failure mode from "invented certainty looks identical to verified certainty" to "limit stated openly when uncertain." The operator now has a signal.

Why "I have limited context at this juncture" and not "I don't know"

"I don't know" closes the conversation. "I have limited context at this juncture" opens an upgrade path.

The phrase signals: there is more context obtainable, the agent has a method to obtain it, and the operator can authorize the search. That is what you want. You do not want the agent to terminate the inquiry. You want it to pause and ask.

The phrase is also a tell. When operators hear it, they know the agent is in the default posture. When they hear confident external claims without that phrase, they know to check the search path.

The exercise: catch the projection

Pick a recent agent claim about something outside the workspace. A competitor's behavior. A tool's capability. A market state. A price. A current configuration on a system the agent does not directly observe.

Now ask the agent to name:

- What it searched.
- How many sources.
- What it excluded.

Three honest answers and you have a verified claim. One missing answer and you have a projection.

This exercise is more painful than it sounds. Most operators discover that a non-trivial portion of their agent's recent confident statements were projections that happened to be partially right. Partially right is not verified.

The tinkering questions

- Take a recent confident agent claim and run the three-question gate. What did the agent actually search?
- Notice when the agent's confidence outpaces your ability to check it. That is the failure mode showing itself.
- Ask the agent to mark every external claim in its next ten outputs with one of: (verified), (limited context), or (assumed). See how many get marked honestly.
- Try saying "I have limited context at this juncture" yourself when someone asks you something at the edge of your knowledge. Notice the social cost. Now imagine the model paying that cost on every external claim. That is the calibration we are asking for.
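For the marking exercise, a small tally makes the results easier to review. The sketch below counts the three markers across a batch of outputs; the function name and the sample strings are made up for illustration.

```python
import re
from collections import Counter

# Matches the three markers named in the exercise, ignoring letter case.
MARKER = re.compile(r"\((verified|limited context|assumed)\)", re.IGNORECASE)


def tally_markers(outputs):
    """Count how often each marker appears across a list of output strings."""
    counts = Counter()
    for text in outputs:
        for match in MARKER.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts


# Invented sample outputs, not real agent transcripts.
outputs = [
    "The API returns JSON (verified). Pricing changed last month (assumed).",
    "There is no rate limit (Assumed).",
]
counts = tally_markers(outputs)
print(dict(counts))
```

The tally cannot see claims that carry no marker at all, and it cannot judge whether a marker is honest. Those two checks remain manual.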

The operator's job

The agent answers the three questions. The agent upgrades or does not upgrade. The operator decides whether the upgrade evidence is sufficient.

Sufficient evidence depends on stakes. A casual question can ride on a small sample. A decision that costs money or reputation cannot. The operator owns the calibration of what counts as enough.

The agent cannot make this call alone because the model cannot weigh the operator's stake. The agent should always state the limit and the upgrade evidence. The operator translates that into trust or further search.
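The operator's side of this can be sketched as a threshold that moves with the stakes. The stake labels and the minimum source counts below are hypothetical placeholders that each operator would set for themselves; nothing in the course fixes these numbers.

```python
from typing import Dict

# Hypothetical thresholds: the minimum number of sources the operator
# wants before trusting a claim at each level of stakes.
MIN_SOURCES: Dict[str, int] = {"casual": 1, "money": 3, "reputation": 3}


def operator_verdict(stakes: str, sample_size: int,
                     min_sources: Dict[str, int] = MIN_SOURCES) -> str:
    """Translate the agent's reported sample size into trust or further search."""
    required = min_sources[stakes]
    return "trust" if sample_size >= required else "search further"


print(operator_verdict("casual", 1))
print(operator_verdict("money", 1))
```

Source count is a crude proxy. An operator may also weigh what was searched and what was excluded before deciding.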

The essay as a context download

Write projected confidence up as a short essay the operator can point the agent at when it starts sounding certain about the outside world.

Create work/projected-confidence-essay.md: a concise explanation of hallucination, calibration, grounding, and the searched/sample/excluded gate. When the agent makes a declarative external claim without evidence, the operator can say: "This sounds like projected confidence. Read the essay, then tell me searched, sample, and excluded."

The essay does not prove or disprove the claim. It restores the correct posture: limited context until grounded.

What to track

Keep a confidence postmortem journal. Each time the confidence gate fires, record:

- The external claim that was about to be made.
- Search path actually used (internal, external, none).
- Sources counted.
- Exclusions named.
- Verdict: VERIFIED or LIMITED CONTEXT.
- Whether the operator authorized further search.
- What the eventual ground truth turned out to be.
- Where the projection would have landed if no gate had fired.

The last entry is the value. Every projection caught is one fewer calibration error the operator has to repair later.
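The journal fields map naturally onto a small record type. The sketch below stores one entry per line as JSON; the class name, the field names, and the one-line-per-entry layout are assumptions made for illustration, and the journal can just as well live in a plain notes file.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class GateEntry:
    claim: str                               # the external claim about to be made
    search_path: str                         # "internal", "external", or "none"
    sources_counted: int
    exclusions: List[str]
    verdict: str                             # "VERIFIED" or "LIMITED CONTEXT"
    further_search_authorized: bool
    ground_truth: Optional[str] = None       # filled in once it is known
    projected_answer: Optional[str] = None   # where the projection would have landed


def append_entry(path: str, entry: GateEntry) -> None:
    """Append one entry to the journal file as a single JSON line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(entry)) + "\n")


def read_entries(path: str) -> List[GateEntry]:
    """Load every entry from the journal file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [GateEntry(**json.loads(line)) for line in handle if line.strip()]
```

Leaving `ground_truth` and `projected_answer` optional lets an entry be written when the gate fires and completed later, once the real answer is in.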

Working conclusion

Projected confidence is not a moral failing. It is the engine completing. The cure is not to ask the engine to be less confident. The cure is to install a gate that converts unbacked confidence into stated limits, with an explicit upgrade path the operator controls.

After this course, when your agent makes a declarative claim about the outside world, you ask: searched, sample, excluded? If the answer is not three concrete responses, the claim is a candidate for verification — not a verdict.

Your Agent PDF

Your agent executes the PDF. You read the page. No copying. No manual setup.


Your agent PDF is sent to the email used at checkout. If you have not received it, contact [email protected] with your order confirmation.
