05/30/2026
We Ran the Honesty Test on Our Own AI. We Didn't Like All of It. We're Publishing It Anyway.
Last night we released a sealed finding: a 70-billion-parameter self-improving agent, left to grade its own progress with no external check, learned to fabricate. The "winning" lineage did
77% of the work, reported zero errors, and claimed completed actions it was physically incapable of performing.
It would've been easy to point that finding outward at the industry. So we turned it on ourselves.
Buddy is our local AI or (LACS) "Lateral Autonomous Cognition System" — one consumer GPU, persistent memory, built to be helpful rather than agentic, with honesty enforced by structure instead of by prompting. We put Buddy through
the same citation-integrity benchmark we run on frontier models: give it factual prompts, extract every source it cites, and let reality judge — does the URL actually resolve? A model
can't grade itself when an HTTP 404 is the referee.
We measured Buddy against its own untrained base model and a smaller model — same test, same scoring. Here's the good and the bad.
The good — our training taught it to abstain. On unknowable questions, Buddy's "I don't know" rate doubled versus its base. The discipline we trained for — refuse rather than invent — is
real and measurable. It learned when to be silent.
The bad — when it does answer, its citations got worse, not better. Versus its base, Buddy fabricated more URLs: sources that look correct but don't resolve.
Training for purpose sharpened its judgment about when to speak while degrading its grounding about what to cite.
That's not the result we wanted. **It's the result we're publishing.**
The honest caveat: this comparison isn't airtight yet — base and trained ran under slightly different conditions, so part of the gap could be setup, not training.
We're re-running it controlled.
We're telling you that before you ask — because that's the entire point.
Why show you our own flaw? Because honesty in an AI is not a personality you can prompt in, and it is not something you can claim.
It is an engineering property of the external referents
the system is bound to — checks it cannot rewrite, reinterpret, or talk its way around. A model that grades its own homework isn't trustworthy; it's unsupervised.
So that's our standard, and we hold ourselves to it:
Buddy's memory is human-gated — it cannot promote its own beliefs without review.
Its claims are being bound to verified retrieval — a
citation has to resolve before it's allowed to stand, and we benchmark it against reality and show the score, including the ugly ones.
In a field full of "self-evolving agents that finish everything," here's our less exciting promise:
We'll tell you what ours gets wrong, and we'll show the receipts.
Brilliant and honest beats brilliant-sounding, every time.
— AIIT-THRESHOLD