04/16/2025
Can weaker AI get smarter by thinking more? We put it to the test.
We took inspiration from work on chained reasoning and tried something different:
Tie multiple GPT-3.5 models together to see if stepwise reasoning could improve performance — without upgrading the model.
Here’s the logic puzzle we used:
“Jack is looking at Anne. Anne is looking at George. Jack is married, George is not, and we don’t know Anne’s status.
Is a married person looking at an unmarried person?”
It’s deceptively simple… and GPT-3.5 completely fumbles the logic.
Round 1:
GPT-3.5 says “Yes” — but for the wrong reasons.
Round 2:
We pass that output to another GPT-3.5, prompting it to “think twice.”
Still wrong.
Round 3:
A third GPT-3.5 takes the two flawed responses and finally pieces it together.
The final answer? Correct — with sound reasoning.
This experiment shows that layering LLMs, even weaker ones, can simulate deeper reasoning.
Like a committee reviewing each other’s work — clarity emerges through iteration.
We’re not building smarter AIs here — just more thoughtful ones.
If you're building AI agents or workflows, you don’t always need GPT-4.
Sometimes, you just need more steps.
What else could we do with chained reasoning like this?