2.6. Reflexion

Reflexion is another technique designed to curb the issue of hallucination in current generative models. It employs a feedback loop that corrects errors autonomously, creating a “model-in-the-loop” framework as opposed to the traditional “human-in-the-loop” system. In essence, one language model reviews and refines the output of another.

I also stumbled upon this method while developing Changepack. One fascinating moment came when I tried to make ChatGPT consistently spit out changelogs in HTML, to no avail. Sometimes it would give me Markdown; other times, the results wouldn’t have any formatting at all. Despite trying multiple variations of the prompt, I just couldn’t pin it down. Admitting defeat, I asked ChatGPT to fix its own prompt. To my surprise, it rewrote it… and it worked. Consistently. I suppose it explained to itself what to do in a way it could comprehend, which I still find amusing.

This is the model-in-the-loop approach. Instead of correcting the errors manually, I handed one neural network’s output to another neural network to review. It produced better results for the same reason chain-of-thought prompting did: models typically achieve better outcomes when we instruct them to focus on a single task rather than multitask.
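To make the idea concrete, here is a minimal sketch of that model-in-the-loop step. It assumes only a hypothetical `complete` function that wraps whichever LLM API you use and returns the model’s text; the function name, signature, and prompt wording are mine, for illustration, not any particular product’s API.

```python
from typing import Callable


def refine_prompt(complete: Callable[[str], str],
                  failing_prompt: str,
                  bad_outputs: list[str]) -> str:
    """Ask the model to rewrite a prompt whose outputs keep drifting in format."""
    critique_request = (
        "The prompt below was supposed to produce changelogs in HTML, "
        "but the sample outputs drifted into Markdown or plain text.\n\n"
        f"PROMPT:\n{failing_prompt}\n\n"
        "SAMPLE OUTPUTS:\n" + "\n---\n".join(bad_outputs) + "\n\n"
        "Rewrite the prompt so that future answers are consistently valid HTML. "
        "Return only the rewritten prompt."
    )
    # The model reviews its own instructions and hands back an improved version.
    return complete(critique_request)
```

The loop here is a single pass, which was enough in my case; nothing stops you from feeding the rewritten prompt’s outputs back in for another round.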

Case study: Self-reviewing agents

Around the same time, I had the chance to test out Sweep, an AI-driven junior programmer designed to tackle issues like bug fixing, implementing simple features, writing tests, completing documentation, and so forth. What struck me during this process was that Sweep is essentially a self-reviewing entity—and that’s quite intriguing.

We’re all fairly familiar by now with the fact that results from large language models like ChatGPT can fluctuate quite a bit. The output depends heavily on the given prompt. At times, the model might produce wholly imagined answers, make errors, display human-like cognitive biases, or generate text that seems statistically plausible but isn’t accurate. There are ways to mitigate these issues, some of which closely resemble strategies we use in our education system. For instance, asking a model to break down its response step by step often yields superior results, much like how humans often catch their own mistakes when asked to explain their thought process.

Sweep pushes this concept further. It employs two prompts: the committer, which has the model play the role of a coder writing the code, and the reviewer, which has the model act as a code reviewer providing feedback. The coder then revises and improves the code based on that feedback, and the cycle continues, yielding better code.
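Here is a rough sketch of how such a committer/reviewer cycle might be wired up. The prompts, the generic `complete` wrapper, and the fixed number of rounds are my own simplification for illustration, not Sweep’s actual implementation:

```python
from typing import Callable


def committer_reviewer_loop(complete: Callable[[str], str],
                            task: str,
                            rounds: int = 2) -> str:
    """Alternate between a 'committer' prompt that writes code and a
    'reviewer' prompt that critiques it, folding the feedback back in."""
    code = complete(f"You are a programmer. Write code for this task:\n{task}")
    for _ in range(rounds):
        review = complete(
            "You are a strict code reviewer. List bugs, missing edge cases, "
            f"and style problems in the following code:\n{code}"
        )
        code = complete(
            f"You are a programmer. Task:\n{task}\n\n"
            f"Your previous attempt:\n{code}\n\n"
            f"Reviewer feedback:\n{review}\n\n"
            "Revise the code to address the feedback. Return only the code."
        )
    return code
```

In practice you would likely stop when the reviewer has nothing substantial left to flag, rather than after a fixed number of rounds, but the shape of the loop is the same.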

I find this approach fascinating because, in my experience, models, despite having a wealth of knowledge ingrained in them, often don’t put that knowledge to use unless explicitly asked to do so. This is somewhat like how humans can fall victim to certain cognitive biases unless we take a moment to slow down and apply a more scientific approach to our own thinking. We also have to consciously engage our internal reviewer!

Philosophically speaking, this leads me to wonder if such behavior is learned—essentially mimicked—from human inputs given to the model, or if it’s inherent to the cognitive process itself. It might indicate a feature common to all intelligent beings. Well, maybe not all, but those with structurally networked minds. (Can minds exist without such networking? It’s anyone’s guess.) It could also be tied to their method of information retrieval or generation.

Intriguing thoughts to toy with. Don’t expect me to give you answers, though.