The Power of a Second Opinion: Why GitHub Copilot's Rubber Duck Agent Changes the Game

Author: Brian Swiger
Passionate Geek • Proud Father • Devoted Husband

The easiest part of modern, AI-assisted development is getting code produced. With advanced LLMs at our fingertips, generating lines of syntax no longer feels like the primary bottleneck in software engineering.

Instead, the real challenge has shifted. The modern pain point occurs when an AI agent commits too early to a flawed architecture, carries an incorrect assumption across a multi-file change, or delivers code that passes a quick visual inspection but remains fundamentally broken in production.

This is exactly why the GitHub Copilot Rubber Duck agent inside the GitHub Copilot CLI is a massive step forward for developer workflows. It shifts the focus from raw generation to deliberate review.

The Origins of the Rubber Duck

To understand why this matters, we have to look at the traditional engineering practice of Rubber Duck Debugging. The term was popularized by the book The Pragmatic Programmer, which tells the story of a programmer who carried around a rubber duck and debugged code by explaining it, line by line, to the toy.

The act of teaching or explaining forces your brain to slow down, challenge assumptions, and see the logical gaps you missed while writing the code.

Moving Beyond Self-Review

When applying AI to this problem, asking a model to review its own work has natural limitations. If a model starts from a weak premise, asking it (or another model from the same family) to inspect that reasoning often just yields a more polished, confident defense of the original mistake. Self-review is better than nothing, but it lacks the healthy friction required for robust engineering.

The GitHub Copilot Rubber Duck agent addresses this by pairing two different model families.

As highlighted by cloud architecture expert Thomas Thornton in his insightful analysis, the real value here is the introduction of a structured second opinion. One model family executes the primary engineering task, and an entirely separate model family is brought in to stress-test the plan, the implementation, and the tests.
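To make the "author plus independent reviewer" idea concrete, here is a minimal sketch of such a loop. It is not how GitHub Copilot implements the Rubber Duck agent; every name in it (`Model`, `Review`, `review_with_second_opinion`, the stub models, the `APPROVE` convention) is a hypothetical stand-in, and each "model" is just a function from prompt text to response text so the example runs without any API.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any function mapping a prompt to a response.
Model = Callable[[str], str]


@dataclass
class Review:
    approved: bool
    feedback: str


def review_with_second_opinion(
    task: str, author: Model, critic: Model, max_rounds: int = 2
) -> tuple[str, list[Review]]:
    """Let one model draft a solution and a different model critique it."""
    draft = author(task)
    history: list[Review] = []
    for _ in range(max_rounds):
        verdict = critic(
            f"Task:\n{task}\n\nProposed solution:\n{draft}\n\n"
            "Reply APPROVE or list your concerns."
        )
        # Hypothetical convention: the critic starts its reply with APPROVE.
        review = Review(verdict.strip().upper().startswith("APPROVE"), verdict)
        history.append(review)
        if review.approved:
            break
        # Feed the critic's concerns back to the author for a revision.
        draft = author(
            f"Task:\n{task}\n\nPrevious attempt:\n{draft}\n\n"
            f"Reviewer concerns:\n{verdict}\n\nRevise the solution."
        )
    return draft, history


# Stub models standing in for two different model families.
def stub_author(prompt: str) -> str:
    if "Reviewer concerns" in prompt:
        return "v2: validate input, then parse"
    return "v1: parse input"


def stub_critic(prompt: str) -> str:
    if "validate input" in prompt:
        return "APPROVE"
    return "Concern: input is never validated."


if __name__ == "__main__":
    final, reviews = review_with_second_opinion(
        "Parse the config file", stub_author, stub_critic
    )
    print(final)
    for r in reviews:
        print(r.approved, "-", r.feedback)
```

The point of the design is that `author` and `critic` are separate callables: swapping in the same function for both would reproduce the self-review problem described earlier.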

Why This Matters for Harder Tasks

According to GitHub’s engineering data, this multi-model checkpoint approach yields significantly better results on complex, long-running, multi-file tasks.

  • In a single-file edit: A minor hallucination or incorrect assumption might cost a developer a couple of minutes to fix.
  • In a multi-file architectural change: A flawed premise propagates quickly across implementation files, structural boundaries, and tests. The code might compile, and the pull request might look clean, but the underlying approach is still wrong.

By forcing a pause and using a separate model family as the “rubber duck,” the workflow mirrors how high-performing human engineering teams operate. Exceptional engineering teams do not just write code continuously; they constantly question, review, and validate decisions before they spread across the codebase.

Striking the Right Balance

As with any tool at platform scale, the Rubber Duck agent will only remain useful if it stays sharp and selective. If the review step becomes too noisy, developers will ignore it. If it adds too much latency, teams will bypass it entirely.
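One way to picture "sharp and selective" is a filter that only surfaces the few review findings worth interrupting a developer for. The sketch below is purely illustrative: the `Finding` type, the 1 to 5 severity scale, the confidence score, and the default thresholds are all assumptions made for this example, not a description of how the Rubber Duck agent decides what to report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    message: str
    severity: int      # hypothetical scale: 1 (nit) to 5 (blocks correctness)
    confidence: float  # hypothetical reviewer confidence, 0.0 to 1.0


def select_findings(
    findings: list[Finding],
    min_severity: int = 3,
    min_confidence: float = 0.6,
    limit: int = 3,
) -> list[Finding]:
    """Keep only high-severity, high-confidence findings, most serious first."""
    kept = [
        f for f in findings
        if f.severity >= min_severity and f.confidence >= min_confidence
    ]
    # Sort by severity, then confidence, both descending.
    kept.sort(key=lambda f: (f.severity, f.confidence), reverse=True)
    # Cap the list so the review stays short enough to read.
    return kept[:limit]
```

Raising the thresholds or lowering `limit` trades recall for less noise, which is the balance this section describes.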

However, by framing AI tooling as a mechanism for targeted friction rather than fully autonomous generation, GitHub is steering AI-assisted engineering in a much healthier, more sustainable direction.

Official References and Further Reading

To learn more about the technical mechanics and perspectives behind this feature, explore the following resources: