Artificial Intelligence (AI) is everywhere, from enhancing photos on your phone to testing the boundaries of human-centric jobs like writing and coding. One of the AI tools that has garnered immense attention among developers is GitHub Copilot. Built on OpenAI's GPT-4 model and tightly woven into the Visual Studio Code ecosystem, it promises to be the coder's ultimate assistant. But is it? A recent deep dive suggests that this highly-vaunted tool may have some growing up to do. Let’s break it down.
The Experiment: Testing the Limits of GitHub Copilot
In a well-documented experiment by David Gewirtz, senior contributing editor at ZDNET, GitHub Copilot was put to the ultimate test. Four coding challenges, ranging from simple fixes to complex problem-solving, were presented to AI-based coding assistants. The participants? GitHub Copilot, alongside competitors like ChatGPT, Perplexity, and others. Despite all these tools being based on the same GPT-4 large language model, their performance varied dramatically.

The results? A mixed bag for GitHub Copilot, showcasing its potential and its glaring limitations. Here’s a closer look at the tests conducted and GitHub Copilot’s performance:
Test 1: Writing a WordPress Plugin
- Scenario: The task was to develop a fully functional WordPress plugin. If you're wondering, this isn’t some trivial "Hello, World!" plugin. It required creating admin interface elements, sorting a list of names, and ensuring duplicates didn’t land side-by-side.
- Outcome: Fail. Asked to put the plugin's interactive logic in a `.js` file, it spat out (wait for it) more PHP code! As many developers know, WordPress can technically handle such problems with PHP alone, but for an interactive plugin, ignoring JavaScript creates a half-baked solution.
Why This Matters for Windows Users: Many freelance developers and web creators rely on tools like GitHub Copilot to streamline plugin creation. If a tool struggles with such a common real-world use case, developers could lose precious time debugging or rewriting its outputs. You wouldn’t hand a carpenter a hammer that only works half the time, right?
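Stripped of its WordPress admin-interface trappings, the algorithmic heart of this test is a small rearrangement problem: order a list of names so that identical names never sit side by side. Here is a minimal Python sketch of one common greedy approach; the function name and the heap-based strategy are illustrative assumptions, not the plugin spec from the article:

```python
import heapq
from collections import Counter

def spread_duplicates(names):
    """Arrange names so identical names are never adjacent,
    when the counts make that possible."""
    counts = Counter(names)
    # Max-heap keyed on remaining count (stored negative);
    # ties broken alphabetically by pre-sorting the items.
    heap = [(-count, name) for name, count in sorted(counts.items())]
    heapq.heapify(heap)

    result = []
    held_back = None  # entry parked for one turn because it was just placed
    while heap:
        count, name = heapq.heappop(heap)
        result.append(name)
        if held_back is not None:
            heapq.heappush(heap, held_back)
        count += 1  # one copy consumed (counts are negative)
        held_back = (count, name) if count < 0 else None
    if held_back is not None:
        # One name outnumbers everything else; no valid arrangement exists.
        raise ValueError(f"too many copies of {held_back[1]!r} to separate")
    return result
```

The idea: always place the most frequent remaining name, then hold it out of the heap for one turn so it cannot be placed twice in a row. If any single name makes up more than half the list (rounded up), the final `held_back` check catches the impossibility.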
Test 2: Fixing a String Function
- Scenario: Rewrite a faulty function to validate if a string represents currency (dollars and cents) correctly.
- Outcome: Another fail.
What This Means for Coders: Something as simple as validating a string is bread and butter for coding assistants. The fact that Copilot choked on this makes developers question how reliable it is under more high-stakes scenarios.
Takeaway: Don’t fully rely on Copilot’s judgment for critical input validation, and always test AI-suggested code against edge cases.
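To make that edge-case advice concrete, here is one way a dollars-and-cents validator might look in Python. The exact rules encoded below (optional dollar sign, optional thousands separators, exactly two cents digits) are assumptions for illustration; the article does not publish the actual spec Copilot was tested against:

```python
import re

# Optional leading "$"; integer part is "0", a properly grouped
# "1,234"-style number, or an ungrouped number with no leading zero;
# exactly two cents digits are required.
CURRENCY_RE = re.compile(r"\$?(0|[1-9]\d{0,2}(,\d{3})*|[1-9]\d*)\.\d{2}")

def is_currency(text: str) -> bool:
    """Return True if `text` looks like a dollars-and-cents amount."""
    return CURRENCY_RE.fullmatch(text.strip()) is not None
```

The edge cases are exactly where AI-suggested validators tend to slip: amounts with misplaced separators ("$1,23.45"), leading zeros ("00.50"), a missing integer part (".50"), or the wrong number of cents digits ("12.3") should all be rejected, and each deserves its own assertion in your test suite.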
Test 3: Hunting Down an Annoying Bug
- Scenario: Locate a difficult bug in WordPress code, the kind of bug where error messages are misleading and the real issue lies buried in framework intricacies.
- Outcome: Success.
Why This Surprises Us: Debugging requires understanding obscure behaviors around API systems. Copilot’s success here demonstrates that it thrives when dealing with deep framework knowledge—something many Windows developers encounter in enterprise software ecosystems.
Pro Tip: Use GitHub Copilot when it's time to chase bugs through your codebase. It may struggle to generate creative scripts but understands APIs proficiently.
Test 4: Writing a Cross-Platform Script
- Scenario: Create a script that interacts seamlessly across AppleScript, Chrome's object model, and Keyboard Maestro (a macOS-only coding environment).
- Outcome: Success.
Windows Dev’s Takeaway: While this isn't directly relevant to Windows, anyone writing cross-platform automation scripts or integrating with niche third-party tools might find Copilot’s versatility exceptionally valuable.
Key Takeaways: Is GitHub Copilot Worth It?
GitHub Copilot ended this test with a score of 2 successes out of 4. On paper, a 50% success rate might not sound bad until you factor in the stakes. For a software developer, getting a solution wrong is rarely just “a little setback.” It can add hours, if not days, to a project’s lifecycle.

Where GitHub Copilot Struggles:
- Context-Aware Completion: While it can interpret broad prompts, Copilot struggles to execute when fine-tuned outputs (e.g., mixing PHP and JS) are needed.
- Edge Case Coverage: Errors in handling edge cases aren’t just annoying; they’re dangerous in production code.
- Consistency: Because Copilot shares the same GPT-4 backbone as ChatGPT and Perplexity, expectations for it are naturally high. Yet its performance seems inferior.
Where GitHub Copilot Excels:
- Framework-Level Debugging: Its knowledge of APIs and established frameworks makes it excellent for troubleshooting.
- Integration with VS Code: It interfaces seamlessly with one of the most popular IDEs among Windows and cross-platform developers.
- Cross-Platform Coding: Handling diverse environments is its strength, especially for obscure tools and languages.
What This Means for Windows Developers
For developers working in Visual Studio Code on Windows, GitHub Copilot presents an intriguing value proposition. It assists with debugging and integrates seamlessly into the IDE. However, don’t treat it as a one-stop shop for hands-free coding assistance. If your work entails details like validating financial data or building intricate plugins, get ready to spend time iterating and correcting its output.

The Bigger AI Coding Picture
The broader implication here is the inconsistency among AI tools based on the same large language model. If you’re using solutions like ChatGPT, Copilot, or Perplexity Pro, you aren’t choosing between identical tools. Their dataset tuning, use case focus, and integration ecosystems make a massive difference.

So, why does any of this matter? Developers today face tremendous pressure to do more in less time. An imperfect tool like GitHub Copilot can be both a boon and a bane. It might save the day by solving your API conundrum, only to disappoint you when it fails miserably at writing a simple plugin.
The conclusion: GitHub Copilot is a fantastic collaborator, but if you think you’re hiring a coding wizard, you might be chasing an illusion.
Discussion Question for WindowsForum Community: Have you used GitHub Copilot or other AI tools for your coding projects? Share your success and failure stories! Let’s dig deep into where AI truly stands in the realm of development.
Source: ZDNET I put GitHub Copilot's AI to the test - its mixed success at coding baffled me