Tony Robalik

Do LLMs suck at coding, actually? An open source case study

Photo by Ayush Kumar on Unsplash

I am not one to "just set aside ethics for a moment." Just ask my therapist. It's exhausting. So for me, the question "should LLMs be used for… anything" is settled: no, they shouldn't. The reasons are overdetermined.

This post isn't about that. That case has already been painfully well-made by both critics and (unknowingly, I suspect) advocates over several years now. I'm not sure what I can usefully add, though if I get angry enough I do have some ideas.

I can also say that I explicitly asked, in a channel of a private tech worker Slack workspace dedicated to posting about the dumbest shit done with or by LLMs, what others thought about "the ethics." For the most part, the respondents denied there were legitimate concerns. At least one person was especially glad about the effective destruction of the concept of intellectual property (at least when the attacker is a "hyperscaler"—although that's not how he put it). I also observed a lot of fear from workers who asserted, not without some merit, that they had to accept this assault from the CEO class as a condition of their continued employment.

So. This post isn't "setting aside ethics for a moment." It's acknowledging that many tech workers have already plucked their own eyes out, as a perceived condition of their continued employment in possibly the only remaining industry on Earth with any upward mobility.

Under these circumstances I'm forced to confront: do LLMs suck at coding, actually?

I'm not talking to you

You, the real hero of the story, who knows The One True Way to use LLMs, and has done some Pretty Cool Shit, Actually, with them—and who knows that an emdash is a dead giveaway that mere flesh couldn't have put finger to keycap. You can stop reading here, because you already know that I'm Just Holding it Wrong. You're right. You're better than me, and I apologize for breathing your air.

Case study 1: the misdiagnosis of a ClassCastException

Execution failed for task ':parent-project:child-project:computeAdvice'.
> A failure occurred while executing com.autonomousapps.tasks.ComputeAdviceTask$ComputeAdviceAction
   > class com.google.common.graph.ImmutableGraph cannot be cast to class com.google.common.graph.SuccessorsFunction (com.google.common.graph.ImmutableGraph and com.google.common.graph.SuccessorsFunction are in unnamed module of loader org.gradle.internal.classloader.VisitableURLClassLoader$InstrumentingVisitableURLClassLoader @53bbad21)

Issue 1656 for the Dependency Analysis Gradle Plugin (DAGP). The user reported a ClassCastException, and noted that they had used an AI to help troubleshoot their issue, and also to find a workaround.[1] Their report presented the AI's conclusions in three bullet points.

Those three bullet points make numerous claims, none of which reflect our shared reality. All are at least misleading if not outright false, or based on false premises.

I'll give it one thing—it is true that we can avoid dependency issues by not having dependencies. This feels like a subtle assault on OSS as such, layered on top of the more full-throated assault that these human-attention DDoS attacks represent in themselves.

So, that's all wrong. No, actually, it's just a complete fucking mess. It's a fucking fractal of nonsense, delivered with so authoritative a presentation that I'd have to declare it a kind of madness were it not produced by a non-thinking machine. These things are madness generators.

Anyway, what's actually the issue here? I did a little round of debugging to satisfy myself, then presented it to the OP, who reproduced my work in their environment and confirmed that it's literally the exact issue I wrote about over a year ago with the "org.jetbrains.kotlin:kotlin-compiler" dependency (v2.3.0 in this case). The tell is in the stack trace itself: in any recent Guava, ImmutableGraph implements SuccessorsFunction, so that cast can only fail when an older Guava's ImmutableGraph gets loaded instead. JetBrains is still silently bundling an old version of Guava into this dependency. Thanks, guys!
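
If you want to check for this kind of collision yourself, here's a minimal diagnostic sketch (not part of DAGP, and assuming you can run it from the same classloader that throws, e.g., by temporarily dropping it into the failing task):

// Which jar did ImmutableGraph actually come from?
val graphClass = Class.forName("com.google.common.graph.ImmutableGraph")
println(graphClass.protectionDomain.codeSource?.location)

// In any recent Guava this prints true; false means an old, bundled
// Guava won the classloading race.
val successors = Class.forName("com.google.common.graph.SuccessorsFunction")
println(successors.isAssignableFrom(graphClass))

If the first println points at kotlin-compiler-2.3.0.jar rather than a guava jar, you've found the impostor.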

Case study 2: The drive-by PR

PR 1650 "fixes" a crash. It is entirely LLM-generated, both in content and metadata. Trying to be helpful, I noted

I am not certain this is the correct solution. If we get to this point …, that implies something flawed in the analysis. Simply not crashing here might hide a deeper flaw.

OP's response begins:

You're right — the original fix was masking a deeper flaw. I've pushed a new commit that addresses the root cause.[2]

(There's that damn emdash!)

I eventually went on to fix the bug myself, and afterwards compared the two PRs. And of course, at no point in that process did I interact with an actual human.

Case study 3: Another drive-by PR

PR 1651 (consecutive numbers, and indeed the same author) proposed to improve build cache relocatability by making semantically incorrect changes to Gradle task input path sensitivity annotations. Those changes would have caused stochastic build failures for everyone using a remote build cache, and merging such a basic flaw would also have done material harm to my reputation.
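
For context, here's an illustrative sketch (a hypothetical task, not DAGP's actual code) of what these annotations mean. Path sensitivity tells Gradle how much of a file's path participates in the build cache key; loosen it past what the task semantically depends on, and distinct inputs start sharing cache entries:

import org.gradle.api.DefaultTask
import org.gradle.api.file.ConfigurableFileCollection
import org.gradle.api.file.RegularFileProperty
import org.gradle.api.tasks.CacheableTask
import org.gradle.api.tasks.InputFiles
import org.gradle.api.tasks.OutputFile
import org.gradle.api.tasks.PathSensitive
import org.gradle.api.tasks.PathSensitivity
import org.gradle.api.tasks.TaskAction

// Hypothetical task, for illustration only.
@CacheableTask
abstract class AnalyzeTask : DefaultTask() {

  // RELATIVE: each file's root-relative path is part of the cache key, so
  // the task is relocatable across machines (absolute paths are ignored)
  // while src/main/Foo.kt and src/test/Foo.kt remain distinct inputs.
  // "Relaxing" this to NAME_ONLY or NONE merges distinct inputs into one
  // cache key, and a remote cache will then happily serve stale outputs
  // to other machines.
  @get:InputFiles
  @get:PathSensitive(PathSensitivity.RELATIVE)
  abstract val sources: ConfigurableFileCollection

  @get:OutputFile
  abstract val report: RegularFileProperty

  @TaskAction fun analyze() {
    report.get().asFile.writeText("analyzed ${sources.files.size} files")
  }
}

The point is that no sensitivity level is "more correct" in the abstract; the right one depends on what the task actually consumes, which is exactly the understanding a drive-by LLM-generated PR doesn't have.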

Fractal harm engines

The harm from LLMs exists at many levels. At the outermost level (leaving aside the ethics!!), there's the attack on human attention: a flood of machine-generated issues built on incorrect analysis, and machine-generated code built on those same incorrect analyses. Below that, you may find yourself talking, essentially if indirectly, to an unthinking bot, which amounts to second-order prompting, and is about as nauseating as it sounds. If you spend any time investigating the claims of these bot/human hybrids (or blogging about them!), you're wasting your one and only precious existence essentially fighting global capitalism.[3] You may also be wasting CI resources on PRs that are unmergeable. If you make the mistake of merging those PRs, whether in a temporary fit of insanity or out of simple exhaustion (hey, it passed the tests!),[4] then you've poisoned your project and now risk harming your users. Hurting your users is a great way to hurt your own reputation, and if you rely on your reputation for work,[5] well, good luck with that.

Harm reduction

Some people advocate for LLM-usage disclosure policies. I'm glad that works for them, I guess. I don't have time for that, so after the recent spate of attacks on my project, I added a new Zig-inspired Code of Conduct with a Strict No LLM / No AI Policy.

🖕


  1. Kudos to them for acknowledging their use of an LLM up front (although they wrongly refer to it using the marketing term "AI"). This bar, which is on the floor, is apparently too high for some people. ↩︎

  2. Spoiler alert: no, it doesn't. ↩︎

  3. If you want to fight global capitalism, do it more directly. ↩︎

  4. Hm, who wrote those tests? 🤔 ↩︎

  5. Fortunately bots are doing all the hiring now. ↩︎