Engineering

Why LLMs Aren't Junior Developers/Modelers (And What They Actually Are)

Tyler Wolfe-Adam


Whenever I see a claim that LLMs are like a “junior developer”, I have to hold back an eye roll. It would be one thing if these claims referred only to raw capabilities, but they often suggest a trajectory in which these tools will eventually replace human developers entirely. It’s certainly attention-grabbing, but it misrepresents the technology’s current capabilities and its actual value proposition.

This isn’t to say I haven’t been ‘wowed’ by an LLM’s output, nor do I think the technology is completely overhyped. But if we’re going to seriously consider more widespread adoption of this technology, it’s vitally important to actually understand its usefulness and its limits.

Let’s start with a ground truth: on its own, an LLM only ever generates a statistically likely response to whatever input it receives, without any sort of thought process or reflection. This snippet of a video by the channel 3Blue1Brown demonstrates it nicely:

There are techniques like Chain-of-Thought reasoning, where the LLM is instructed to break the problem down into steps before arriving at a final answer. This certainly helps improve the output, but the technology still fundamentally works the same way.
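As a rough illustration of the difference, here is a minimal sketch. The call_llm function is a hypothetical placeholder for whatever provider API you use; the question and both prompts are invented, and only the prompting pattern matters.

# Hypothetical placeholder standing in for any LLM provider's completion call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("swap in your provider's API call")

question = (
    "A store has 3 checkout counters and adds 2 more. "
    "Each counter serves 40 customers per hour. What is the total throughput?"
)

# Direct prompt: the model jumps straight to a single statistically likely answer.
direct_answer = call_llm(question)

# Chain-of-Thought prompt: the model is told to write out intermediate steps first.
# This tends to improve the result, but it is still just generating likely next
# tokens, one step at a time.
cot_answer = call_llm(
    question + "\nThink through it step by step, then give the final answer on the last line."
)

Either way, the underlying mechanism is the same; the second prompt simply makes the intermediate tokens part of the output.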

Now let’s compare it to how we (humans) think – and let me use simulation modeling as a perfect use case to describe this.  

In building a simulation model, the developer must have high-level systems thinking (i.e., seeing the big picture) while also focusing on the minute details of the implementation. They need to understand how components interact as a whole system while also breaking down complex processes into discrete, programmable elements. They also have to think in multiple layers of abstraction simultaneously: the real-world phenomenon, their conceptual model, and the computational implementation. On top of that, the model’s logic is often based on probabilities and has temporal complexity (i.e., it shifts over time).

In short, the entire effort is akin to keeping multiple plates spinning. Each additional layer (or plate) demands fresh consideration, and how much you take on has to be adjusted based on how well each part of the execution is going.

This fundamental, unavoidable way in which simulation models are built is in opposition to how LLMs generate output. Because of this, I don’t think any amount of training will allow LLMs to come close to even the most junior of modelers (or developers generally).

That’s not to say they can’t be used for this effort, though! To truly capture the model-building process, you’d need more than just an LLM: you need a system orchestrated by human developers, one that applies not only their values and decision-making framework but also the actions they take (e.g., for verification and validation).

When I see claims from colleagues that LLMs will “never” be able to build more than toy simulation models, I agree! But dismissing LLMs in this way is like dismissing calculators because they can't solve engineering problems end-to-end. Of course they can't - but they're incredibly powerful as part of an engineer's toolkit for handling computational steps.  

Now, when you go to use ChatGPT, Claude, Gemini, or any others, you are actually interfacing with a system on top of the LLM itself. However, this sort of system is designed for general-purpose use, so its applicability to specific purposes (like simulation modeling) is limited.

Directly testing an LLM (through any provider) on building simulation models can be a decent way to gauge a baseline, but it’s not scalable. To have something viable as a marketable solution (i.e., for businesses), the focus needs to be on using LLMs for exactly what they’re best at and on augmenting human developers with those abilities, so that they can build something better than what was possible before.

As I see it, there are three major areas where LLMs are the “best” tool for a task, each with very specific conditions. Each one dramatically improves on what was possible before:

1. Creating Intelligent Translation Layers Between Intent and Action 

TL;DR LLMs excel at choosing from a set of possible actions that are well-defined, specific, and limited in scope.

This is the simplest one; it focuses on improving user experience and streamlining usage of a product. 

Think about those frustrating phone trees where you're forced to “press 1 for billing, press 2 for technical support.” LLMs can replace this rigid system by understanding natural language intent and routing users appropriately. Instead of forcing customers into predefined categories, they can simply say what they need. 

The same applies to basic chatbots that are pre-programmed to detect actions based on phrasing. Such a chatbot might get “I want a refund” but completely miss “I don't want this item I bought.” LLMs bridge this gap by understanding the (most likely) intent regardless of how it's expressed.
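As a rough sketch of what this looks like, the snippet below constrains the LLM to a small, well-defined set of actions and asks it to pick one. The call_llm function is a hypothetical placeholder for whatever provider API you use, and the action names are made up for illustration.

# Hypothetical placeholder for any LLM provider's completion call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("swap in your provider's API call")

# The possible actions are well-defined, specific, and limited in scope.
ACTIONS = ["billing", "technical_support", "refund", "order_status"]

def route(user_message: str) -> str:
    prompt = (
        "Classify the customer's request into exactly one of these actions: "
        + ", ".join(ACTIONS)
        + ". Reply with the action name only.\n\nCustomer: " + user_message
    )
    action = call_llm(prompt).strip().lower()
    # Fall back to a default queue if the reply isn't one of the known actions.
    return action if action in ACTIONS else "general_inquiry"

# route("I don't want this item I bought")  ->  most likely "refund"

Because the output space is small and checked against a known list, a statistically likely answer is usually the right one, and a wrong one fails safely.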

In the simulation space, one area where this might apply is experiment and scenario execution, specifically its setup and maintenance. Take a grocery store simulation where you want to compare a few different low and high numbers of checkout counters – say, 1, 2, 7, and 8 counters.

Where it starts to get tricky is that scenarios often have parameters whose values depend on other parameter values. Maybe you want a 2-hour restocking frequency when there are fewer than 4 counters, but a 3-hour restocking frequency otherwise.

Explaining it in plain English is easy, but translating it into specific mathematical expressions, or splitting it up across multiple experiments (depending on what your software provides), quickly becomes tedious and cumbersome.
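To make that concrete, here is a rough sketch of what an LLM-backed translation layer could be asked to produce from that plain-English description: a list of scenario definitions with the dependent parameter already resolved. The class and function names are illustrative and not tied to any particular simulation tool.

from dataclasses import dataclass

@dataclass
class Scenario:
    checkout_counters: int
    restocking_hours: int  # dependent on the number of counters

def build_scenarios(counter_options: list[int]) -> list[Scenario]:
    scenarios = []
    for counters in counter_options:
        # The plain-English rule, translated: 2-hour restocking below 4 counters,
        # 3-hour restocking otherwise.
        restocking = 2 if counters < 4 else 3
        scenarios.append(Scenario(checkout_counters=counters, restocking_hours=restocking))
    return scenarios

# build_scenarios([1, 2, 7, 8]) yields restocking_hours of 2, 2, 3, and 3 respectively.

The value isn’t in this trivial code itself; it’s that the modeler can state the rule in plain English and let the translation layer keep the scenario definitions consistent as the rule changes.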

2. Generating Specific Content from Precise Prompts 

TL;DR When you know exactly what you want but need help with the specific wording or implementation. 

This is perhaps the most common application. When you have a clear, specific outcome in mind but need help generating the actual text or code, LLMs can be incredibly effective. The key here is that you're not asking the LLM to interpret or make creative decisions; you're providing a detailed specification and letting it handle the mechanical generation.

For example, if you need code that uses a particular library in a very specific way, you can provide detailed parameters and let the LLM generate it.  
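For instance, a prompt of that kind might look like the sketch below, where the specification pins down the library, the function signature, and the exact parameters, leaving the LLM only the mechanical generation. The call_llm function is again a hypothetical placeholder, and the requirements are invented for illustration.

# Hypothetical placeholder for any LLM provider's completion call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("swap in your provider's API call")

# Every decision is already made; the LLM only handles the mechanical writing.
prompt = """
Write a Python function sample_service_times(n) using numpy.
Requirements:
- import numpy as np
- create a generator with np.random.default_rng(seed=42) for reproducibility
- draw n samples from a triangular distribution with left=2, mode=3, right=7 (minutes)
- return the samples as a list of floats
Return only the code, with no explanation.
"""

generated_code = call_llm(prompt)

Nothing here is left to interpretation; if the generated code is wrong, the fault is easy to spot because the specification already says what right looks like.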

Because of how non-linear simulation modeling typically is, this scenario only applies to creating dynamic boilerplate code (i.e., no logical implementations) or to executing a pre-determined implementation task.

The next blog post will look deeper into this scenario and show an example of how it can go wrong if you’re not specific enough. 

3. Probability-Based Reasoning for Better User Experience 

TL;DR When you need to provide helpful explanations instead of technical jargon or generic error messages. 

We've all encountered software bugs that either give us cryptic error codes or unhelpfully generic messages like "Something went wrong." LLMs can analyze the situation and provide probability-based explanations that help users understand what likely happened and what they might do next. 

This isn't about replacing technical logging for developers, but about creating a more intuitive interface between complex systems and end users who need actionable guidance. 
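A rough sketch of the idea, again using a hypothetical call_llm placeholder: the raw error and some context go in, and a plain-language explanation of what most likely happened (and what to try next) comes out. The error code and context values are invented for illustration.

# Hypothetical placeholder for any LLM provider's completion call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("swap in your provider's API call")

def explain_error(error_code: str, context: dict) -> str:
    prompt = (
        "A simulation run failed. Based on the error and context below, explain in "
        "plain language what most likely went wrong and suggest one or two next steps. "
        "Avoid technical jargon.\n\n"
        f"Error: {error_code}\n"
        f"Context: {context}"
    )
    return call_llm(prompt)

# Example with made-up values:
# explain_error("ERR_NEG_CAPACITY", {"parameter": "checkout_counters", "value": -1})
# -> "It looks like the number of checkout counters was set to a negative value..."

The developer-facing logs stay exactly as they are; this only sits between the system and the end user.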

As I see it, the path forward with LLMs isn't about expecting them to become human-level developers, nor is it about dismissing them as overhyped toys. Instead, it's about understanding exactly what they do well and building systems that leverage these strengths while acknowledging their fundamental limitations. The real opportunity lies not in replacing human developers, but in augmenting their capabilities in specific, well-defined areas. 
