Why Your LLM NPCs Sound the Same (And How to Fix It)

DATE: 2026-02-22
ACCESS: PUBLIC

“Let me tell you, my good friend…”

If you’ve built LLM-powered NPCs, you’ve heard this. Your character starts strong, maybe even impressive. But by turn 20, something’s wrong. They’re repeating themselves. The same phrases. The same gestures. The same rhythm.

I ran 150 turns of dialogue with an LLM-powered character last week. Here’s what I found:

| Pattern | Frequency |
|---|---|
| “takes a long swig from his glass” | 59% of responses |
| “my good friend/man/fellow” | 30% of responses |
| “eyes unfocused” | 15% of responses |
| “let me tell you” | 13% of responses |
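A count like this takes only a few lines to produce. Here's a sketch; the transcript entries below are invented for illustration, and the percentages depend entirely on your own logs:

```python
import re

# Hypothetical transcript excerpt: one NPC response per entry.
responses = [
    "Let me tell you, my good friend... (takes a long swig from his glass)",
    "(takes a long swig from his glass) Back in my day, things were different.",
    "My good friend, you wouldn't understand. (eyes unfocused)",
    "(takes a long swig from his glass) Politics is a dirty business.",
]

# Patterns to tally; case-insensitive so "Let me" and "let me" both count.
patterns = {
    "takes a long swig": r"takes a long swig",
    "my good friend/man/fellow": r"my good (friend|man|fellow)",
    "eyes unfocused": r"eyes unfocused",
    "let me tell you": r"let me tell you",
}

# Fraction of responses containing each pattern at least once.
frequencies = {
    name: sum(bool(re.search(pat, r, re.IGNORECASE)) for r in responses)
    / len(responses)
    for name, pat in patterns.items()
}

for name, freq in frequencies.items():
    print(f"{name}: {freq:.0%} of responses")
```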

My character—a boozy ex-politician I’ll call Frank—couldn’t stop drinking. Not because it was dramatically appropriate. Because the LLM fell into a rut.

This isn’t a model problem. It’s a systems problem. And it’s solvable.


Why LLMs Repeat Themselves

LLMs are trained to predict the most likely next token. For character dialogue, this means:

  1. They latch onto successful patterns (things that “worked” in training)
  2. They have no working notion of “already said”—a phrase sitting in the conversation history doesn’t discourage reuse; if anything, it reinforces the pattern
  3. They have no variety pressure pushing them away from repetition

Your character config might say “Frank drinks whiskey” once. But the LLM sees that pattern succeed and keeps deploying it. Token by token, response by response, your character becomes a caricature of themselves.

The technical term is mode collapse in generation—the model converges on a small set of high-probability outputs.


The Research-Backed Fix: Diversity Penalties

In 2016, researchers published “Diverse Beam Search” (arXiv:1610.02424), introducing diversity penalties to neural sequence generation. The idea: penalize the model for repeating itself.

The technique works. But there’s a catch: it’s buried in inference code, inaccessible to game writers who actually craft the characters.

The insight: What if we made diversity penalties configurable by writers?


Solution: Config-Driven Novelty Guards

Here’s the approach I built and tested while prototyping an AI dialogue system:

{
  "novelty_rules": {
    "catchphrase_limits": {
      "let me tell you": 4,
      "my good friend": 5,
      "back in my day": 4
    },
    "action_limits": {
      "takes a swig": 3,
      "takes a long": 4,
      "eyes unfocused": 3
    },
    "variety_prompts": [
      "Note: Vary your approach this turn. Your catchphrases work best when they land occasionally, not constantly.",
      "Note: Show a different facet of your character here.",
      "Note: Find a fresh angle for this moment."
    ]
  }
}

How it works:

  1. Track phrase usage during the session
  2. When a limit is hit, inject a variety prompt into the next generation
  3. The model gets a gentle nudge: “you just did that, try something else”

This isn’t rejection sampling (expensive, unpredictable). It’s guidance—working with the model, not fighting it.
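The three steps above fit in a small tracker. This is a minimal sketch of the approach, not the actual system; the class and method names are mine:

```python
from typing import Optional


class NoveltyGuard:
    """Tracks phrase usage per session and emits variety prompts at limits."""

    def __init__(self, catchphrase_limits, action_limits, variety_prompts):
        self.limits = {**catchphrase_limits, **action_limits}
        self.variety_prompts = variety_prompts
        self.counts = {phrase: 0 for phrase in self.limits}
        self.prompt_index = 0  # rotate through variety prompts

    def record(self, response: str) -> None:
        """Step 1: tally tracked phrases appearing in the latest response."""
        lowered = response.lower()
        for phrase in self.counts:
            if phrase in lowered:
                self.counts[phrase] += 1

    def next_injection(self) -> Optional[str]:
        """Steps 2-3: if any phrase has hit its limit, return a variety
        prompt to append to the next generation's context."""
        if any(self.counts[p] >= self.limits[p] for p in self.limits):
            prompt = self.variety_prompts[
                self.prompt_index % len(self.variety_prompts)
            ]
            self.prompt_index += 1
            return prompt
        return None
```

The guard sits between the model and your prompt builder: call `record()` on each response, and append whatever `next_injection()` returns before the next turn.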


The Other Half: Action Vocabulary

But novelty guards are only half the fix. The other half: give the model better options.

In an earlier prototype, I discovered that action vocabulary tables dramatically improved variety:

{
  "action_vocabulary": {
    "drunk": [
      "(sloshing whiskey onto the carpet, not noticing)",
      "(gripping the armrest to steady himself)",
      "(squinting as if the room is too bright)",
      "(fumbling with his glass, nearly dropping it)"
    ],
    "defensive": [
      "(straightening his tie with trembling fingers)",
      "(stepping back, bumping into the sideboard)",
      "(glancing toward the door)"
    ],
    "nostalgic": [
      "(staring into the fire, eyes distant)",
      "(running a thumb along his wedding ring)"
    ]
  }
}

Instead of the model inventing “takes a swig” over and over, it selects from a curated vocabulary. The variety is baked in.

Key insight: The system provides the palette; the AI paints with it.
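Serving that palette is a one-function job. The selection policy here—exclude whatever was used in the last couple of turns—is my own sketch, not the only option:

```python
import random
from collections import deque

ACTION_VOCABULARY = {
    "drunk": [
        "(sloshing whiskey onto the carpet, not noticing)",
        "(gripping the armrest to steady himself)",
        "(squinting as if the room is too bright)",
        "(fumbling with his glass, nearly dropping it)",
    ],
    "defensive": [
        "(straightening his tie with trembling fingers)",
        "(stepping back, bumping into the sideboard)",
        "(glancing toward the door)",
    ],
}

recent = deque(maxlen=2)  # remember the last two actions used


def pick_action(state: str) -> str:
    """Choose an action for the emotional state, avoiding recent repeats."""
    options = [a for a in ACTION_VOCABULARY[state] if a not in recent]
    # Fall back to the full list if everything was recently used.
    choice = random.choice(options or ACTION_VOCABULARY[state])
    recent.append(choice)
    return choice
```

You can inject the chosen action directly into the prompt, or offer two or three candidates and let the model pick whichever fits the line it's writing.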


Prompt Placement Matters

One more trick from the research: recency bias.

LLMs pay more attention to the end of prompts than the middle. If your “don’t repeat yourself” rules are buried in the middle of a long system prompt, they’ll be ignored.

Move your constraints to the last position:

[Character context...]
[Scene context...]
[Conversation history...]

IMPORTANT - VARIETY RULES:
- Your signature phrases (use 1-2 per scene): "let me tell you", "my good friend"
- Do not repeat the same phrase in consecutive turns

This simple reordering can reduce repetition by 20-30% with zero code changes.
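In code, the reordering is nothing more than concatenation order. The segment names below are illustrative:

```python
def build_prompt(character_context: str, scene: str,
                 history: str, variety_rules: str) -> str:
    """Assemble the prompt with variety rules last, where recency bias
    gives them the most attention."""
    return "\n\n".join([
        character_context,  # stable identity: who the character is
        scene,              # current scene description
        history,            # conversation so far
        variety_rules,      # constraints LAST -> highest attention
    ])


prompt = build_prompt(
    "[Character context...]",
    "[Scene context...]",
    "[Conversation history...]",
    "IMPORTANT - VARIETY RULES:\n"
    '- Signature phrases (use 1-2 per scene): "let me tell you"',
)
```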


Results: 75-Turn Validation

I ran a complete 75-turn extended narrative playthrough. Here’s what happened:

Before Fix (59% Repetition Problem)

| Pattern | Frequency | Assessment |
|---|---|---|
| “takes a long [swig]” | 59% | 🔴 Annoying |
| “my good friend/man” | 30% | 🔴 Robotic |
| Unique action variety | ~3 per 20 turns | 🔴 Boring |

After Fix (Novelty Guards + Action Vocabulary + Prompt Placement)

| Character | Signature Phrase | Frequency | Assessment |
|---|---|---|---|
| Character A (servant) | “Begging your pardon, sir” | 12% | ✅ Sweet spot |
| Character B (professional) | “Let’s talk numbers” | 5% | ✅ Perfect |
| Characters C, D, others | Various | 0% | 🟡 Need encouragement |

Key findings:

  • Characters with rich config-defined vocabulary hit the 8-15% sweet spot naturally
  • Characters with sparse definitions underperformed—they need richer options
  • The variety prompts prevented any phrase from exceeding limits

The Sweet Spot

| State | Occurrence | Assessment |
|---|---|---|
| Original | 59% | 🔴 Annoying |
| Overcorrected | 0% | 🔴 Robot (no personality) |
| Calibrated | 8-15% | ✅ Character voice |

Catchphrases ARE character voice, not bugs. A servant character SHOULD say “begging your pardon” more than a professional one. The goal isn’t elimination—it’s balance.
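Those calibration bands can be encoded as a simple check for your monitoring scripts. The thresholds are the ones from my runs; treat them as starting points, not universal constants:

```python
def assess_frequency(freq: float) -> str:
    """Classify a signature-phrase frequency against the calibration bands:
    0% is robotic, 8-15% is the sweet spot, 30%+ is annoying."""
    if freq == 0.0:
        return "robotic (no personality)"
    if 0.08 <= freq <= 0.15:
        return "character voice (sweet spot)"
    if freq >= 0.30:
        return "annoying"
    return "acceptable, monitor"
```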


The Bigger Picture: Flavor vs. Mechanics

Here’s what I’ve learned: mechanics are easy, flavor is hard.

I can build a dialogue system that runs 75 turns without crashing. That’s table stakes. The hard part is making those 75 turns interesting—making a character feel alive instead of like a chatbot wearing a costume.

The techniques in this post—novelty guards, action vocabularies, prompt placement—are all about injecting flavor into mechanical systems. They’re the difference between:

Generic:

“Let me tell you, my good friend, everyone has secrets.”

Specific:

“(gripping the armrest to steady himself) The ‘98 fundraiser…” (stops) “You wouldn’t know about that. Before your time.”

The second one has texture. A specific reference. A self-interruption. Physical struggle. It earns attention.


Implementation Checklist

If you’re building LLM NPCs:

  1. Measure your repetition rate. Grep your transcripts. You’ll be surprised.

    grep -c "takes a.*swig\|my good friend" transcript.log
    
  2. Add catchphrase limits to your character configs. Start with 3-5 uses max.

    { "catchphrase_limits": { "signature phrase": 4 } }
    
  3. Build an action vocabulary per emotional state. 4-5 options each.

    { "nervous": ["(fidgets)", "(glances at door)", "(wrings hands)"] }
    
  4. Move variety rules to the end of your prompts. Use recency bias.

  5. Encourage usage, don’t just limit it. Change “Use phrases like X” to “Your signature phrases (use 1-2 per scene): X, Y, Z”

These aren’t silver bullets. But they’re the difference between “neat demo” and “compelling experience.”


What’s Next

I’m exploring how these techniques scale to longer sessions and more complex narrative structures: persistent memory, multi-character scenes, and what happens when characters start surprising you.


Got questions? Found a better approach? Reach out: [email protected]

Paul Kyle // Director — Phase Space