“Let me tell you, my good friend…”
If you’ve built LLM-powered NPCs, you’ve heard this. Your character starts strong, maybe even impressive. But by turn 20, something’s wrong. They’re repeating themselves. The same phrases. The same gestures. The same rhythm.
I ran 150 turns of dialogue with an LLM-powered character last week. Here’s what I found:
| Pattern | Frequency |
|---|---|
| “takes a long swig from his glass” | 59% of responses |
| “my good friend/man/fellow” | 30% of responses |
| “eyes unfocused” | 15% of responses |
| “let me tell you” | 13% of responses |
My character—a boozy ex-politician I’ll call Frank—couldn’t stop drinking. Not because it was dramatically appropriate. Because the LLM fell into a rut.
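Counts like these take only a few lines to gather. Here’s a minimal sketch (the pattern list and transcript format are my own illustrations, not part of any particular framework):

```python
import re

# Illustrative patterns to track; swap in your own character's tics.
PATTERNS = {
    "long swig": r"takes a long swig",
    "my good friend": r"my good (?:friend|man|fellow)",
    "eyes unfocused": r"eyes unfocused",
}

def repetition_rates(responses):
    """Return the fraction of responses containing each tracked pattern."""
    rates = {}
    for name, pattern in PATTERNS.items():
        hits = sum(1 for r in responses if re.search(pattern, r, re.IGNORECASE))
        rates[name] = hits / len(responses) if responses else 0.0
    return rates

responses = [
    "He takes a long swig from his glass.",
    "Let me tell you, my good friend...",
    "He stares out the window, eyes unfocused.",
    "Takes a long swig and sighs.",
]
print(repetition_rates(responses))
# → {'long swig': 0.5, 'my good friend': 0.25, 'eyes unfocused': 0.25}
```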
This isn’t a model problem. It’s a systems problem. And it’s solvable.
## Why LLMs Repeat Themselves
LLMs are trained to predict the most likely next token. For character dialogue, this means:
- They latch onto successful patterns (things that “worked” in training)
- They lack session memory of what they’ve already said
- They have no variety pressure pushing them away from repetition
Your character config might say “Frank drinks whiskey” once. But the LLM sees that pattern succeed and keeps deploying it. Token by token, response by response, your character becomes a caricature of themselves.
The technical term is mode collapse in generation—the model converges on a small set of high-probability outputs.
## The Research-Backed Fix: Diversity Penalties
In 2016, researchers published “Diverse Beam Search” (arXiv:1610.02424), introducing diversity penalties to neural sequence generation. The idea: penalize the model for repeating itself.
The technique works. But there’s a catch: it’s buried in inference code, inaccessible to game writers who actually craft the characters.
The insight: What if we made diversity penalties configurable by writers?
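To make the idea concrete, here’s a toy version of a diversity penalty applied to candidate selection. This is a deliberately simplified sketch of the principle, not the beam-search algorithm from the paper: each prior use of a candidate subtracts a fixed penalty from its score.

```python
def pick_with_diversity(candidates, scores, used_counts, penalty=0.5):
    """Pick the highest-scoring candidate after penalizing repeats.

    candidates: list of strings; scores: model scores (higher is better);
    used_counts: how often each candidate appeared earlier in the session.
    """
    return max(
        candidates,
        key=lambda c: scores[c] - penalty * used_counts.get(c, 0),
    )

scores = {"takes a long swig": 0.9, "sets the glass down": 0.7}

# With no history, the model's favorite wins:
print(pick_with_diversity(list(scores), scores, {}))
# → takes a long swig

# After two prior uses, the penalty flips the choice (0.9 - 2*0.5 < 0.7):
print(pick_with_diversity(list(scores), scores, {"takes a long swig": 2}))
# → sets the glass down
```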
## Solution: Config-Driven Novelty Guards
Here’s the approach I built and tested while prototyping an AI dialogue system:
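A minimal sketch of the idea (the class name and config shape are illustrative, not the exact system described here): the writer-facing config caps each catchphrase, the guard counts uses per session, and once a cap is hit it emits a nudge to append to the next prompt.

```python
import re

# Hypothetical writer-facing config: phrase -> max uses per session.
CATCHPHRASE_LIMITS = {
    "takes a long swig": 3,
    "my good friend": 4,
}

class NoveltyGuard:
    def __init__(self, limits):
        self.limits = limits
        self.counts = {phrase: 0 for phrase in limits}

    def observe(self, response):
        """Track phrase usage in each generated response."""
        for phrase in self.counts:
            if re.search(re.escape(phrase), response, re.IGNORECASE):
                self.counts[phrase] += 1

    def variety_prompt(self):
        """Return a nudge for the next prompt, or "" if nothing is overused."""
        overused = [p for p, n in self.counts.items() if n >= self.limits[p]]
        if not overused:
            return ""
        return (
            "You have leaned on these phrases already: "
            + ", ".join(overused)
            + ". Express the same personality a different way."
        )

guard = NoveltyGuard(CATCHPHRASE_LIMITS)
for _ in range(3):
    guard.observe("He takes a long swig from his glass.")
print(guard.variety_prompt())
```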
How it works:
- Track phrase usage during the session
- When a limit is hit, inject a variety prompt into the next generation
- The model gets a gentle nudge: “you just did that, try something else”
This isn’t rejection sampling (expensive, unpredictable). It’s guidance—working with the model, not fighting it.
## The Other Half: Action Vocabulary
But novelty guards are only half the fix. The other half: give the model better options.
In an earlier prototype, I discovered that action vocabulary tables dramatically improved variety:
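Here’s a sketch of what such a table can look like, with a helper that feeds a few not-recently-used options into each prompt (the states and actions are illustrative, not the original tables):

```python
import random

# Illustrative action vocabulary: emotional state -> curated stage directions.
ACTION_VOCAB = {
    "nostalgic": [
        "(turns the glass slowly without drinking)",
        "(stares at the old campaign photo)",
        "(half-smiles at something only he remembers)",
        "(traces a ring of condensation on the bar)",
    ],
    "defensive": [
        "(straightens in his chair)",
        "(waves the question away)",
        "(laughs a beat too late)",
    ],
}

def suggest_actions(state, recent, k=2, rng=random):
    """Offer k actions for the prompt, skipping recently used ones."""
    options = [a for a in ACTION_VOCAB.get(state, []) if a not in recent]
    return rng.sample(options, min(k, len(options)))

print(suggest_actions("nostalgic", recent={"(stares at the old campaign photo)"}))
```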
Instead of the model inventing “takes a swig” over and over, it selects from a curated vocabulary. The variety is baked in.
Key insight: The system provides the palette; the AI paints with it.
## Prompt Placement Matters
One more trick from the research: recency bias.
LLMs pay more attention to the end of prompts than the middle. If your “don’t repeat yourself” rules are buried in the middle of a long system prompt, they’ll be ignored.
Move your constraints to the last position in the prompt.
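For example, a reordered system prompt might end like this (the section contents are illustrative; only the ordering matters):

```
## Character
Frank, a boozy ex-politician.

## Scene
A quiet hotel bar, past midnight.

## Variety rules (follow these strictly)
Do not reuse a gesture from the last three responses.
```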
This simple reordering can reduce repetition by 20-30% with zero code changes.
## Results: 75-Turn Validation

I ran a full 75-turn extended narrative playthrough. Here’s what happened:
### Before Fix (59% Repetition Problem)
| Pattern | Frequency | Assessment |
|---|---|---|
| “takes a long [swig]” | 59% | 🔴 Annoying |
| “my good friend/man” | 30% | 🔴 Robotic |
| Unique action variety | ~3 per 20 turns | 🔴 Boring |
### After Fix (Novelty Guards + Action Vocabulary + Prompt Placement)
| Character | Signature Phrase | Frequency | Assessment |
|---|---|---|---|
| Character A (servant) | “Begging your pardon, sir” | 12% | ✅ Sweet spot |
| Character B (professional) | “Let’s talk numbers” | 5% | ✅ Perfect |
| Characters C, D, others | Various | 0% | 🟡 Need encouragement |
Key findings:
- Characters with rich config-defined vocabulary hit the 8-15% sweet spot naturally
- Characters with sparse definitions underperformed—they need richer options
- The variety prompts prevented any phrase from exceeding limits
## The Sweet Spot
| State | Occurrence | Assessment |
|---|---|---|
| Original | 59% | 🔴 Annoying |
| Overcorrected | 0% | 🔴 Robot (no personality) |
| Calibrated | 8-15% | ✅ Character voice |
Catchphrases ARE character voice, not bugs. A servant character SHOULD say “begging your pardon” more than a professional one. The goal isn’t elimination—it’s balance.
## The Bigger Picture: Flavor vs. Mechanics
Here’s what I’ve learned: mechanics are easy, flavor is hard.
I can build a dialogue system that runs 75 turns without crashing. That’s table stakes. The hard part is making those 75 turns interesting—making a character feel alive instead of like a chatbot wearing a costume.
The techniques in this post—novelty guards, action vocabularies, prompt placement—are all about injecting flavor into mechanical systems. They’re the difference between:
Generic:
“Let me tell you, my good friend, everyone has secrets.”
Specific:
“(gripping the armrest to steady himself) The ‘98 fundraiser…” (stops) “You wouldn’t know about that. Before your time.”
The second one has texture. A specific reference. A self-interruption. Physical struggle. It earns attention.
## Implementation Checklist
If you’re building LLM NPCs:
1. Measure your repetition rate. Grep your transcripts. You’ll be surprised.

   ```shell
   grep -c "takes a.*swig\|my good friend" transcript.log
   ```

2. Add catchphrase limits to your character configs. Start with 3-5 uses max.

   ```json
   { "catchphrase_limits": { "signature phrase": 4 } }
   ```

3. Build an action vocabulary per emotional state. 4-5 options each.

   ```json
   { "nervous": ["(fidgets)", "(glances at door)", "(wrings hands)"] }
   ```

4. Move variety rules to the end of your prompts. Use recency bias.

5. Encourage usage, don’t just limit it. Change “Use phrases like X” to “Your signature phrases (use 1-2 per scene): X, Y, Z”.
These aren’t silver bullets. But they’re the difference between “neat demo” and “compelling experience.”
## What’s Next
I’m exploring how these techniques scale to longer sessions and more complex narrative structures: persistent memory, multi-character scenes, and what happens when characters start surprising you.
Got questions? Found a better approach? Reach out: [email protected]