ROLECALLFeatures
Features

Generation Controls

Deep dives into every tool on stage

Generation Controls

Prompting wing and Sampler Settings — the preset, sliders, and post-processing controls side by side

Generation Controls are the dials that decide how the AI writes — independent of what you've told it to write. Two sliders tug it toward "careful" or "wild." A few number boxes decide how much of the conversation it remembers, and how long it can ramble. A single wing in the chat — Quick Play — collects every one of those controls plus model selection, lorebook toggles, Story Director, and image automation in one place.

This page covers two connected surfaces:

  • Sampler settings — the generation parameters (temperature, top-p, penalties, response length, context window)
  • Quick Play — the in-chat wing that surfaces those samplers plus model, lorebooks, Story Director, and image automation toggles for the current scene

Samplers in RoleCall live inside Presets. There's no separate "sampler profile" type to publish or fork — when you fork a preset, you fork its samplers with it. The Quick Play wing lets you ride on top of the preset's defaults with per-scene overrides that only apply to this one chat. The full preset editor (in your dashboard) exposes every sampler. The Quick Play wing exposes the handful that actually matter for tuning a scene without leaving the conversation.

Image generation — attaching pictures, auto-generating scene illustrations, and the manual image modal — has its own dedicated page. See Image Generation for that surface in full.


Sampler Settings

Samplers shape the AI's choice of each next word. They don't change what the AI was told to do — that's the prompt's job — they change how it picks among the words it could possibly say next.

A low temperature with a tight top-p produces careful, predictable prose. Crank both up and the AI starts taking weirder, riskier choices. Layer in penalties and it stops leaning on the same words and phrases.

You don't need to master every sampler. Most chats run fine on the preset's defaults. The Quick Play wing exposes the handful that actually matter for tuning a scene.

How a Sampler Decision Actually Works

Skim this if you're new — skip it if you've been tuning models for years. Each time the AI picks the next word, every possible word starts with a probability. Samplers run in this rough order:

  1. Top P, Top K, Min P, Top A all act as filters — they take the full vocabulary and throw away unlikely options.
  2. Temperature reshapes what's left — flattening probabilities (higher temperature = more random) or sharpening them (lower = more deterministic).
  3. Frequency Penalty, Presence Penalty, Repetition Penalty push down the probability of words the AI has used recently, so it stops looping on the same phrase.
  4. The model picks one word from the surviving candidates, weighted by their adjusted probabilities.

This is why stacking too many filters at once starves the model: Top P + Top K + Min P + Top A all in series can leave the AI with one or two viable words at every step, which feels weirdly mechanical. Pick the one filter that addresses your problem and leave the rest at defaults.

Samplers in the Quick Play Wing

These are the controls you can move during a chat without leaving the conversation. They override whatever the active preset sets — only for this chat.

SamplerRangeWhat It Does
Temperature0.0 – 2.0Controls randomness. 0.00.7 is focused and predictable. 0.81.2 is the sweet spot for most chats. Past 1.5 the AI starts taking real risks.
Max Tokens50 – 64,000The longest a single AI reply can be. Lower (256–1024) for snappy back-and-forth, higher (2048–8192) for paragraphs of prose. NemoAI and local-served models cap at 12,000.
Context ModeTokens / MessagesChoose Tokens (budget by raw token count) or Messages (keep the last N exchanges).
Max Context1,024 – 1,000,000When in Tokens mode: the maximum number of tokens of context (system prompt + character + lore + history) sent on each turn. Older messages get trimmed to fit. Auto-caps at whatever the active model actually supports.
Max Messages0 – 1,000When in Messages mode: the number of prior messages the AI sees. The current user message is always included on top of this count. A live token estimate shows under the slider so you can see what that budget actually costs.
Top P0.0 – 1.0Filters which words the AI considers, by cumulative probability. 0.90.95 is typical. Lower values keep the AI on safer vocabulary.
Top K0 – 500Limits the AI to the top N candidate words at each step. 0 disables it. Only some models support this — the slider hides itself when the active model doesn't.
Freq. Penalty-2.0 – 2.0Discourages words the AI has already used a lot. Positive values reduce repetition; negative values encourage it.
Pres. Penalty-2.0 – 2.0Discourages any word the AI has used at all this chat, regardless of how often. Positive values push the AI toward new topics.

Each slider shows the live value, and the model picker quietly disables controls the current model doesn't support — when you switch from a model that supports Top K to one that doesn't, the slider vanishes rather than silently doing nothing.

Samplers in the In-Scene Preset Wing

The Preset wing inside a chat shows a condensed sampler bar — the everyday controls, plus a setting that doesn't exist in Quick Play:

SamplerRangeWhat It Does
Temperature0.0 – 2.0Same as Quick Play.
Top P0.0 – 1.0Same as Quick Play.
Top K0 – 100Same as Quick Play.
Max Response Length256 – 64,000The preset's max tokens, as a slider.
Max Context / Max Messagesup to 2M / 1,000Same Tokens/Messages toggle as Quick Play; the slider caps at 2M tokens to match the largest models on the market.
Prompt Post-ProcessingenumHow the chat history gets normalized before going to the provider. See below for the five modes.

When you move a slider in either Quick Play or this in-scene Preset bar, a small dot appears next to the slider to mark "this is being overridden for the current chat." Both wings share the same underlying override store — moving the slider in one is the same as moving it in the other.

Samplers in the Preset Editor (Dashboard)

The full preset editor (the page you reach from the dashboard or by clicking Edit Preset) opens a Sampler Settings panel that exposes every sampler RoleCall supports. This is the place to set sensible defaults that ship with your preset.

Preset SamplerRangeWhat It Does
Temperature0.0 – 2.0The preset's default temperature. Quick Play can override per chat.
Top P0.0 – 1.0The preset's default top-p.
Top K0 – 500The preset's default top-k. 0 disables it.
Top A0.0 – 1.0A squared variant of Top P. Cuts low-probability words more aggressively. 0 disables it. Most models ignore Top A; primarily useful with local models.
Min P0.0 – 1.0A probability floor. Words below min_p × probability_of_most_likely_word get cut. Lower values (0.010.05) keep more variety. Most useful with local models.
Frequency Penalty0.0 – 2.0Preset-level frequency penalty. In the preset editor, this is a one-sided slider — Quick Play exposes the full -2 to 2 range.
Presence Penalty0.0 – 2.0Preset-level presence penalty. Same one-sided range as Frequency Penalty.
Repetition Penalty1.0 – 2.0A generic repetition penalty. 1.0 = none, 1.11.3 discourages repeats. Less nuanced than frequency/presence penalty. Most often used with local models.
Max Context1,000 – 2,000,000The preset's default context budget in Tokens mode.
Max Messages1 – 1,000The preset's default Messages-mode cap.
Max Tokens50 – 64,000The preset's default max response length.
Context ModeTokens / MessagesWhich trim strategy the preset defaults to.
Prompt Post-ProcessingenumThe five modes described below.

Prompt Post-Processing

Different model providers expect different message structures. Some choke on consecutive system messages. Some want strict user/assistant alternation. RoleCall's Prompt Post-Processing setting normalizes your prompt before it leaves the building.

ModeWhat It DoesWhen to Use
NoneSend everything as-is, in the exact role order RoleCall built it.Default. Works for most modern providers.
MergeCombine consecutive same-role messages (two user messages in a row become one).Models that get confused by "user, user, assistant" patterns.
SemiMerge consecutive same-role messages and convert mid-conversation system messages to user.Older Claude builds and providers that reject mid-chat system roles.
StrictForce the conversation to start with user and strictly alternate user, assistant, user, assistant… Mid-chat system messages get pulled into the prior user message.The original Anthropic Claude API contract; some local model servers.
SingleCollapse the entire prompt into one user message.Models that only accept a single-shot completion call instead of a multi-turn conversation.

If the AI is suddenly producing garbled output, dropping context, or returning provider errors, swap modes one at a time. Most chats run on None; if your provider needs something stricter, the model's documentation usually says so.

Per-Preset Defaults vs. Per-Scene Overrides

Sampler values follow a strict precedence:

  1. The preset's defaults ship with the preset. When you fork a preset, you inherit them.
  2. Per-scene overrides in Quick Play (and the in-scene Preset wing) live in this one chat. Move a slider and a small dot appears next to the slider in the Preset wing to signal "this is being overridden for the current chat."
  3. The model's supported parameters are the final gate. The Quick Play wing reads what each model actually accepts and disables (or hides) sliders the model ignores.

Overrides save automatically. Reset a single slider by matching it back to the preset's value, or clear the override from the Preset wing.

Picking Temperature: Quick Guide

Use CaseTemperatureWhy
Technical / factual writing0.3–0.6Predictable, less off-topic
Steady character voice0.7–0.9Consistent but not robotic
General roleplay0.85–1.1The default sweet spot
Wild creative scenes1.2–1.5Unexpected choices
Chaos / dream sequences1.5–2.0Genuinely strange output

If a model starts making typos or going off the rails, lower temperature before touching anything else. If it feels repetitive, raise temperature and nudge the frequency/presence penalties up by 0.1–0.3 before reaching for more exotic samplers.

Troubleshooting: Which Sampler When

SymptomFirst Thing to TryWhy
AI keeps using the same phrase ("she chuckled softly", "he tilted his head")Freq. Penalty → 0.3–0.6Penalizes words it's already used a lot.
AI keeps returning to the same topicPres. Penalty → 0.3–0.6Penalizes any word it's touched at all this chat.
AI writes too cautiously, dialogue feels stiffTemperature → up by 0.1–0.2Loosens word choice.
AI is hallucinating wild details, going off-scriptTemperature → down by 0.1–0.2Tightens word choice.
AI replies are too shortMax Tokens → upThe model is hitting the response cap mid-thought.
AI replies are wandering, never landingMax Tokens → down, or write a clearer promptThe cap is high enough that the model rambles.
Older turns vanishing from the AI's memoryMax Context → up (within model's actual limit)Context budget is too small.
Replies feel disjointed in a long arcSwitch to Messages mode with N=50–100Token budget is trimming too aggressively.
Output is garbled or returning provider errorsPrompt Post-Processing → try Merge, then Semi, then StrictThe provider needs a different message structure.

Quick Play — The Fast Setup Wing

The Quick Play wing is the single panel where you can swap models, toggle lorebooks, retune samplers, flip Story Director on, and decide whether the AI illustrates the scene — without opening any other wing.

It's organized as one scrollable column. The header has a Save Loadout button so you can snapshot the entire setup as a reusable loadout you can apply to future chats.

What's in the Wing

The wing renders the following sections in order, top to bottom:

SectionPurpose
Character CardThe character you're chatting with — avatar, name, tagline, the first few trait tags. Read-only summary.
Active PresetThe preset driving this chat, with its rough prompt token cost. Read-only summary; edit in the Preset wing.
Session StatsThree numbers: Msgs (message count this chat), Tokens (current context token total), Avg (mean AI response time in seconds).
Context Usage BarA live gauge of how full the context window is. Turns orange past 80% and red past 95%.
Active LorebooksEvery lorebook attached to this chat, with token count and entry count, plus per-lorebook on/off toggles.
ModelThe current model with provider, context length, and a Setup Required badge if the model's provider isn't configured. Click to open the model picker.
GenerationThe sampler sliders described above.
Story DirectorCompendium, DM Assistant, Narrator Agent, Web Search (coming soon), Scene Images, Strip Chat Images.

The Model Picker

Click the model card to open the picker. Models are grouped by source:

  • BYOK providers you've connected (OpenAI, Anthropic, Google, OpenRouter, NanoGPT, etc.) — your own keys, your own models. Each provider appears as its own labeled group.
  • RoleCall In-House — the platform's hosted models, shown last in the list.

Type in the search box to narrow the list across all groups. Models marked Setup Required belong to a provider that isn't configured yet — selecting them is disabled until you connect that provider. Models marked Local run on RoleCall's own servers and don't need a BYOK key.

Each model row shows its context length (e.g. 200K ctx) so you can pick the right size for the conversation. The slider for Max Context automatically caps at whatever the selected model supports.

The chosen model also gates which sampler sliders show up. If you pick a model that doesn't support Top K, the Top K slider disappears from the Generation section rather than silently doing nothing. Same logic for the penalties — when the active model doesn't accept Freq./Pres. Penalty, those sliders hide.

Reading the Context Usage Bar

The context bar lives high in the Quick Play wing and shows two numbers separated by a slash — current tokens used and the model's context limit. The bar itself is a gradient that fills as you fill the context window:

  • Up to 80% — bar uses the wing's accent color. Plenty of room.
  • 80–95% — bar turns orange. The model still works fine, but older turns will start getting trimmed on the next message.
  • Past 95% — bar turns red with a glow. You're at the edge — every new message will push something out. Either summarize, branch into a new chat, or lower Max Context to force aggressive trimming.

The bar updates live as you type and as the AI replies, so you can watch a long scene fill toward its budget in real time.

Saving Your Setup as a Loadout

The Save Loadout button in the wing header captures everything you've configured for the current scene:

  • Preset (and which prompts you've toggled off)
  • Persona
  • Samplers — your per-scene overrides
  • Active lorebooks and their on/off state
  • Model + provider selection
  • Story Director / DM settings — mode, personality, DM model, Storyboards on, Legacy Trackers on
  • Scene Images settings — on/off, frequency, image model
  • Post-Production action chain
  • Immersion modules — every Storyboard and Legacy Tracker toggle, plus sub-feature toggles
  • Stagecraft props
  • Regex rules (which ones are on)
  • Author Note
  • Guides in use

You can then apply that loadout to any future chat. See the Loadouts area in your dashboard for managing saved loadouts.

Story Director Toggles in Quick Play

The Story Director section lets you switch in-scene AI helpers on and off. Each toggle's switch lives in the wing; tapping a switch reveals deeper sub-controls inline.

ToggleWhat It Does
CompendiumAn AI-managed lorebook with typed entries — the model can read from and write to a structured Compendium that auto-organizes characters, locations, items, and other recurring entities. When on, exposes a Compendium Model picker (BYOK-supported) so you can run retrieval and bookkeeping on a different model than the main narrator.
DM AssistantThe Story Director sidecar. When on, takes over narrative direction, tracker updates, and "behind the curtain" coordination. Reveals a deep sub-panel of controls described below.
Narrator AgentLets the narrating AI call tools mid-generation — search lore, update relationships, advance quests, modify inventory. Independent of DM Assistant. Reveals a Max Tool Rounds spinner (1–20, default 8).
Web SearchLets the AI pull real-time web results into context. Coming soon — disabled in the current build. When enabled, a per-worker selector lets you allow Narrator, DM, and/or TV to use it.
Scene ImagesAI illustrates each scene inline as you play. See the Image Generation section below for the sub-controls.
Strip Chat ImagesSends text-only requests when a backend rejects vision payloads — an escape hatch when a BYOK provider says it supports vision but errors on the actual request.

DM Assistant Sub-Controls

When DM Assistant is on, an indented panel exposes:

  • DM Personality — pick from the curated DM personalities (Balanced, Cinematic, Rules-Light, etc.).
  • Story Director Model — a BYOK-supported model picker dedicated to narrative direction and tracker updates. Often a smaller, faster model is fine here.
  • Storyboards — five first-class story state cards. Each card shows live/off, its color, and any sub-features as small dots. Enabling a Storyboard wires it into Story Director rounds.
    • Quest Board — goals, blockers, deadlines, rewards, consequence pressure.
    • Cast — character files, relationships, memories, hidden knowledge, drift.
    • Calendar — story date, timelines, deadlines, prophecies, moons, temporal pressure.
    • Map — locations, regions, travel state, discoveries, hazards, movement history.
    • Renown — public standing, heat, favors banked and owed, oaths, blackmail, influence.
  • Legacy Trackers — twelve optional mechanical panels (Combat, Stats, Inventory, Knowledge, Party, Corruption, Bonds, Events, Cycles, Spellbook, Memories, Creature Codex). These predate Storyboards and are kept for users running game-like scenes. Storyboards above are the first-class system.
  • Default AI Awareness — number input (099) for how often (in turns) immersion state is injected into the AI prompt. 0 means "manual only." Per-panel overrides live in each wing panel's settings menu.
  • Max Tool Rounds — spinner (120, default 4). How many tool-call rounds the DM can use before forcing a text output.
  • DM Temperature — slider (02, step 0.05, default 0.7). Controls randomness in Story Director responses. Independent of the chat's main temperature.
  • Auto-retry invalid tool responses — toggle. When the AI returns a malformed tool call, automatically retry once. Costs extra tokens but recovers cleanly from minor schema slips.
  • Refresh Enabled Boards — runs an AI-powered re-initialization across every enabled Storyboard and Legacy Tracker, one at a time. The button shows progress (Refreshing Cast (2/5)) so you can see which panel is being rebuilt.

These are settings on the chat. Switching one off doesn't delete the data the system has gathered; it just stops the model from being asked to use it for now.

See Story Director and Compendium for the full feature deep-dives.


Image Generation

Image generation has its own dedicated page. RoleCall handles three image flows — uploading attachments through the paperclip / paste, auto-generated scene illustrations via the Scene Images toggle in Quick Play, and the manual Generate Image modal — all of which share the same in-chat Image Gallery and the same per-user rate limit.

The Quick Play toggles for Scene Images and Strip Chat Images (described above under Story Director) are the in-chat entry points; everything else — vision-model requirements, the manual modal's full field list, the image gallery's filters, error messages, and rate limits — lives on the Image Generation page.


When NOT to Use These Controls

Don't yank Temperature first when a model feels "off." Most of the time, the problem is the preset's prompts, the character description, or the chat history — not the sampler. Read what the AI just said and ask whether your prompt is asking for that.

Don't enable every sampler at once. Min P + Top A + Top K + Top P + Repetition Penalty all stacking together is rarely useful. Each one biases the choice; combining them can starve the model of word choices. Pick the right one for the job.

Don't crank Max Context to 2M on a tiny model. The slider lets you, but a model with a 32K context window will still trim at 32K — the higher number is just ignored. Match Max Context to what your model actually supports.

Don't ship a preset with extreme samplers. When publishing a preset, defaults should be sensible (temperature ~0.9, top_p ~0.95, mild penalties). Let Quick Play be where users push to extremes. A preset that hardcodes temperature 1.7 will surprise people in a bad way.


Tips & Common Patterns

Move sliders in small steps. Temperature, frequency penalty, and presence penalty react sensitively to small changes. Bumping temperature from 0.8 to 0.85 is a real, noticeable difference. Bumping it from 0.8 to 1.5 in one go is hard to read.

Tune one slider at a time. When the AI feels off, change one thing, regenerate, see how it feels. Changing temperature, top-p, and frequency penalty simultaneously makes the cause impossible to identify.

Trust the preset's defaults at first. A well-built preset has been tuned by its creator. If a chat feels wrong, start by asking whether the preset matches the model and the character — not by twisting samplers.

Use Messages mode for episodic scenes. If your chats are short, self-contained scenes, "last 50 messages" is a clean way to bound context. If you're running a long arc with persistent memory, stay on Tokens mode and let the budget trim old turns naturally. The live token estimate under the Max Messages slider tells you exactly what a given cap costs.

Save loadouts liberally. If you tuned a great setup for a noir detective scene and you want to run a new one tomorrow with the same character but different lorebooks, save the current one as a loadout before swapping things out.

Match the picker to your day. Cinematic prose at temperature 1.1 with mid-range frequency penalty for evening writing; tighter settings at temperature 0.7 for grinding through a long arc. The two loadouts are the same {{char}} and {{user}} — they're just two different "moods."

Reset overrides individually. A single overridden slider doesn't lock you out of the rest of the preset's defaults. Match that one slider back to the preset's value (or clear the override from the in-scene Preset wing) and the rest of the preset still applies.

Refresh Enabled Boards after a wild swing. If you've been playing for a while and the Story Director's state feels drifted, Refresh Enabled Boards in the DM Assistant panel rebuilds every active Storyboard and Legacy Tracker from the actual chat history. One at a time, so each board gets the AI's full attention.

See Presets for the full preset editor, Story Director for narrative AI helpers, Compendium for the AI-managed lorebook, and Providers & Keys for connecting BYOK models.