Glossary

Welcome! As a new user of NovelAI, there might be some terminology and concepts that are unfamiliar to you. The goal of this glossary is to briefly explain the terms used in our documentation.


AI Model

Also known as Large Language Models or LLMs. In simple terms, AI models learn by reading lots of data using deep learning algorithms. The sheer amount of analyzed data allows the AI to understand language and predict how to continue written text.

After this initial training, AI Models go through a process known as fine-tuning, in which a curated dataset is used to further improve the AI's capabilities. In NovelAI's case, our fine-tuning focuses on making the AI better at storytelling.

AI Modules

Also known as Soft Prompts. Modules directly influence the behavior and generations of the AI toward a particular style or genre. You can use the Default Modules we provide, or train your own Custom Modules on data you supply.

Anlas

NovelAI's premium currency. You get Anlas by being subscribed to NovelAI. The amount received depends on your subscription tier. Additionally, you can purchase Paid Anlas as long as you have a renewing subscription.

At the moment, the only features that require Anlas are Image Generation and Custom Module Training.

Banned Tokens

Used to prevent certain token sequences from being generated by the AI.

Banned Tokens can be set by the user in the Advanced Settings Sidebar.
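
To make the idea concrete, here is a minimal sketch of one common way to ban token sequences: before each step, find every token that would complete a banned sequence and exclude it from the candidates. The function names, the example tokens, and the scores are all hypothetical, and this is not a description of how NovelAI implements the feature.

```python
def blocked_next_tokens(generated, banned_sequences):
    """Return every token that would complete a banned sequence if emitted next."""
    blocked = set()
    for seq in banned_sequences:
        prefix, last = list(seq[:-1]), seq[-1]
        # A one-token sequence has no prefix, so its token is always blocked.
        if not prefix or generated[-len(prefix):] == prefix:
            blocked.add(last)
    return blocked


def pick_next(scores, blocked):
    """Choose the highest-scoring candidate that is not blocked."""
    allowed = {tok: s for tok, s in scores.items() if tok not in blocked}
    return max(allowed, key=allowed.get)


generated = ["The", " go"]                       # tokens produced so far
banned = [(" go", "bl"), (" dragon",)]           # hypothetical banned sequences
scores = {"bl": 3.0, " dragon": 2.5, "at": 1.0}  # made-up candidate scores

blocked = blocked_next_tokens(generated, banned)
print(pick_next(scores, blocked))
```

In this sketch "bl" is blocked only because the text already ends in " go", while " dragon" is blocked everywhere because it is a complete banned sequence on its own.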

Config Preset

A specific set of generation settings to adjust how the AI behaves. These parameters include things like randomness, repetition penalty, sampling methods and the order in which they're applied.

Each AI Model comes with its own set of default config presets. You can also make your own custom preset to suit your needs.
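
As a rough illustration of a preset being a bundle of settings plus the order in which they are applied, consider the sketch below. The field names (temperature, top_k, order) and the two steps are assumptions chosen for the example. They are not NovelAI's actual preset format, and real presets contain more settings than shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Preset:
    temperature: float = 1.0  # stand-in for the "randomness" setting
    top_k: int = 0            # keep only the k best candidates; 0 disables
    order: list = field(default_factory=lambda: ["temperature", "top_k"])


def apply_preset(scores, preset):
    """Run each configured step over the candidate scores, in preset order."""
    for step in preset.order:
        if step == "temperature":
            scores = {tok: s / preset.temperature for tok, s in scores.items()}
        elif step == "top_k" and preset.top_k > 0:
            ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
            scores = dict(ranked[:preset.top_k])
    return scores


custom = Preset(temperature=2.0, top_k=2)
result = apply_preset({" fox": 2.0, " dog": 1.0, " cat": 0.0}, custom)
print(result)
```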

Context

The range of tokens the AI sees before generating an output. Context is built starting from the current point of the story, and goes back until the maximum context size is reached. Everything outside of context effectively never happened, as far as the AI is concerned. Your maximum context size depends on your current subscription tier and chosen AI Model.

When context is built, Memory, Author's Note and Lorebook entries are injected into the context depending on their insertion settings.
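
The backward-filling behavior can be sketched as follows. This is a simplified illustration, not NovelAI's context builder: words stand in for tokens, Memory is always placed at the top, and Author's Note, Lorebook entries, and insertion settings are left out. The function name and parameters are hypothetical.

```python
def build_context(story_tokens, max_context, memory_tokens=()):
    """Reserve room for memory, then keep the newest story tokens that fit."""
    memory = list(memory_tokens)[:max_context]
    budget = max_context - len(memory)
    # Slicing from the end keeps the text closest to the current point.
    recent = list(story_tokens[-budget:]) if budget > 0 else []
    return memory + recent


story = "Once upon a time a goblin stole the moon".split()
context = build_context(story, max_context=5, memory_tokens=["[Memory]"])
print(context)
```

Here the opening words of the story fall outside the budget, so they are simply absent from what the AI would see.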

Dinkus

A special token consisting of three asterisks in a row (***), usually used to indicate a scene or chapter break.

Phrase Bias

A tool to increase or decrease the likelihood of certain words or phrases being generated by the AI. Negative Phrase Bias values reduce the chances while positive values increase them.

Phrase Biases can be set in the Advanced Settings Sidebar and the Lorebook.
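
One way to picture a bias is as a number added to a token's score before the AI makes its choice. The sketch below assumes single-token phrases and made-up scores and bias values. Real phrases can span several tokens, and the scale NovelAI uses for bias values is not shown here.

```python
def apply_phrase_bias(scores, biases):
    """Add each token's bias (0 if none is set) to its score."""
    return {tok: s + biases.get(tok, 0.0) for tok, s in scores.items()}


scores = {" sword": 1.0, " staff": 0.8, " bow": 0.5}   # made-up scores
biases = {" staff": 0.5, " sword": -0.4}               # hypothetical biases

biased = apply_phrase_bias(scores, biases)
print(max(biased, key=biased.get))
```

The positive bias lifts " staff" above " sword", the negative bias pushes " sword" down, and " bow" is unchanged because no bias was set for it.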

Prompt

The initial written text in a story before any AI generation takes place. The prompt serves as the foundation for what the AI will generate. By default, the prompt appears in cream color in the "NovelAI Dark" theme.

Scenario

The .scenario file that contains the initial prompt to start a story, along with all information inside Memory, Author's Note and the Lorebook. Scenarios also contain the author's chosen AI Model and Config Preset. When loading a scenario, you may be asked to fill out any placeholders left by the author.

Token

The basic unit of text that the AI uses to process and generate language. AI models don’t see words as individual letters. Instead, the text is broken down into tokens, which are words or word fragments.

The way tokens are arranged and their Token IDs depend on the Tokenizer used by the AI Model.

For example, the sentence “The quick brown fox jumps over the goblin.” would tokenize as “The| quick| brown| fox| jumps| over| the| go|bl|in.” in the Pile tokenizer used by GPT-NeoX 20B and Krake, with each | signifying a boundary between tokens.
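
The snippet below only parses the | notation from the example above. It does not run a real tokenizer. It is meant to show two things: tokens usually carry their leading space, and a single word can be split into several tokens.

```python
marked = "The| quick| brown| fox| jumps| over| the| go|bl|in."

tokens = marked.split("|")   # each piece between | marks is one token
sentence = "".join(tokens)   # joining the tokens restores the original text

print(tokens)
print(len(sentence.split()), "words became", len(tokens), "tokens")
```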

Token Probabilities

Closely related to Logits, the raw scores the AI assigns to each candidate token. When the AI generates an output it chooses from a pool of tokens. Token Probabilities refer to the chance each token had of being used by the AI. These probabilities are influenced by generation settings, biasing, banning, and modules.
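
Language models typically produce a raw score for each candidate token and convert those scores into probabilities with a softmax function. The sketch below shows that conversion on made-up scores. It is a generic illustration and does not show the values or display NovelAI uses.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtracting the peak avoids overflow
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


raw_scores = {" fox": 2.0, " dog": 1.0, " cat": 0.0}  # made-up scores
probs = softmax(raw_scores)

for tok, p in probs.items():
    print(f"{tok!r}: {p:.1%}")
```

Anything that changes the scores before this step, such as a bias or a ban, changes the resulting percentages.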