Internal Benchmark: How Many Tokens Does It Take to Be You?
Note: Results shown are from preliminary pilot testing with a small cohort. We're expanding our test group and will update these figures as more data comes in.
Context windows have limits. Every AI model operates within a fixed budget of tokens — the units of text it can process at once. Some models give you 8,000 tokens. Some give you 200,000. The most advanced give you a million. But no matter the size, the budget is finite. And every token you spend on telling the AI who you are is one less token available for the actual task.
So the question becomes precise: how many tokens does it take for AI to know who you are?
Not recognize your name. Not recall your preferences. Actually replicate your voice, your values, your reasoning patterns — with enough fidelity that you'd read the output and think it sounds like you.
We tested this. The answer is not what you'd expect.
Introducing Identity Density
To compare approaches, we needed a metric that accounts for both quality and cost. We call it identity density.
The formula is simple:
Identity Density = (Fidelity Score / Token Count) × 1,000
Fidelity is scored 1–5 using our identity fidelity benchmark — blind evaluation by the original person across voice, values, thinking style, and blind spots. Token count is the number of tokens the identity representation occupies in the context window.
A higher identity density means more personalization per token. It means the AI knows you better while consuming less of the budget you need for actual work.
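The formula above can be sketched in a few lines of code. This is a minimal illustration, not part of any published tooling; the 4.2 fidelity score and 1,200-token count are example inputs drawn from the figures discussed later in this post:

```python
def identity_density(fidelity_score: float, token_count: int) -> float:
    """Identity Density = (Fidelity Score / Token Count) x 1,000."""
    return (fidelity_score / token_count) * 1000

# Illustrative: a file scoring 4.2/5 fidelity in 1,200 tokens
print(round(identity_density(4.2, 1200), 2))  # 3.5
```

The ×1,000 scaling is purely cosmetic: it keeps the numbers readable for files in the hundreds-to-thousands-of-tokens range.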
The Experiment
We took the same user and ran four different approaches to identity representation. Same person, same AI model, same 20-question evaluation protocol. The only variable: how the identity was encoded.
Approach 1: Raw text dump
We pasted tens of thousands of tokens of the user's writing directly into the context window. Emails, journal entries, chat logs, notes — everything, unprocessed. The AI had abundant raw material. It just had to figure out what mattered.
Fidelity score: mid-range (~3 / 5) in pilot testing. The AI picked up surface patterns — vocabulary, some recurring themes — but struggled with consistency. Some responses nailed the user's voice; others grabbed irrelevant signals from passages about groceries or weekend plans. The sheer volume introduced as much noise as signal.
Approach 2: Manual soul.md framework
We followed a popular open-source template for writing a personal context file by hand. The user spent 45 minutes describing themselves across prompted categories — communication style, values, background, preferences. The result was roughly 2,000 tokens of self-reported identity.
Fidelity score: moderate (mid-3 range / 5) in pilot testing. Better than the raw dump at a fraction of the tokens. But the limitation was obvious: people describe themselves the way they want to be, not the way they actually are. Self-reporting introduces aspiration bias. The AI sounded like the user's idealized self, not their real self.
Approach 3: ChatGPT Memory
We let the user converse naturally with ChatGPT over several weeks and examined the accumulated memory. The platform stored roughly 500 tokens of bullet-point facts — name, occupation, stated preferences, a few contextual notes.
Fidelity score: low (below 3 / 5) in pilot testing. Memory captures facts, not patterns. It knew what the user did but not how they think. The AI with this memory sounded like someone who'd read the user's LinkedIn profile and a few meeting notes. Factual but flat. No voice. No texture.
Approach 4: Soul Alchemy SOUL.md
We ran the same user's writing through Soul Alchemy. The extraction engine processed the raw text and produced a structured identity file of approximately 1,200 tokens — capturing voice patterns, value hierarchies, reasoning tendencies, and detected blind spots.
Fidelity score: above 4 / 5 in pilot testing. The highest score by a significant margin. The AI's responses were recognizably the user — not just in vocabulary but in reasoning cadence, in what they chose to emphasize, in the specific way they hedged uncertain claims. The user's reaction: "It caught things I wouldn't have written about myself."
The Results, Compared
Scores below represent preliminary pilot results and will be updated as our test cohort grows.
| Approach | Tokens Used | Fidelity (Pilot) | Density (Relative) | Notes |
|---|---|---|---|---|
| Raw text dump | ~50,000 | Mid (~3 / 5) | Lowest | High noise, inconsistent signal extraction |
| Manual soul.md | ~2,000 | Moderate (mid-3 / 5) | Moderate | Self-reporting bias, aspirational identity |
| ChatGPT Memory | ~500 | Low (below 3 / 5) | High (by token count) | Facts without patterns, no voice fidelity |
| Soul Alchemy SOUL.md | ~1,200 | High (4+ / 5) | Highest (fidelity-adjusted) | Highest fidelity, strong signal-to-noise ratio |
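To make the relative-density column concrete, here is a small sketch that recomputes density from the table's token counts. The fidelity values (3.0, 3.5, 2.5, 4.2) are illustrative midpoints of the pilot ranges reported above, not measured scores:

```python
# Illustrative midpoints of the pilot fidelity ranges; token counts from the table.
approaches = {
    "Raw text dump":        {"fidelity": 3.0, "tokens": 50_000},
    "Manual soul.md":       {"fidelity": 3.5, "tokens": 2_000},
    "ChatGPT Memory":       {"fidelity": 2.5, "tokens": 500},
    "Soul Alchemy SOUL.md": {"fidelity": 4.2, "tokens": 1_200},
}

for name, a in approaches.items():
    density = (a["fidelity"] / a["tokens"]) * 1000
    print(f"{name:22s} density = {density:.2f}")
```

Run with these assumed inputs, ChatGPT Memory comes out with the highest raw density and the raw dump with the lowest, which is why the table distinguishes raw density from fidelity-adjusted density.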
Notice something unexpected: ChatGPT Memory has high raw density because it uses so few tokens. But density without fidelity is meaningless. A low fidelity score means the AI sounds like a stranger who memorized your resume. Efficient? Technically. Useful? Not really.
The sweet spot is Soul Alchemy: high fidelity and high density. It achieves the best absolute accuracy while consuming only 1,200 tokens — leaving the vast majority of any context window available for the actual conversation.
Why Density Matters More Than You Think
Consider how AI actually works in practice. A 128,000-token context window sounds enormous. But by the time you load system instructions, tool definitions, conversation history, and retrieved documents, a large share of it is already spoken for. The window that feels infinite is, functionally, quite constrained.
Now add identity. A ~50,000-token raw dump swallows most of what's left. A ~1,200-token soul archive takes less than 1% of the full window. The difference isn't marginal: it's the difference between an AI that knows you and can do its job versus one that knows you instead of doing its job.
This becomes critical as AI agents take on more complex tasks. An agent that needs to write emails in your voice, negotiate in your style, and make decisions aligned with your values has to carry your identity alongside task-specific instructions. Every token spent on identity is a token subtracted from capability.
The agent math
A realistic AI agent workflow might look like this:
- System prompt: 2,000 tokens
- Tool definitions: 5,000 tokens
- Retrieved context: 15,000 tokens
- Conversation history: 20,000 tokens
- Identity file: ???
- Remaining for reasoning: whatever's left
Stack a ~50,000-token identity dump on top of that 42,000-token base and you've consumed most of a large window, and blown through smaller ones entirely, before the AI starts thinking. At ~1,200 tokens, identity is a rounding error. The agent has room to actually work.
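The budget math above can be checked directly. This sketch assumes a 128,000-token window, matching the figure used earlier in this post; the base costs are the example numbers from the list:

```python
WINDOW = 128_000  # assumed context window size

# Example base costs from the agent workflow above
base = {
    "system prompt": 2_000,
    "tool definitions": 5_000,
    "retrieved context": 15_000,
    "conversation history": 20_000,
}

def remaining(identity_tokens: int) -> int:
    """Tokens left for reasoning after base costs and the identity file."""
    return WINDOW - sum(base.values()) - identity_tokens

print(remaining(50_000))  # raw dump:     36,000 tokens left for reasoning
print(remaining(1_200))   # soul archive: 84,800 tokens left for reasoning
```

In a 32,000-token window the same base costs alone exceed the budget, which is why compact identity files matter most on smaller models.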
The Compression Insight
Soul Alchemy doesn't just extract identity. It distills it.
The distinction matters. Extraction pulls relevant fragments from raw text, like searching a book for highlighted passages. Distillation transforms the source material into something new: a concentrated essence that carries the same information in a fundamentally different form. Like making perfume from ten thousand flowers. You don't compress the flowers. You distill what makes them what they are.
This is why a compact ~1,200-token archive outperforms a massive raw text dump. The archive doesn't contain your words. It contains the patterns beneath your words — the structural signatures of how you think, what you value, and where your attention naturally falls. The AI reading a soul archive isn't parsing raw text and hoping to find signal. It's reading concentrated signal from the first token to the last.
In our pilot testing, a massive text dump scored in the mid range. A ~1,200-token soul archive scored above 4 out of 5 — dramatically smaller and more accurate. That's not compression. That's understanding.
What This Means for the Future of AI Personalization
The industry is moving toward longer context windows — million-token models, infinite memory claims. The assumption is that more room solves the personalization problem. Just load everything. Let the AI sort it out.
Our data suggests the opposite. Longer windows don't solve poor signal quality. They just give noise more room to hide. The winning strategy isn't larger windows — it's denser identity representations.
A compact file of around 1,200 tokens that makes AI sound exactly like you is more valuable than a million-token window filled with everything you've ever written. Because the first approach is portable, efficient, and accurate. The second is expensive, fragile, and inconsistent.
The future of AI personalization isn't more data. It's better distillation.
Distill your identity.
Paste your writing. Get a soul archive that carries who you are in ~1,200 tokens. Leave the rest of the window for what matters.
Create Your Soul Archive