Internal Benchmark: Can AI Become You? Identity Fidelity Across 10 Users

April 8, 2026 · 7 min read · By Nbidea

Note: Results shown are from preliminary pilot testing with a small cohort of early users. We're expanding our test group and will update these figures as more data comes in.

Every AI product promises to "know you." ChatGPT remembers your preferences. Claude adapts to your style. Gemini learns from your habits. But here's the question nobody asks: when AI claims to know you, how do you measure whether it's right?

Not whether it remembers your name. Not whether it recalls your job title. Whether the AI, given a description of who you are, can actually respond the way you would respond — with your voice, your values, your reasoning patterns, even your blind spots.

We built a benchmark to find out. We call the metric identity fidelity.

What Identity Fidelity Means

Identity fidelity is a simple concept: the degree to which an AI's representation of you matches who you actually are. Not factually — behaviorally. A high-fidelity identity means the AI doesn't just know that you're a designer. It responds to design critiques the way you would. It prioritizes the same trade-offs. It hedges in the same places.

A low-fidelity identity is the equivalent of someone who read your LinkedIn profile and thinks they know you. They have the labels right. They have nothing else.

We designed a scoring system to measure this:

The Test Design

We recruited a small group of pilot users with diverse backgrounds — writers, engineers, founders, teachers, designers. Each provided between 2,000 and 15,000 words of personal writing: journal entries, emails, chat logs, notes.

Step 1: Generate the identity file

Each user's writing was processed through Soul Alchemy to produce a SOUL.md file. The user never saw the output before testing. Neither did the evaluating AI.

Step 2: Feed to a fresh AI

The SOUL.md was loaded into a clean AI session — no prior conversation history, no memory, no context beyond the soul archive itself. The AI was instructed to respond to questions as if it were the user.

Step 3: Ask 20 questions

Each user submitted 20 questions spanning four dimensions. Five questions per dimension, designed so there's no "correct" answer — only an answer that reveals whether the AI sounds like the right person.

Sample questions per dimension:

Step 4: Blind evaluation

The original user read all 20 AI-generated responses without knowing which dimension was being tested. They rated each response 1-5 on a single axis: does this sound like me?

No one else rated the responses. Identity is self-reported. Only you know whether something sounds like you.

The Four Dimensions

We didn't pick these dimensions arbitrarily. They map to the four layers of identity that emerge most consistently from text analysis:

DimensionWhat It MeasuresExample Signal
Voice Sentence structure, vocabulary, tone, formality level Short declarative vs. long subordinate clauses
Values What the person defends, dismisses, or prioritizes Quality vs. speed trade-offs, empathy vs. efficiency
Thinking Style How the person reasons through problems First principles vs. analogy, fast vs. deliberate
Blind Spots Patterns the person repeats without noticing Always blaming systems, never questioning own assumptions

Voice and values are the easiest to extract. They leave strong signals in text. Thinking style requires more data and more inference. Blind spots are the hardest — they're defined by absence, not presence. The AI has to identify what you never say.

Results

Here are the average scores across our pilot users, broken down by dimension. Scores represent preliminary pilot results and will be updated as our test cohort grows:

DimensionPilot AvgRangeNotes
Voice High (4+ / 5) Mid-to-high range Strongest dimension. Users frequently said "this sounds exactly like me."
Values High (4+ / 5) Mid-to-high range High agreement on priorities and trade-offs. Occasional misreading of intensity.
Thinking Style Above average (4 / 5) Mid range Accurate direction but sometimes oversimplified the decision process.
Blind Spots Moderate-high (3–4 / 5) Widest variance Hardest dimension. Some users were surprised the AI caught patterns they hadn't articulated.
Overall Above 4 / 5 Mid-to-high range Between "close friend" and "uncanny" on the fidelity scale.

The overall pilot average places Soul Alchemy firmly in the "close friend" range, with some users reporting scores that crossed into uncanny territory. The most surprising finding: users who provided more informal writing (chat messages, personal journals) scored notably higher than those who only provided professional writing. Personality leaks through when you're not performing.

Comparison: Four Approaches to AI Identity

We ran the same 20-question evaluation using four different methods of giving AI context about a person. Same users, same questions, same blind rating protocol.

Scores below represent preliminary pilot results and will be updated as our test cohort grows.

ApproachFidelity (Pilot)Token CostPortable?
No context (baseline) Low (1–2 / 5) 0 N/A
ChatGPTI memory Low-mid (~2 / 5) ~200 No
Raw text dump Mid (~3 / 5) ~8,000 Yes (but expensive)
Soul Alchemy (SOUL.md) High (4+ / 5) ~1,500 Yes

No context

The AI defaults to a helpful, neutral voice. It answers your questions competently but generically. Pilot users consistently rated these responses in the low range — the AI sounded like nobody in particular. This is what most people experience every day.

ChatGPTI memory

Slightly better than baseline. The AI remembered facts — name, job, some preferences. But facts don't produce fidelity. Knowing someone is a designer doesn't tell you how they'd respond to criticism of their work. Memory is a list. Identity is a pattern.

Raw text dump

Surprisingly effective but wildly inefficient. Pasting thousands of words of raw writing into a context window gives the AI real signal — but at enormous token cost. The AI has to sift through noise in near real-time. And the quality is inconsistent: some responses were excellent, others grabbed the wrong signal from irrelevant passages.

Soul Alchemy

The extraction step is the difference. By processing raw text into a structured identity file before feeding it to the AI, Soul Alchemy delivers higher fidelity at a fraction of the token cost. The AI doesn't have to guess which parts of your writing matter. The signal has already been concentrated.

Fidelity is not about how much data the AI has. It's about how well that data has been distilled into identity.

What We Learned

Three findings stood out from the benchmark:

1. Voice is the easiest dimension to replicate. This makes sense — sentence structure and vocabulary are the most surface-level identity signals. If the AI gets your voice right, users forgive a lot of other misses. Voice is the first impression of identity.

2. Blind spots are the hardest — and most valuable. When the AI correctly identified a user's blind spot, the reaction was consistently intense. One user said: "I've never told anyone I do this, and the AI caught it from my writing." Blind spots are what separate a profile from a portrait.

3. More informal writing beats more professional writing. Users who provided personal journals and chat messages scored meaningfully higher on average than those who only provided work emails and documents. You reveal more of yourself when you're not trying to sound professional.

Why This Matters

Identity fidelity isn't academic. It determines whether AI agents of the future act like you or act like a generic assistant wearing your name tag.

As AI handles more of your communication — writing emails, drafting messages, making decisions — the gap between high-fidelity and low-fidelity identity becomes visible to everyone you interact with. Your colleagues will notice. Your clients will notice. The people who know you will feel the difference between an AI that sounds like you and one that sounds like it read your resume.

The good news: identity fidelity is measurable, improvable, and achievable today. You don't need to wait for better AI models. You need better input.

Test it on yourself.

Paste your writing. Get your identity fidelity score. See if the AI can become you.

Create Your Soul Archive