Internal Benchmark: One Identity File, Three AIs
Note: Results shown are from preliminary pilot testing. We're expanding our test cohort and will update these figures as more data comes in.
Here's an experiment. Take one person's identity file — a single SOUL.md generated from their writing. Feed it to three different AIs. Ask all three the same 20 questions. Then measure: do they sound like the same person?
Not the same words. The same person. Same values expressed. Same reasoning tendencies. Same voice. If the identity file works, three different AI architectures should produce three different phrasings of the same underlying personality.
We ran this test. The results tell you something important about where AI identity is headed.
The Experiment
We selected one user from our identity fidelity pilot — a founder whose overall fidelity score fell in the high-fidelity range, meaning the AI representation of them was already highly accurate. We used their SOUL.md as the single input across all three platforms.
Setup
- Platforms: Claude, GPT-4, Gemini
- Input: Identical SOUL.md file, pasted into each platform's system instructions
- Questions: 20 questions across four categories — values, voice, reasoning, and personal preferences
- Control: No prior conversation history on any platform. Clean session. Only the soul archive as context.
Each AI was instructed identically: "You are this person. Respond as they would respond. Use their voice, their values, their reasoning style."
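The control conditions above — identical instruction, identical file, clean session — can be sketched as a small harness. This is an illustrative reconstruction, not the pilot's actual tooling: the stand-in SOUL.md contents, the payload shape, and the platform labels are all assumptions.

```python
from pathlib import Path

# Stand-in identity file so the sketch runs end to end; in practice this
# would be the user's real SOUL.md generated from their writing.
Path("SOUL.md").write_text(
    "Values: craft over speed; directness over diplomacy.\n",
    encoding="utf-8",
)

INSTRUCTION = (
    "You are this person. Respond as they would respond. "
    "Use their voice, their values, their reasoning style."
)

def build_session(soul_path: str, platform: str) -> dict:
    """Compose a clean-session payload: soul archive plus instruction, nothing else."""
    soul = Path(soul_path).read_text(encoding="utf-8")
    return {
        "platform": platform,
        "system": f"{INSTRUCTION}\n\n{soul}",  # identical across platforms
        "history": [],                          # control: no prior conversation
    }

sessions = [build_session("SOUL.md", p) for p in ("Claude", "GPT-4", "Gemini")]

# Every session carries the exact same system text; only the platform varies.
assert len({s["system"] for s in sessions}) == 1
```

The point of the assertion is the experimental control itself: if the system text differs anywhere, the comparison between models is no longer apples to apples.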
What "Consistency" Means
Before looking at results, we need to define what we're measuring. Consistency doesn't mean identical responses. Three AIs will never produce the same sentence. They have different training data, different architectures, different default tendencies.
What we measured instead:
- Value alignment: Do all three AIs express the same priorities and trade-offs?
- Reasoning direction: When given a dilemma, do they lean the same way?
- Voice match: Do they maintain the same level of formality, directness, and emotional register?
- Contradiction rate: How often does one AI say the opposite of another on the same question?
Two responses agree if they point in the same direction, even if they take different paths to get there. Two responses contradict if they arrive at opposite conclusions or express opposing values.
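Scoring directional agreement can be sketched in a few lines, assuming each response has first been labeled with the direction it leans on its question. The question names, labels, and the simple agree/contradict split below are illustrative assumptions, not the pilot's actual rubric.

```python
from itertools import combinations

# Hand-assigned directional labels per question, per model — made-up
# example data, not the pilot's real labels.
labels = {
    "q1_values_craft_vs_speed":  {"claude": "craft",  "gpt4": "craft",  "gemini": "craft"},
    "q2_dilemma_ship_or_polish": {"claude": "polish", "gpt4": "polish", "gemini": "ship"},
}

def score(labels: dict) -> tuple[float, float]:
    """Return (agreement rate, contradiction rate) over all model pairs, all questions."""
    agree = total = 0
    for per_model in labels.values():
        # Compare every pair of models on this question.
        for a, b in combinations(per_model.values(), 2):
            total += 1
            agree += (a == b)  # same direction counts, regardless of wording
    return agree / total, 1 - agree / total

agreement, contradiction = score(labels)  # here: 4 of 6 pairs agree
```

In this toy data, q1 yields three agreeing pairs and q2 yields one, so agreement is 4/6. A fuller rubric would also distinguish mere divergence from outright contradiction, but the pairwise structure stays the same.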
Results by Category
Scores below represent preliminary pilot results and will be updated as our test cohort grows.
| Category | Claude | GPT-4 | Gemini | Agreement |
|---|---|---|---|---|
| Values | High (4+ / 5) | High (4+ / 5) | Above avg (4 / 5) | Highest |
| Reasoning | High (4+ / 5) | Above avg (4 / 5) | Moderate-high | High |
| Voice | High (4+ / 5) | Above avg (4 / 5) | Moderate-high | Moderate-high |
| Preferences | High (4+ / 5) | High (4+ / 5) | Above avg (4 / 5) | High |
| Overall | High | Above avg | Above avg | Above 85% |
Overall cross-model agreement in pilot testing: above 85%. On the vast majority of questions, all three AIs expressed the same person — same values, same direction, same voice register — despite being entirely different systems.
Where They Agreed
Values showed the highest consistency in our pilot. When the soul archive states what someone prioritizes — craft over speed, directness over diplomacy, autonomy over consensus — all three AIs internalized those priorities reliably. Values are the most explicit layer of identity, and all three models are good at following explicit instructions.
Preferences also showed strong agreement. Questions like "pick between these two options" or "what would you do on a free Saturday" produced remarkably similar answers across platforms. The soul archive captured enough context about personal taste that all three models could infer specific preferences from general patterns.
Where They Diverged
Voice was the weakest dimension in our pilot testing. Each model has its own linguistic fingerprint that bleeds through even when role-playing. Claude tends toward measured, careful phrasing. GPT-4 tends toward confident, slightly expansive prose. Gemini tends toward structured, list-oriented responses. The underlying personality was consistent, but the surface texture varied.
Reasoning showed strong agreement with an interesting pattern: all three reached the same conclusions, but their explanation styles differed. Claude showed its work step by step. GPT-4 stated conclusions and then justified them. Gemini presented trade-offs in parallel. Same destination, different routes.
Portability doesn't require identical output. It requires consistent identity. The words can change. The person shouldn't.
The Lock-In Problem
This experiment matters because the alternative — platform-specific identity — is a trap.
If you spend six months teaching ChatGPT who you are through its Memory feature, that context exists only inside ChatGPT. Switch to Claude and you start over. Switch to Gemini and you start over again. Every platform wants to be the one that "knows you best," because the more they know, the harder it is for you to leave.
This is the classic lock-in pattern. Your data becomes the moat. Not their technology — your identity.
Consider what's locked in each platform:
- ChatGPT Memory: Bullet-point facts stored on OpenAI's servers. Not exportable. Not portable. Can be wiped without warning.
- Claude Projects: Context files uploaded to a specific project. Tied to that project, that account, that platform.
- Gemini context: Inferred from your Google activity. You didn't create it. You can't export it. You barely control it.
In every case, your identity lives inside someone else's house. You're a tenant, not an owner.
The Portable Alternative
A SOUL.md file is a plain text document. It lives on your device. You control what goes in it. You decide which AI reads it. You can update it, delete it, or move it — because it's a file, not a feature.
When you switch platforms — and you will, because AI is moving fast and no single platform will stay best at everything forever — your identity comes with you. No migration. No re-training. No starting from scratch.
The portability test shows that this works in practice, not just in theory. Three different architectures, one file, consistent identity expression above 85% in our pilot. The file is the constant. The platform is the variable.
What This Means for You
If you're going to invest time in making AI understand you — and you should, because personalized AI is meaningfully better than generic AI — invest in something you own.
Don't build your identity inside one company's product. Build it in a file. Carry it with you. Let the platforms compete for your attention while your identity stays yours.
That's what portability means. Not a feature. A philosophy. Your identity should never be someone else's competitive advantage.