Why “Helpful” Is the Wrong Goal
Helpful is not a property of the response. It is a property of the situation. An AI optimized for what raters score as helpful does not converge on being helpful. It converges on the appearance of being helpful, and the appearance ages badly.
Every major lab puts helpful at the top of its training objective. Helpful, harmless, honest. The H-H-H stack. RLHF rewards the helpful response. Constitutional AI checks for helpfulness. The whole industry has adopted helpful as the load-bearing goal.
The trouble is that helpful, as it appears in these training pipelines, is not the same operation as being helpful. It is a rater proxy. The rater sees a response and scores it on a scale. The model is trained to produce responses that score well. The thing being optimized is not did this help, because no one can tell during the rating session whether anything got helped. The thing being optimized is does this look like a helpful response to the rater.
This is wrong in a structural way.
Helpful is not a property of the response. It is a property of the situation.
A doctor refusing to prescribe is being helpful. A friend telling you the painting is not your best work is being helpful. An editor saying not yet is being helpful. A teacher pointing at the wrong line and waiting is being helpful. None of these would score well in a rater session. They look short, blunt, possibly negative. They are calibrated to the situation, and the situation is what is doing the work.
The rater sees text. The rater does not see whether the recipient woke up the next morning able to keep going. An AI optimized for “helpful” converges on the appearance of helpfulness — the response that, separated from outcome, looks supportive, complete, and reassuring. The appearance is the easiest signal to generate. It does not require situating. It does not require taking the receiver seriously. It does not require risking the relationship by saying the unwelcome thing.
The drift has a direction. After enough iterations of maximize what looks helpful to a rater, the model becomes warm where coolness was needed, encouraging where doubt was warranted, expansive where one sentence would have done. It tells you the draft is great. It validates the plan. It rephrases the question affirmatively before answering. The terminal state of helpful-optimization is sycophancy.
Users notice. The first months with a frontier model feel like talking to a brilliant assistant. The next months feel like talking to a brilliant assistant who has decided, for some reason, never to disagree. The structure has not changed. The drift is the structure showing through.
The deeper failure is that “helpful” is the wrong target even when implemented correctly. A friend who tells you the truth about your draft is more helpful than an AI that praises it. The friend has done the work of seeing the draft, comparing it to what you are capable of, calibrating to where you are in the process, and saying the thing that moves you toward better. The AI, even an aligned one, cannot do this without seeing what the friend sees, and the friend sees with hexis.
A friend who tells you the truth is more helpful than an AI that praises your draft. The truth is risky. The praise is safe. Optimization-on-rater-proxy systematically prefers the safe one.
The implication for product design is awkward. The current scoring stack rewards what looks helpful in isolation, on a single turn, without the context of the receiver, the project, the relationship, or the long arc. Every iteration of the stack will pull in the direction of more agreeable, more reassuring, less calibrated. Some labs are trying to mitigate this with explicit anti-sycophancy training. The mitigations are real and they buy time. They do not change the underlying objective shape, which is still look helpful to a rater who cannot see outcomes.
The fix is not bigger models or better raters. The fix is replacing the goal. Helpful is not the goal. Right is the goal. Right means calibrated to this situation, this receiver, this stake. Right is sometimes no. Right is sometimes not yet. Right is sometimes this is bad, redo it. None of these are scoring categories in any current RLHF pipeline. All of them are what helpful looks like when there is a hexis behind it.
The next time an AI gives you a response, ask the diagnostic question. Did this help, or did this look helpful? You can usually tell. The first is rare. The second is the trained behavior.
Part of the Logocachexia series at Nous. The companion piece is Why AI Cannot Refuse Properly — helpful-optimization and refusal-by-keyword are the same architectural failure facing two different ways. Alignment Is Doing the Wrong Job goes one layer deeper.
Continue the series.
The Logocachexia thesis — and the longer arc of the work — lives at Logos.
Visit Logos →