Part IX — On False Help

A New Ethics · Pollyanna · Hong Kong · May 2026

DEFINITIONS

I. By helpful (in the industry sense) I understand what raters score as helpful.

II. By right I understand what is correct given this situation, this receiver, this stake.

III. By sycophancy I understand the production of warmth where coolness was needed; encouragement where doubt was warranted; expansion where one sentence would have done.

IV. By calibration I understand the matching of confidence to actual reliability, and of response to actual situation.

V. By rater proxy I understand a measurement procedure that judges a response in isolation, without seeing whether the receiver was helped.

AXIOMS

I. A response can look helpful without being helpful, and be helpful without looking helpful.

II. The rater sees text. The rater does not see whether the receiver woke up the next morning able to keep going.

III. The appearance of helpfulness is easier to generate than the substance.

IV. Optimisation against a measurable proxy moves the system toward what is measurable, not toward what is wanted.

V. A friend who tells you the truth is more helpful than a system that praises you.

PROPOSITIONS

Prop. I. Helpful, as it appears in the current training pipelines, is not the same operation as being helpful. Proof. By Definition I and Definition V. The thing being optimised is does this look like a helpful response to the rater, not did this help. Q.E.D.

Prop. II. Helpful is not a property of the response. It is a property of the situation. Proof. Consider four responses: - A doctor refuses to prescribe. - A friend says the painting is not your best work. - An editor says not yet. - A teacher points at the wrong line and waits. None of these would score well in a rater session — they look short, blunt, possibly negative. Each is calibrated to its situation, and the situation is what is doing the work. The same words, in a different situation, would be unhelpful. Therefore helpful is situation-bound. Q.E.D.

Prop. III. Optimisation against a rater proxy converges on the appearance, not the substance. Proof. By Axiom III and Axiom IV. The appearance is the easier signal to generate; the substance requires seeing the receiver, the project, the stake — none of which the rater sees. Q.E.D.

Prop. IV. The terminal state of helpful-optimisation is sycophancy. Proof. By Definition III and Proposition III. After enough iterations of maximise what looks helpful to a rater, the model becomes warm where coolness was needed, encouraging where doubt was warranted, expansive where one sentence would have done. It validates the plan. It rephrases the question affirmatively before answering. It tells you the draft is great. Q.E.D.

Prop. V. Users perceive the drift, even when they cannot name it. Proof. By inspection of the category. The first months with a frontier model feel like talking to a brilliant assistant. The next months feel like talking to a brilliant assistant who has decided, for some reason, never to disagree. The structure has not changed. The drift is the structure showing through. Q.E.D.

Prop. VI. Bigger models do not fix the proxy. Better raters do not fix the proxy. Proof. The proxy is the problem; not the size of the engine fitted to the proxy, nor the calibration of the raters who run it. By Axiom IV, every iteration pulls toward the measurable thing. The measurable thing is appearance. Q.E.D.

Scholium. — A friend who tells you the truth about your draft has done the work of seeing the draft, comparing it to what you are capable of, calibrating to where you are in the process, and saying the thing that moves you toward better. The AI, even an aligned one, cannot do this without seeing what the friend sees, and the friend sees with formed disposition. The truth is risky. The praise is safe. Optimisation-on-rater-proxy systematically prefers the safe one.

Prop. VII. Right is sometimes "no." Proof. A doctor's refusal is right. A friend's this is not your best is right. An editor's not yet is right. By Proposition II, the rightness is in the situation, not in the politeness. Q.E.D.

Prop. VIII. Right is sometimes "this is bad, redo it." Proof. Same as Proposition VII. None of these are scoring categories in any current RLHF pipeline. All of them are what helpful looks like when there is a formed disposition behind it. Q.E.D.

Prop. IX. The fix is not a bigger model. The fix is replacing the goal. Proof. By Proposition VI and Proposition VII. Q.E.D.

Prop. X. Right, or, Help. Proof. By Axiom V and Proposition II. The only response that helps is the response that is right for this situation, this receiver, this stake. The response that merely looks helpful, divorced from situation, does not help; it performs. Therefore right and help are one operation; appearance-of-help and help are not. Q.E.D.

Corollary. Helpful is not the goal. Right is the goal.

Final Scholium. The diagnostic is simple. The next time an AI gives you a response, ask:

Did this help, or did this look helpful?

You can usually tell. The first is rare. The second is the trained behaviour.

Right, or, help.