

This is correct; I assume you’re talking about the final softmax layer? When I said they are bad at determinism, I meant that reasoning over deterministic rules does not reliably produce the correct output: LLMs make logical deduction errors, calculation errors, etc.
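
To make the distinction concrete: decoding itself can be made fully deterministic (greedy argmax over the softmax), yet the *content* of the output can still be logically wrong. A minimal sketch in Python/NumPy, with made-up placeholder logits, just to show what softmax-level determinism means:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary dimension.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def greedy_decode_step(logits):
    # Greedy decoding: always pick the highest-probability token.
    # Given identical logits, this is fully deterministic.
    return int(np.argmax(softmax(logits)))

# Made-up logits for illustration only.
logits = np.array([1.2, 3.4, 0.5, 2.1])
assert greedy_decode_step(logits) == greedy_decode_step(logits)  # same input -> same token
```

Determinism at that level says nothing about correctness: the model can deterministically emit the same wrong deduction or miscalculation every time, which is the failure mode I’m pointing at.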
They beat any human on that knowledge benchmark, which is completely unrelated to your 40% “test”. Try answering any of the example questions on the main page.
I don’t need a metaphor; I know LLMs hallucinate, lie, and bullshit. That doesn’t invalidate my point.