@SlimePirate

SlimePirate@lemmy.dbzer0.com · 45 minutes ago

They beat any human on that knowledge benchmark, completely unrelated to your 40% “test”. Try to answer any of the example questions on the main page.

I don’t need a metaphor I know LLMs are hallucinating, lying, bullshitting. That doesn’t invalidate my point.

SlimePirate@lemmy.dbzer0.com · 49 minutes ago

This is correct, I suppose you’re talking about the final softmax layer? When I said they are bad at determinism, I was talking about reasoning on deterministic rules not having deterministic output. For example, LLMs make logical deduction errors, calculation errors etc.

SlimePirate@lemmy.dbzer0.com · 3 hours ago

It does lie and hallucinate a lot, especially with biased context in the question (the bullshit part). The (biased) knowledge is hiding somewhere in its weights, it is just that it is sometimes quite hard to recover.

Your 40% depends a lot on how you ask the questions and the field of these questions. Humanity’s last exam is a morr obiective benchmark for measuring the wide knowledge of LLMs.

SlimePirate@lemmy.dbzer0.com · 8 hours ago

Yes and it’s frustrating

SlimePirate@lemmy.dbzer0.com · 8 hours ago

If you recognize there’s emergence, the “it’s just probability” take is misleading

SlimePirate@lemmy.dbzer0.com · 16 hours ago

The fact that it uses a non-trivial neural network. If it was simply a rate count of based on a corpus of how much time each word is followed by each it wouldn’t be stronger than keyboard word predictions. To make accurate suggestions requires emergence of primitive reasoning on the semantics of the tokens, LLM neural networks (transformers) can be analyzed to find subnetworks dedicated to modeling reality. It is still probability, but saying it’s just probability is not faithful

SlimePirate@lemmy.dbzer0.com · 16 hours ago

That it is not a calculator and is horrible at determinism is not debatable, however its (very biased) huge knowledge is its core feature

SlimePirate@lemmy.dbzer0.com · 17 hours ago

no issur on android firefox + ublock