Being an LLM certainly isn’t easy. People are constantly taking screenshots of its flubs, and it’s constantly accused of copyright infringement. Nobody is tallying the times Gemini gets the answer right against the times it tells people there are three letter ‘N’s in the word ‘mayonnaise’, or assures them it’s okay to leave a dog in a hot car, seemingly based on a parody page by the ‘Fake Offensive Beatles’. (https://www.tumblr.com/brucesterling/751526559274401792/is-it-true-or-did-google-gemini-just-make-it-up)
ChatGPT doesn’t have it any easier – its lack of real cognition is similarly well known amongst the tech-literate: a blindfolded dart thrower who sometimes happens to hit the target, but just as often hits the door or the barkeep. When the answers being funneled in come from Reddit, the answers that come out will sound like they came from Reddit. Gemini can at least generally cite a source, which passes the burden of verification onto the user; ChatGPT doesn’t even do that much, and famously invents sources when asked for them. Remember the lawyer who had ChatGPT write his legal brief, only for it to embarrass him in front of the judge with citations to non-existent case law? When it gets the answer right, it’s a stroke of unlikely luck, because it isn’t thinking – it’s laying out puzzle pieces and cramming the ones with matching slots and tabs together, no matter what the picture on the pieces might suggest.
And yet, in spite of this, people are still using these services to write works that need to be factually accurate and clear. That ‘and clear’ is important, because Google Gemini says you can often identify mushrooms by taste. You can’t. You can often spit-test mushrooms, but saying so responsibly would mean laying out what a spit test even is, which tastes to look for, and how to do all of that safely – and Gemini doesn’t have the capacity for that. It must be fast, or users will stop reading, even if that means simplifying a text to the point of total inaccuracy.
Worse, LLMs that don’t struggle so badly to stay attached to reality, like China’s newest product, DeepSeek, are showing up the companies who said they’d hit the upper limits of what computers could do. No, they hit the upper limits of what their models could do, even fed almost all the data available online and enormous amounts of computing resources. DeepSeek uses far less (exactly how much less is still in debate) yet manages to produce answers on par with the services already out there. It’s tough to be an LLM.