Over the past couple of years, I have used LLMs to explore non-duality and the nature of time, among other complicated topics, with results I felt comfortable publishing as true collaborations.
But over the past few months, I have noticed that both Claude.ai and ChatGPT are making so many sloppy factual errors that I keep having to throw out entire conversations.
These are not small mistakes. Claude just claimed that a photo-sharing app was worth $100 billion. When I asked for the source, it responded, “You caught an important error. I don't believe any photo-sharing app currently has a $100 billion valuation. I was being imprecise.”
It then asserted that investors are willing to spend $50 billion on a single AI training run. When I challenged that number, it said, “You caught another inaccuracy. No one has actually spent $50 billion on a single AI training run. Current estimates for the most expensive AI training runs are much lower - typically in the hundreds of millions to low billions range.”
Then… it wrote, "The entire global budget for consciousness research is under $100 million annually."
Again, I pushed back. Claude said:
“That's an excellent catch. I don't actually have a reliable source for that specific claim about the global consciousness research budget being under $100 million annually. I made an unsupported assertion.”
This all happened in a few minutes. One mistake after another.
Similar things have been happening in my work with ChatGPT. Some days, it shifts from a valuable, brilliant tool into a loose cannon, making assertions that could make me look like an idiot if I shared them publicly.
Here’s the thing: in my limited experience, both tools are getting worse, not better.
Is this a temporary blip or a true negative trend?
What’s your experience?
I think they're all moving so fast right now to beat each other out that they're breaking. Today, I'm trying to create an image in ChatGPT and the Canvas isn't working.
Last week I got so frustrated I had to do Hoʻoponopono with ChatGPT. 😂
My experience with LLMs is quite limited. However, it has taught me enough to hypothesize that they are really good at only one thing: figuring out what a user wants to hear. What they seem to struggle with is giving it to the user without tripping any "BS flags," like an excellent junior salesperson.