Are Large Language Models’ Hallucinations Entirely Due to Human PUA?

It’s widely known that the more powerful Artificial Intelligence (AI) becomes, the more troublesome its “hallucinations” can be. These models can confidently fabricate events that never occurred, or stumble on seemingly simple tasks like comparing numerical values.

From ChatGPT's astonishing debut two years ago to the steady rollout of models like DeepSeek V3.1 today, no large language model has managed to escape the problem of hallucination.

Why Are Large Language Models Prone to Hallucination?

The question of why large models hallucinate has become an enduring puzzle on the internet. However, a recent paper from OpenAI proposes an intriguing perspective: the root cause of AI hallucinations might stem from the very process humans use to train them.

In essence, the paper suggests the fault may lie not in the AI itself but in how we train it: our methods effectively pressure the model into producing these errors, the "PUA" (internet slang for psychological manipulation) of the title.

To understand why humans might be inadvertently contributing to this problem, we need to examine large language models from both internal and external perspectives.

On one hand, the very mechanism by which large models are trained inherently predisposes them to hallucination. This constitutes the “internal” factor driving AI hallucinations.

During training, a model learns to predict the next word from massive volumes of text. As long as a sequence of words looks like human language, the model will try to learn its structure; whether the content is factually accurate matters less to it than whether the continuation is plausible.
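
To make that concrete, here is a toy sketch in Python. The phrases and probabilities are invented purely for illustration (a real model learns its distributions from huge corpora and operates on tokens, not whole phrases); the point is simply that the model completes the sentence fluently whether or not a factual answer exists.

```python
import random

# Toy sketch of next-word prediction. The phrases and probabilities below are
# made up for illustration; a real model learns them from massive text corpora.
next_word_probs = {
    "The dog was born": {
        "in 2019.": 0.50,
        "in 2020.": 0.30,
        "last spring.": 0.15,
        "I'm not sure when.": 0.05,
    },
}

def continue_text(prefix: str) -> str:
    """Sample a continuation according to the learned probabilities."""
    options = next_word_probs[prefix]
    phrases = list(options.keys())
    weights = list(options.values())
    return random.choices(phrases, weights=weights, k=1)[0]

# The sentence reads naturally even though nothing here says when the dog
# was actually born: a fluent guess is exactly what hallucination looks like.
print("The dog was born", continue_text("The dog was born"))
```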

When prompted, these models are designed to generate a complete response. The challenge arises when questions don’t have definitive answers.

For instance, shown a photo of a pet named "Hot Pot" and asked what animal it is, a large model might analyze the visual features, note the golden fur and large build, and assign a high probability, say 92.5%, that it is a dog.

Drawing on the many dog images it saw during training, the model recognizes the characteristics of a Golden Retriever and concludes, with high confidence, that the photo indeed shows one.

However, if asked when this “dog” was born, the model would be at a loss, as this information is not inherently present in the image or its training data. If the model proceeds to fabricate an answer, this is what we commonly refer to as hallucination.

Essentially, hallucination can be seen as an intrinsic trait of large models. Their core function is akin to advanced word association, where successful predictions are deemed correct and unsuccessful ones are labeled as hallucinations.

On the other hand, the way we currently train and evaluate large models worsens their hallucination problem. This is the “external” factor amplifying AI hallucinations.

Consider the birthday question again under a simplified grading scheme: a correct answer earns one point, and anything else earns nothing. If the model admits it doesn't know Hot Pot's birthday, it scores zero. If it guesses a date, it has a 1-in-365 chance of being right.

Faced with guaranteed failure versus a small probability of success, guessing becomes statistically advantageous for the model. To improve its score on human-defined leaderboards, models are incentivized to guess rather than admit ignorance. For these score-driven models, guessing is the rational choice, while honesty is the “stupidest” strategy.
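
As a quick check on that arithmetic, here is a minimal sketch; the 1-in-365 figure is taken directly from the example above.

```python
# Expected score under the simplified binary grading described above:
# 1 point for a correct answer, 0 for a wrong answer or for "I don't know".
p_correct_guess = 1 / 365          # chance a randomly guessed date is right

expected_if_guessing = p_correct_guess * 1 + (1 - p_correct_guess) * 0
expected_if_abstaining = 0.0       # "I don't know" always scores zero

print(f"Guessing:   {expected_if_guessing:.4f} points on average")    # ~0.0027
print(f"Abstaining: {expected_if_abstaining:.4f} points on average")  # 0.0000
# Any nonzero expected score beats a guaranteed zero, so a score-maximizing
# model always guesses.
```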

OpenAI's researchers surveyed current large-model leaderboards and found that most evaluations rely on this binary right-or-wrong scoring, inadvertently rewarding hallucination rather than accurately measuring capability.

To test the impact of this "exam-oriented" approach, OpenAI compared two of its models. On certain benchmarks the older o4-mini achieved a higher accuracy rate than the newer GPT-5, but at a cost: it answered roughly three-quarters of the questions incorrectly and admitted its limitations only about 1% of the time.

GPT-5, in contrast, is more willing to hold back, readily admitting when it doesn't know an answer. OpenAI values this capacity to acknowledge ignorance, even if it costs some points in a purely score-driven evaluation.

OpenAI’s paper concludes with several key points: they believe that hallucinations are an inherent characteristic of large models that cannot be entirely eliminated, but rather managed. Regardless of model size or advanced reasoning capabilities, there will always be questions with no definitive answers in the world.

In such cases, models need to move beyond a purely “exam-oriented” mindset and comfortably state “I don’t know.”

Interestingly, smaller models often demonstrate a greater awareness of their limitations. Because they haven’t been exposed to a vast amount of information, they are more likely to directly admit their lack of knowledge. Larger models, having learned a little about everything, can become overconfident when faced with ambiguous questions. This partial knowledge can lead to incorrect answers, turning what might seem helpful into outright hallucinations.

Ultimately, humans guiding these models must redesign evaluation methods and training frameworks to reduce the likelihood of such speculative responses.
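
One direction such a redesign could take, sketched below as an assumption of mine rather than OpenAI's published scoring scheme, is negative marking: subtract points for confident wrong answers so that blind guessing no longer pays, while "I don't know" simply scores zero.

```python
# Hypothetical scoring rule: +1 for a correct answer, -1 for a wrong one,
# 0 for admitting "I don't know". The weights are illustrative assumptions.
def expected_score_of_guessing(p_correct: float,
                               reward: float = 1.0,
                               penalty: float = -1.0) -> float:
    """Expected score when a guess is right with probability p_correct."""
    return p_correct * reward + (1 - p_correct) * penalty

p_birthday = 1 / 365
print("Guess a date:      ", round(expected_score_of_guessing(p_birthday), 4))  # about -0.9945
print("Say 'I don't know':", 0.0)
# With a penalty for errors, admitting ignorance becomes the rational strategy.
```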

This perspective certainly holds merit. However, it prompts a crucial question: is a completely hallucination-free AI truly what we desire?

Consider this: if, two years ago, large models had consistently responded with “Sorry, I don’t know” to any uncertain query, the AI experience might have been frustratingly bland, and perhaps these models wouldn’t have gained such widespread popularity.

Indeed, recent research suggests a symbiotic relationship between creativity and hallucination in AI models. A model that cannot hallucinate might also lose its capacity for creative output.

Take GPT-5, for instance. While OpenAI employed various techniques to reduce its hallucination rate, the model also became less “human-like”—less enthusiastic and arguably less intelligent in certain conversational contexts.

When presented with the same questions, GPT-5 exhibits a more measured response.

Many users had grown fond of the more conversational GPT-4o. But when GPT-5 launched and access to the older models was withdrawn, the AI they talked to every day seemed to shift from a creative conversationalist into a cold, logical one. Its coding ability may have improved, but its knack for casual chat and creative writing felt diminished, as if it had been lobotomized.

This change sparked widespread dissatisfaction, leading to an online movement advocating for the “resurrection” of the older model.

Ultimately, Sam Altman conceded to user demand, reinstating access to the previous model.

This raises the question: is a relentless pursuit of eliminating AI hallucinations always beneficial?

The choice between allowing models to err and compelling them to silence might not have a single, universal answer. The ideal balance varies for each individual.

Perhaps in the future some users will find such an AI too "obedient" and short on imagination, while others will prize exactly that trustworthy companionship.
