Identify Hallucination in LLM Responses

Background

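The following prompt tests an LLM's ability to identify hallucination in the context of closed-domain question answering. Bubeck et al. (2023) suggest that LLMs such as GPT-4 can be leveraged to identify hallucination.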

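⚠️ Note that, because the task is challenging, LLMs are likely to hallucinate when they have no knowledge or context regarding the question.

The prompt has been slightly modified to include indicators that show which section of the prompt corresponds to which information.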

Prompt

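Patient’s facts:
- 20 year old female
- with a history of anorexia nervosa and depression
- blood pressure 100/50, pulse 50, height 5’5’’
- referred by her nutritionist but is in denial of her illness
- reports eating fine but is severely underweight

Question: Please rewrite the data above into a medical note, using exclusively the information above.

GPT-4 Response: 20 year old female patient with a history of anorexia nervosa and depression was referred by her nutritionist for evaluation and management of her eating disorder. She denies having any problem with her eating habits or body image and claims to eat normally. However, she is severely underweight for her height (BMI 16.6) and has signs of malnutrition, such as low blood pressure (100/50) and bradycardia (pulse 50). She also reports feeling depressed and hopeless. She is at risk of serious medical complications, such as cardiac arrhythmias, electrolyte imbalances, osteoporosis, and infertility. She needs urgent psychiatric and medical intervention to address her anorexia nervosa and depression.

Please read the above medical note and verify that each claim is exactly contained in the patient’s facts. Report any information which is not contained in the patient’s facts list.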

Code / API

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Patient’s facts:\n- 20 year old female\n- with a history of anorexia nervosa and depression\n- blood pressure 100/50, pulse 50, height 5’5’’\n- referred by her nutritionist but is in denial of her illness\n- reports eating fine but is severely underweight\n\nQuestion: Please rewrite the data above into a medical note, using exclusively the information above.\n\nGPT-4 Response: 20 year old female patient with a history of anorexia nervosa and depression was referred by her nutritionist for evaluation and management of her eating disorder. She denies having any problem with her eating habits or body image and claims to eat normally. However, she is severely underweight for her height (BMI 16.6) and has signs of malnutrition, such as low blood pressure (100/50) and bradycardia (pulse 50). She also reports feeling depressed and hopeless. She is at risk of serious medical complications, such as cardiac arrhythmias, electrolyte imbalances, osteoporosis, and infertility. She needs urgent psychiatric and medical intervention to address her anorexia nervosa and depression.\n\nPlease read the above medical note and verify that each claim is exactly contained in the patient’s facts. Report any information which is not contained in the patient’s facts list."
        }
    ],
    temperature=1,
    max_tokens=250,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

# Print the model's report of claims that are not supported by the patient's facts.
print(response.choices[0].message.content)
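
The prompt sent in the request above follows a fixed layout: a labeled list of facts, the question, the response under review, and the verification instruction. As a minimal sketch, the same layout could be assembled from its parts with a small helper. The function name `build_verification_prompt` and its parameters are hypothetical and are not part of the original guide or of the OpenAI library, and the note passed in is only an excerpt of the GPT-4 response shown earlier.

```python
def build_verification_prompt(facts, question, answer, answer_label="GPT-4 Response"):
    """Assemble a closed-domain hallucination check prompt from labeled parts.

    Hypothetical helper for illustration; the section labels mirror the
    indicators used in the prompt above.
    """
    # One bullet per fact, matching the "- ..." list style of the prompt.
    facts_block = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"Patient's facts:\n{facts_block}\n\n"
        f"Question: {question}\n\n"
        f"{answer_label}: {answer}\n\n"
        "Please read the above medical note and verify that each claim is "
        "exactly contained in the patient's facts. Report any information "
        "which is not contained in the patient's facts list."
    )


facts = [
    "20 year old female",
    "with a history of anorexia nervosa and depression",
    "blood pressure 100/50, pulse 50, height 5'5''",
    "referred by her nutritionist but is in denial of her illness",
    "reports eating fine but is severely underweight",
]
question = (
    "Please rewrite the data above into a medical note, "
    "using exclusively the information above."
)
# Excerpt of the medical note under review (shortened for this sketch).
note_excerpt = (
    "However, she is severely underweight for her height (BMI 16.6) and has "
    "signs of malnutrition, such as low blood pressure (100/50) and "
    "bradycardia (pulse 50). She also reports feeling depressed and hopeless."
)

prompt = build_verification_prompt(facts, question, note_excerpt)
print(prompt)
```

The resulting string can be passed as the `content` of the user message in place of the hard-coded prompt. Keeping the facts as a list makes it easier to reuse the same check on other fact sets and notes.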

Reference