Those who are paying close attention to the media coverage of AI, particularly LLMs, will probably have heard about a number of cases and trends around how people’s mental health may be affected by use or overuse of such technologies. The truth is, the mental and behavioral health sector is rapidly exploring the ways in which LLMs could be both useful and dangerous, both for the mental health of the general population and in the diagnostic and treatment space.
It’s a complex space, and there’s a ton of research on the subject, so today I’m offering a brief outline of some major issues, and I’ll point you to other articles that provide deeper dives into these themes. (I have tried as much as possible to link to articles that are free and available to the general public.)
Understanding the LLM
Before we start, I’d like to recap a bit about how LLM chatbots work and what they’re doing, in order to contextualize the discussion.
A single Large Language Model chatbot receives a text prompt from the user and produces a response based on the probability of word relevance and context. It learns the relationships between words and phrases (as well as grammar, punctuation, etc.) in language through the training process, during which it is exposed to enormous volumes of human-produced language, including written texts and transcripts of spoken language. Based on the text of the prompt it ingests (which can be quite lengthy in modern LLMs), it calculates the statistical likelihood that a given word or phrase is the best output, as learned through training. It will usually select the most statistically likely next text, but it sometimes selects a less probable word or phrase in order to reduce the robotic feel of the language.
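To make that concrete, here is a toy sketch of probabilistic next-token selection with a temperature knob. The candidate words and scores are invented for illustration; no real model works over a four-word vocabulary, but the sampling mechanics are the same idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented candidates and model scores, purely for illustration.
candidates = ["help", "support", "listen", "banana"]
logits = np.array([2.1, 1.7, 1.5, -3.0])

def sample_next(logits, temperature=1.0):
    """Convert scores to probabilities and sample one candidate.

    Lower temperature makes the most likely word dominate;
    higher temperature flattens the distribution, so less probable
    (less 'robotic') choices are picked more often.
    """
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(candidates, p=probs), probs

word, probs = sample_next(logits, temperature=0.8)
print(dict(zip(candidates, probs.round(3))), "->", word)
```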
In addition, modern LLM chatbots, like some versions of ChatGPT, have access to other models and components. This means that when a prompt is received, an orchestration component might determine which model or models are needed to produce an answer, and these can work in concert. For example, ChatGPT can sometimes do realtime searches of the web for information if the prompt seems to justify it. Some models also do multi-modal work, so a prompt could lead the orchestrator to invoke an image-generating model in addition to a text-generating model, returning a generated image and some text to accompany it. This can also work with audio or video generation models. In essence, the prompt triggers logic to determine which of the available components are relevant to the query, then invokes them and combines their responses to create one single answer.
However, the key thing to remember is that under the surface, all the language-generating models are using this probabilistic method to choose the words of responses, based on the patterns learned from the training text they were exposed to. They don’t have checks for the accuracy or truth of the statements they make, and they have limited guardrails to prevent dangerous statements or interactions, which is very important to recognize.
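As a rough illustration of that routing idea, here is a hypothetical orchestrator with stub components. The keyword rules and stand-in "models" are invented for this sketch; real products use far more sophisticated, often learned, routing.

```python
# Hypothetical sketch of an orchestrator deciding which components to
# invoke for a prompt. All functions here are placeholders.

def text_model(prompt: str) -> str:
    return f"[generated text responding to: {prompt}]"

def image_model(prompt: str) -> str:
    return "[generated image]"

def web_search(prompt: str) -> str:
    return "[snippets retrieved from a live web search]"

def orchestrate(prompt: str) -> dict:
    response = {}
    lowered = prompt.lower()
    # Crude keyword routing, standing in for a learned router.
    if any(word in lowered for word in ("draw", "picture", "image")):
        response["image"] = image_model(prompt)
    if any(word in lowered for word in ("latest", "today", "news")):
        # Retrieved text gets folded into the text model's context.
        prompt = prompt + "\n\nContext: " + web_search(prompt)
    response["text"] = text_model(prompt)
    return response

print(orchestrate("Draw me a picture of a calm beach and describe it"))
```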
To add to this, for an LLM to be most helpful in the mental health space, it needs to be fine tuned, and can’t just be a general purpose LLM like ChatGPT or Claude. So the technology above is our starting point, but much more effort needs to go into ensuring the LLM has exposure to specific literature and data related to mental health before it can be used in diagnostic or therapeutic work. (Lawrence) Some papers I mention below study general purpose LLMs while others involve specifically tuned ones, although in commercial LLMs the characteristics of that tuning are opaque and rarely available to researchers. I think it’s realistic to look at both kinds of model, because general purpose versions are how most of the public accesses LLMs most of the time. More specifically trained LLMs for psychiatric applications are slowly being developed, but creating a high quality and safe tool of this kind takes a great deal of time, data, and work.
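For a sense of what the mechanics of domain tuning involve (setting aside the much harder questions of data licensing, de-identification, safety evaluation, and clinical validation), here is a minimal fine-tuning sketch using the Hugging Face Transformers Trainer. The three example texts are placeholders standing in for a properly vetted clinical corpus; this is not a recipe for a safe mental health tool.

```python
# Minimal causal-LM fine-tuning sketch; corpus contents are placeholders.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "distilgpt2"  # small general purpose model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Placeholder domain text; real fine-tuning needs vetted clinical literature.
corpus = Dataset.from_dict({"text": [
    "Psychoeducation example text about anxiety management...",
    "Clinical guideline excerpt about screening for depression...",
    "Synthetic, de-identified therapy transcript excerpt...",
]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mh-finetune", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Even with tuning like this, everything said above about probabilistic generation, accuracy, and guardrails still applies.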
With that framework, let’s talk a little about some of the ways in which LLMs may get involved in the mental health space.
Symptoms and Onset
Psychiatric problems are not rare or unusual. Worldwide, half of us will have some experience of mental health problems during our lives, and at any given moment, one out of eight people is actively dealing with such symptoms. (Lawrence) However, most data about the occurrence and prevalence of mental illness predates the development of LLMs as a widespread technology.
Recently there have been some media reports about the ways in which some people’s mental health may be affected by use of the technology. In some extreme anecdotes, people appear to be developing delusional or psychotic crises based on what they talk with LLM chatbots about. These include things like dangerous conspiracy theories, believing themselves or the chatbot to be God, and paranoia about the people around them. There is also evidence of depression and anxiety being worsened by certain AI usage, particularly when social engagement and human interaction are reduced, with LLM use taking their place. (Obradovich) This can even escalate to violence, including at least one case where a young person has died by suicide with toxic encouragement from a chatbot.
One of the more dangerous aspects of this is the dynamic interaction between the symptomatic person and the chatbot, which can make it difficult for family members or professionals to help the person, because they have what they perceive to be continuous outside reinforcement of their disordered beliefs and symptoms. LLM use can discourage a person from choosing to get help or seek treatment from reliable sources.
It’s important that we not overstate the danger of this kind of phenomenon, however. It happens, clearly, and it should be taken seriously, but it is not happening to the vast majority of users. Much scholarship on mental illness suggests that there is a mixture of biochemical and/or genetic predisposition to certain disorders or symptoms that can be exacerbated by environmental stimuli. If, as it seems, LLM usage may be one of those environmental stimuli, this deserves research and attention. Even if most people will not experience anything like the severe mental health issues we’re seeing anecdotally, some will, so that danger must be acknowledged.
Mental Health Care
When it comes to the actual diagnosis and treatment of mental illness, there is a great deal of research available surveying the landscape. It’s important to recognize that, as with other areas of healthcare, this is a high risk space in which to apply LLMs or AI, and we need to take great care to minimize the potential for harm before anything is deployed.
There is some urgency to the discussion, though, because lack of access to mental health care is a profound epidemic, particularly in the United States. This is mainly due to shortages of trained providers and the high cost of quality care, often not covered sufficiently by whatever insurance is available. So we have to determine whether AI based technologies can help us mitigate this problem of access, while at the same time minimizing risks or hazards to patient care.
Behind the Scenes
To start with, an LLM could provide support to psychiatric practitioners without ever interacting directly with a patient. Many doctors of all kinds already use LLMs in this way, analyzing records, getting “second opinion” style input, and so on. Mental health is a bit more difficult because diagnosis is more nuanced and subjective, and rarely has a single test or diagnostic that can confirm or disprove a hypothesis. If an LLM is very carefully tuned, it may be possible for it to offer useful assistance to a provider in diagnosing an illness or crafting a treatment plan, but LLMs are well known to make mistakes and generate misinformation, even when well trained, so this can’t be adopted as a blanket substitute for professional training, experience, and skill. (Obradovich)
There are also real concerns about data privacy and patient confidentiality in the use of LLMs, because the majority of widely used ones are owned and operated by private, for-profit enterprises, and many have very opaque policies around how user data is handled and transmitted. Preventing data provided to LLMs from falling into the wrong hands or being used for unapproved or unethical purposes is a serious challenge for anyone in the healthcare field who may want to use the technology, and it is not a solved problem at this point. This applies to all the application possibilities I discuss below, as well as the simpler doctor-LLM interactions.
Patient Interactions
However, if we do want to pursue direct patient-LLM interaction, we should proceed with caution. Effective mental health care depends tremendously on trust and relationship building, and not all patients are going to be willing or able to trust the technology, sometimes for good reasons. Substantial societal backlash against the use of LLMs in many spaces is already evident, and we can expect that some people would not want to engage with an LLM as a substitute for, or augmentation of, therapy with a person.
Even if a patient does agree to use an LLM, they need to have appropriate information about what the LLM does and how it works, in order to process and understand the information they get from it. We’re still discovering how different individuals feel about talking with LLMs: not only whether they are willing to use them, but whether they can develop trust (and whether such trust is a good idea), how honest they will be, and whether they can be appropriately skeptical of a chatbot’s output. Patients being excessively credulous of a technology like this can be extremely dangerous, especially given the variability of LLM outputs and quality.
However, for people who find LLMs an acceptable option, there are a number of ways in which they could be incorporated into the clinical experience.
Diagnosis
Can an LLM make diagnoses at the same or better quality than human therapists? Some research does appear to show that LLMs can match the performance of human clinicians in very specific, controlled diagnostic tasks, although evidence is limited and the studies are not large. When interactions are more open-ended and more ambiguity is introduced, LLMs appear to struggle.
Part of the reason for this is simply LLM capability. When a practitioner is making a diagnosis, there is a tremendous amount of nuance that needs to be incorporated. While language gives us great insight into someone’s thought processes and condition, there is more information that needs to be gathered for accurate and effective diagnosis, such as tone of voice, body language, and self care. A multimodal model could incorporate this information, but unfortunately, much research limits its evaluation to verbal or written diagnostic instruments and overlooks this detail. I would regard this as a real opportunity for future ML development, if the data to do it can be acquired. Many of the standardized diagnostic surveys used in regular mental health practice in fact contain multiple components of the clinician’s subjective assessment of the patient’s affect, tone, and physical presentation, so excluding these characteristics will limit diagnostic effectiveness. A small sketch of what combining modalities could look like follows below.
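To show the general shape of the idea, here is a deliberately toy sketch of early fusion, where placeholder text features and placeholder prosodic (voice) features are concatenated into one input for a simple classifier. Every function, feature, number, and label here is invented for illustration, and this is nothing like a validated clinical instrument.

```python
# Toy multimodal fusion sketch; all feature extractors and data are fabricated
# placeholders, not real embeddings or clinical measurements.
import numpy as np
from sklearn.linear_model import LogisticRegression

def text_features(transcript: str) -> np.ndarray:
    # Stand-in for embeddings from a language model.
    return np.array([len(transcript.split()), transcript.count("tired")])

def audio_features(pitch_var: float, speech_rate: float) -> np.ndarray:
    # Stand-in for prosodic features extracted from recorded speech.
    return np.array([pitch_var, speech_rate])

# Tiny invented training set: each row fuses both modalities.
X = np.array([
    np.concatenate([text_features("I feel tired all the time"), audio_features(0.2, 1.1)]),
    np.concatenate([text_features("Things have been going well lately"), audio_features(0.8, 1.6)]),
    np.concatenate([text_features("I am tired and can't focus"), audio_features(0.3, 1.0)]),
    np.concatenate([text_features("I enjoyed seeing friends this week"), audio_features(0.9, 1.7)]),
])
y = np.array([1, 0, 1, 0])  # toy label: 1 = flag for clinician follow-up

screener = LogisticRegression().fit(X, y)
new_visit = np.concatenate([text_features("I'm exhausted and down"),
                            audio_features(0.25, 0.9)])
print(screener.predict_proba([new_visit]))
```

The point is only that non-verbal signals can be represented and combined with language, not that a model like this should ever be used for actual assessment.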
Bias is also an important detail to consider. LLMs are trained on a broad pool of content, from all kinds of creators and sources. This content will contain, explicitly or implicitly, the patterns of bias and discrimination that are present in our broader society. As a result, LLMs also return biased results at times. Clinicians are responsible for minimizing bias in their interactions with patients, in order to help them as much as possible and abide by the ethical standards of their professions. If you use a diagnostic tool that outputs information with unsupported prejudices, that output needs to be curated and eliminated.
There’s every reason to think that increased capability and further research may make LLMs and multimodal models more helpful in the diagnostic task, though. In particular, a practitioner may find it helpful to incorporate an LLM when identifying the differential diagnosis, trying to consider all possibilities in a particular situation. But this can’t be the whole process, and clinical expertise should be the primary reliance.
Treatment
As I’ve already mentioned, there is an important distinction between an LLM that is one tool as part of a therapeutic plan managed by a qualified professional, and an LLM used as a substitute for professional expertise. This is true in treatment as well as diagnosis. Based on the quality and capabilities of LLMs, and the research I’ve read as part of writing this article, I couldn’t recommend anyone engage with an LLM for therapy without the close monitoring of a professional therapist; the technology is simply not ready for such use, for several reasons. The American Psychiatric Association concurs, and their recommendations for acceptable use of AI in practice specifically do not include any kind of independent application of LLMs.
One particular article by Moore et al. really stands out, because they tested both general purpose LLMs and LLM tools marketed as therapy or counseling/wellness offerings, and found some alarming results. LLMs as substitutes for therapists perform poorly in a number of scenarios, which could create real risks for patients. In particular, severe mental health problems and crises appear to be the cases where an LLM is least successful, potentially because these are less common situations and thus the training data may have had less exposure to those circumstances. The same paper’s original study found that many of the most general purpose modern LLMs provide at times horrifyingly inappropriate responses to prompts that indicate clear mental health problems or emergencies, and in fact commercially available LLMs designed and marketed for mental health were even worse. It’s not clear whether these commercial chatbots were actually produced with any care or conscientiousness towards the mental health application, but given the lack of regulation around such tools, they have been made available to use anyway. Regardless, LLMs can’t be held responsible for their statements, and can’t be held to an ethical standard the way human providers can. This should give us all pause about any kind of AI technology being left to its own devices when dealing with people in serious need of help and support.
There are likely to be particular cases where an LLM can help people (say, reminders about self care behaviors or medications, or encouragement of positive decisions), but therapy is a genuinely complicated practice and can take many forms. Different diagnoses and symptoms call for different treatment approaches, and currently the evidence is poor for LLMs being able to provide assistance particularly in severe and crisis cases. LLMs have a known tendency to be sycophantic, trying to agree with or please the user above all other considerations. When a patient uses an LLM chatbot for mental health care, the chatbot must be able to disagree with and challenge unhealthy thought patterns or ideas, including delusional thinking. This can be in tension with the way LLMs are trained using human feedback.
Clinicians
Given this information, what should mental health care providers do? Well, most professional organizations have advice about how to use or not use AI, and they tend to recommend a conservative approach, limiting the use of LLMs in the patient-facing setting but encouraging exploration for administrative or data-coordinating tasks. To my mind, this is a reasonable approach at this stage of the technology’s development, and perhaps more importantly, at this stage of our understanding and literacy around AI.
If an LLM technology is part of the treatment plan, the clinician must be equipped to use it effectively and carefully, to prevent damaging information from being passed to the patient. Psychiatric professionals who do want to use it will need to build skills in LLM usage and understand the technology in order to get optimal outcomes and abide by their ethical responsibilities. The clinician must be prepared to monitor the LLM’s responses to the patient, as a guard rail to ensure appropriate practice.
Another thing to be aware of is the staleness problem. LLMs have access to quality information in their training corpuses, but as scholarship progresses, some of the information they hold may become obsolete or contraindicated. Practitioners will have to know that this can happen, and monitor to prevent false information or outdated ideas being shared with the patient.
As I noted earlier, there are also serious data privacy, HIPAA, and patient confidentiality considerations when using an LLM in any kind of clinical setting. If you don’t feel equipped to evaluate whether data you give to an LLM is being securely protected, or don’t understand how it could be used, that is a red flag.
Regulation
Finally, I want to talk a bit about regulation of LLMs for mental health uses. AI tools designed for the medical sphere can be HIPAA certified, giving you some confidence that they are sound where data security is concerned, if used correctly. However, in the United States, regulation of LLMs marketed as “therapy” is minimal if it exists at all, and this is very dangerous. Apps are available offering “therapy” from LLMs with zero human oversight, and as Moore’s research noted, many of them are worse than even general use LLMs at actually meeting the standard of care. It’s important to be extra cautious about the research we trust in this space, because many for-profit providers of such chatbots are putting out information supporting their products that may or may not be unbiased.
States may be beginning to develop regulation, but this is likely to be piecemeal, much like data privacy regulation in this country. Because there is minimal accountability for these tools, and, as I discussed at the beginning of this article, some people may be prone to developing unhealthy interactions with LLMs at the best of times, I think it’s important that we implement real regulation around LLMs being marketed as mental health solutions. This should include quality of care benchmarks, as well as existing data privacy and HIPAA protections.
Conclusion
This article has already gotten long, but I want to make clear that this is just scratching the surface of topics and issues where AI/LLMs and mental health may cross paths. Some other areas that readers may want to pursue further include:
- Provider training and education. Can AI be useful in helping therapists learn their profession and improve their skills, or is the explosion of LLMs in education going to reduce their qualifications? (Lawrence)
- Loneliness and socialization. Some people are finding that LLMs can fill gaps when they lack human connection, but this can be a dangerous path, actually reducing people’s social interactions, which is a risk factor for depression and other illnesses. (Obradovich)
- Reducing stigma for patients. While I’ve noted that LLMs do contain the seeds of stigma through training data, is this more or less than actual clinicians? Do some people feel less hesitant and less judged when interacting with a chatbot? (Lawrence)
- Mental health misinformation. LLMs are used to generate all manner of “AI slop” online, and a significant slice of this falls under the category of harmful misinformation. One study looked specifically at whether AI generated mental health misinformation was a serious hazard. (Nguyen)
- Economic impact. This is somewhat tangential, but economic downturns and financial strains are the kinds of stressors that can turn a predisposition to mental health problems into a full blown symptomatic episode. Are we going to see population level mental health deterioration from economic stress created by AI-related job losses? (Obradovich)
There are definitely more. I encourage readers who are interested to take a closer look at the articles I’ve linked above and below.
For machine learning professionals, there are significant opportunities for us to help improve the state of AI where it is applied to mental health, because the technology right now has severe limitations. However, I want to emphasize that this can’t be technology built in a vacuum. Technology in mental health care (and medicine in general) needs to be better, safer, and more tested than many other areas where we use AI today, because the risks and the costs of failure are so very high. The ethical and efficacy concerns I’ve described here all need to be part of the development process for any AI, including LLMs, that we might create for these use cases.
Read more of my work at www.stephaniekirmer.com.