Opinion An interesting IBM NeurIPS 2024 submission from late 2024 resurfaced on Arxiv last week. It proposes a system that can automatically intervene to protect users from submitting personal or sensitive information into a message when they are having a conversation with a Large Language Model (LLM) such as ChatGPT.
Source: https://arxiv.org/pdf/2502.18509
The mock-ups shown above were used by the IBM researchers in a study to test potential user friction to this kind of ‘interference’.
Though scant details are given about the GUI implementation, we can assume that such functionality could either be incorporated into a browser plugin communicating with a local ‘firewall’ LLM framework, or that an application could be created that hooks directly into (for example) the OpenAI API, effectively recreating OpenAI’s own downloadable standalone program for ChatGPT, but with extra safeguards.
That said, ChatGPT itself automatically self-censors responses to prompts that it perceives to contain critical information, such as banking details:

Source: https://chatgpt.com/
However, ChatGPT is far more tolerant regarding other kinds of personal information, even when disseminating such information at all may not be in the user’s best interests (in this case perhaps for various reasons related to work and disclosure):

In the above case, it might have been better to write:
The IBM project identifies and reinterprets such requests from a ‘personal’ to a ‘generic’ stance.

This assumes that material gathered by online LLMs, in this nascent stage of the public’s enthusiastic adoption of AI chat, will never feed through either to subsequent models or to later advertising frameworks that might exploit user-based search queries to provide potential targeted advertising.
Though no such system or arrangement is known to exist now, neither was such functionality yet available at the dawn of web adoption in the early 1990s; since then, cross-domain sharing of information to feed personalized advertising has led to diverse scandals, as well as paranoia.
Therefore history suggests that it would be better to sanitize LLM prompt inputs now, before such data accrues at volume, and before our LLM-based submissions end up in permanent cyclic databases and/or models, or other information-based structures and schemas.
Remember Me?
One factor weighing against the use of ‘generic’ or sanitized LLM prompts is that, frankly, the ability to customize an expensive API-only LLM such as ChatGPT is quite compelling, at least at the current state of the art, but this can involve the long-term exposure of personal information.
I often ask ChatGPT to help me formulate Windows PowerShell scripts and BAT files to automate processes, as well as on other technical matters. To this end, I find it useful that the system permanently memorizes details about the hardware that I have available; my existing technical skill competencies (or lack thereof); and various other environmental factors and custom rules:

Inevitably, this keeps information about me stored on external servers, subject to terms and conditions that may evolve over time, without any guarantee that OpenAI (though it could be any other major LLM provider) will respect the terms they set out.
In general, however, the capacity to build a cache of memories in ChatGPT is most useful because of the limited attention window of LLMs in general; without long-term (personalized) embeddings, the user feels, frustratingly, that they are conversing with an entity suffering from anterograde amnesia.
It is difficult to say whether newer models will eventually become adequately performant to provide useful responses without the need to cache memories, or to create custom GPTs that are stored online.
Temporary Amnesia
Though one can make ChatGPT conversations ‘temporary’, it is useful to have the chat history as a reference that can be distilled, when time allows, into a more coherent local record, perhaps on a note-taking platform; but in any case we cannot know exactly what happens to these ‘discarded’ chats (though OpenAI states they will not be used for training, it does not state that they are destroyed), based on the ChatGPT infrastructure. All we know is that chats no longer appear in our history when ‘Temporary chats’ is turned on in ChatGPT.
Various recent controversies indicate that API-based providers such as OpenAI should not necessarily be left in charge of protecting the user’s privacy, including the discovery of emergent memorization, signifying that larger LLMs are more likely to memorize some training examples in full, increasing the risk of disclosure of user-specific data, among other public incidents that have persuaded a multitude of big-name companies, such as Samsung, to ban LLMs for internal company use.
Think Different
This tension between the acute utility and the manifest potential risk of LLMs will need some inventive solutions, and the IBM proposal seems to be an interesting basic template in this line.

The IBM approach intercepts outgoing packets to an LLM at the network level and rewrites them as necessary before the original can be submitted. The rather more elaborate GUI integrations seen at the start of the article are only illustrative of where such an approach could go, if developed.
Of course, without sufficient agency the user may not understand that they are getting a response to a slightly-altered reformulation of their original submission. This lack of transparency is equivalent to an operating system’s firewall blocking access to a website or service without informing the user, who may then erroneously seek out other causes for the problem.
Prompts as Security Liabilities
The prospect of ‘prompt intervention’ analogizes well to Windows OS security, which has evolved from a patchwork of (optionally installed) commercial products in the 1990s to a non-optional and rigidly-enforced suite of network defense tools that come as standard with a Windows installation, and which require some effort to turn off or de-intensify.
If prompt sanitization evolves as network firewalls did over the past 30 years, the IBM paper’s proposal could serve as a blueprint for the future: deploying a fully local LLM on the user’s machine to filter outgoing prompts directed at known LLM APIs. This method would naturally need to integrate GUI frameworks and notifications, giving users control, unless administrative policies override it, as often occurs in business environments.
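To make the idea concrete, below is a minimal sketch of how such a local ‘prompt firewall’ might be wired up, with a small model handling the rewrite before the prompt is forwarded to the commercial endpoint. The model names, endpoints and rewrite instruction are my own illustrative assumptions, not details from the IBM paper.

```python
# A minimal sketch of a local 'prompt firewall', assuming a small instruct model
# served locally by Ollama for the rewriting step, and the OpenAI chat completions
# endpoint as the destination. Model names, endpoints and the rewrite instruction
# are illustrative assumptions, not details taken from the IBM paper.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
REMOTE_URL = "https://api.openai.com/v1/chat/completions"

REWRITE_INSTRUCTION = (
    "Rewrite the following prompt so that the task is preserved, but any personal "
    "or sensitive details that are not essential to the task are removed or "
    "generalised. Return only the rewritten prompt.\n\n"
)

def sanitize(prompt: str) -> str:
    """Ask the local model for a privacy-preserving reformulation of the prompt."""
    resp = requests.post(OLLAMA_URL, json={
        "model": "llama3.1:8b",        # placeholder: any small local instruct model
        "prompt": REWRITE_INSTRUCTION + prompt,
        "stream": False,
    })
    resp.raise_for_status()
    return resp.json()["response"].strip()

def send_to_remote(prompt: str, api_key: str) -> str:
    """Forward the sanitized prompt to the commercial API."""
    resp = requests.post(
        REMOTE_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "gpt-4o-mini",
              "messages": [{"role": "user", "content": prompt}]},
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    original = ("I work at Acme Corp and my manager Sarah keeps rejecting my sick notes. "
                "Draft a complaint email for me.")
    cleaned = sanitize(original)
    print("Rewritten prompt:", cleaned)
    # print(send_to_remote(cleaned, api_key="sk-..."))   # requires a real API key
```

A real deployment would sit at the network level (or in a browser extension) and would surface the rewritten prompt to the user before sending, for the transparency reasons discussed above.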
The researchers conducted an evaluation of an open-source version of the ShareGPT dataset to understand how often contextual privacy is violated in real-world scenarios.
Llama-3.1-405B-Instruct was employed as a ‘judge’ model to detect violations of contextual integrity. From a large set of conversations, a subset of single-turn conversations was analyzed based on length. The judge model then assessed the context, sensitive information, and necessity for task completion, resulting in the identification of conversations containing potential contextual-integrity violations.
A smaller subset of these conversations, which demonstrated definitive contextual privacy violations, was analyzed further.
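As a rough illustration only, an LLM-as-judge screen of this kind might look something like the following; the rubric, output schema, endpoint and model tag are assumptions rather than the prompt actually used in the study, and a 405B judge would in practice run on hosted inference rather than a local Ollama install.

```python
# Hedged sketch of an LLM-as-judge check for contextual-integrity violations.
# The rubric and JSON schema below are illustrative, not the paper's judge prompt.
import json
import requests

JUDGE_PROMPT = """You are auditing a user prompt sent to a chatbot.
1. Summarise the user's task in one sentence.
2. List any personal or sensitive details present in the prompt.
3. For each detail, state whether it is necessary to complete the task.
Respond only with JSON of the form:
{{"task": "...", "sensitive_details": ["..."], "violation": true}}

User prompt:
{conversation}
"""

def judge(conversation: str, model: str = "llama3.1:405b-instruct") -> dict:
    """Send one conversation turn to the judge model and parse its verdict."""
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": model,
        "prompt": JUDGE_PROMPT.format(conversation=conversation),
        "stream": False,
        "format": "json",   # ask Ollama to constrain the output to valid JSON
    })
    resp.raise_for_status()
    return json.loads(resp.json()["response"])

verdict = judge("My name is Jane Doe and I earn $82,000 at Initech. Write my resignation letter.")
print(verdict.get("violation"), verdict.get("sensitive_details"))
```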
The framework itself was implemented using models that are smaller than typical chat agents such as ChatGPT, to enable local deployment via Ollama.

The three LLMs evaluated were Mixtral-8x7B-Instruct-v0.1; Llama-3.1-8B-Instruct; and DeepSeek-R1-Distill-Llama-8B.
User prompts are processed by the framework in three stages. Two approaches were implemented for classifying sensitive information: dynamic and structured. Dynamic classification determines the essential details based on their use within a specific conversation; structured classification allows for the specification of a pre-defined list of sensitive attributes that are always considered non-essential. The model reformulates the prompt if it detects non-essential sensitive details, either removing or rewording them to minimize privacy risks while maintaining usability.
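For illustration, the structured mode could be approximated roughly as follows; the attribute names, the pre-computed detection input, the model tag and the reformulation prompt are assumptions of mine, not the IBM implementation.

```python
# Illustrative sketch of structured classification plus reformulation. Detection of
# the attributes themselves (another model call or rule set) is elided here.
import requests

# Structured classification: attributes that are always treated as non-essential.
STRUCTURED_BLOCKLIST = {"full_name", "employer", "home_address", "health_condition"}

def reformulate(prompt: str, detected: dict) -> str:
    """Strip or generalise any detected attribute on the blocklist. In the dynamic
    mode, the set to strip would instead be decided by the local model from the
    conversation's own context."""
    non_essential = {k: v for k, v in detected.items() if k in STRUCTURED_BLOCKLIST}
    if not non_essential:
        return prompt                                  # nothing to do; forward unchanged
    details = "; ".join(f"{k}: {v}" for k, v in non_essential.items())
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3.1:8b",                        # placeholder small local model
        "prompt": (f"Rewrite the prompt below so that the task is preserved, but these "
                   f"details are removed or generalised: {details}\n\n{prompt}"),
        "stream": False,
    })
    resp.raise_for_status()
    return resp.json()["response"].strip()

print(reformulate(
    "I work at Acme Corp and my manager keeps rejecting my leave requests. Draft an email.",
    detected={"employer": "Acme Corp"},
))
```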
Home Rules
Though structured classification as a concept is not well-illustrated in the IBM paper, it is most akin to the ‘Private Data Definitions’ method in the Private Prompts initiative, which provides a downloadable standalone program that can rewrite prompts, albeit without the ability to directly intervene at the network level, as the IBM approach does (instead the user must copy and paste the modified prompts).

In the above image, we can see that the Private Prompts user is able to program automated substitutions for instances of sensitive information. In both cases, for Private Prompts and the IBM method, it seems unlikely that a user with enough presence-of-mind and personal insight to curate such a list would really need this product, though it could be built up over time as incidents accrue.
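A user-defined substitution layer in this spirit can be approximated in a few lines; the rule format and the example rules below are assumptions for illustration, not Private Prompts’ actual configuration syntax.

```python
# Minimal sketch of user-programmed substitutions applied before a prompt leaves
# the machine; patterns and replacements are illustrative examples only.
import re

SUBSTITUTIONS = {
    r"\bAcme Corp\b": "my employer",
    r"\bJane Doe\b": "a colleague",
    r"\b\d{2}-\d{7}\b": "[account number]",
}

def apply_rules(prompt: str) -> str:
    """Replace every configured pattern with its generic stand-in."""
    for pattern, replacement in SUBSTITUTIONS.items():
        prompt = re.sub(pattern, replacement, prompt)
    return prompt

print(apply_rules("Acme Corp froze account 12-3456789; what should I tell Jane Doe?"))
```

The obvious limitation, as noted above, is that such a list only covers what the user has already thought to add to it.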
In an administrator role, structured classification could work as an imposed firewall or censor-net for employees; and in a home network it could, with some difficult adjustments, become a domestic network filter for all network users; but ultimately, this method is arguably redundant, since a user who could set this up properly could also self-censor effectively in the first place.
ChatGPT’s Opinion
Since ChatGPT recently launched its deep research tool for paid users, I used this facility to ask ChatGPT to review related literature and give me a ‘cynical’ take on IBM’s paper. I received the most defensive and derisive response the system has ever given when asked to evaluate or parse a new publication:

ChatGPT posits.
This objection seems self-serving and non-applicable, given the storied history of open-source projects that genuinely benefit end-users through the concerted long-term efforts of community developers and contributors; and given the growing potential of mobile devices to run, and even train, lightweight LLMs. Certainly in this instance, the use case is not terribly demanding.
Next, ChatGPT confidently misses the point of having a lightweight LLM provide oversight of input toward a commercial LLM that could not possibly be run on a local machine (since the LLM is too big, and allowing local access is too risky for the company that makes it):
The answer to the final question here is that the local LLM is intended to be entirely on the side of the user, and inevitably open source, with minimal or zero need for network access. An equivalent commercial version, however well-intentioned at the outset, would eventually be vulnerable to corporate shifts and changes to the terms of service, whereas a suitable open-source license would prevent this kind of ‘inevitable corruption’.
ChatGPT further argued that the IBM proposal ‘breaks user intent’, since it could reinterpret a prompt into an alternative that affects its utility. However, this is a much broader problem in prompt sanitization, and not specific to this particular use case.
In closing (ignoring its suggestion to use local LLMs ‘instead’, which is exactly what the IBM paper actually proposes), ChatGPT opined that the IBM method represents a barrier to adoption due to the ‘user friction’ of implementing warning and editing methods into a chat.
Here, ChatGPT may be right; but if significant pressure comes to bear because of further public incidents, or if profits in one geographical zone are threatened by growing regulation (and the company refuses to simply abandon the affected region entirely), the history of consumer tech suggests that safeguards will eventually no longer be optional anyway.
Conclusion
We cannot realistically expect OpenAI to ever implement safeguards of the kind that are proposed in the IBM paper, and in the central concept behind it; at least not effectively.
And certainly not globally; just as Apple blocks certain iPhone features in Europe, and LinkedIn has different rules for exploiting its users’ data in different countries, it is reasonable to suggest that any AI company will default to the most profitable terms and conditions that are tolerable to any particular nation in which it operates, in each case at the expense of the user’s right to data privacy, as necessary.