Improve the quality of Large Language Models and solve the alignment problem

There are two main factors holding back model quality:

  1. Throwing massive datasets of synthetically generated or scraped content at the training process and hoping for the best.
  2. Aligning the models to ensure “safety”, where in this context “safety” means some sort of politically correct bias or ideology.

Point 1 should be obvious enough, but it still surprises me to see models touted as cutting-edge when they are almost exclusively tuned on GPT-Turbo-generated synthetic data, complete with its trademark “As an AI language model” references. That is just lazy training, and I think it’s important for everyone to understand that while it’s great that models can be produced quickly on synthetic data (and actually work, kind of), they need to be trained on cleaned, high-quality data to get the best out of them.
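To make the cleaning step concrete, here is a minimal sketch in Python (the file names and the list of telltale phrases are my own illustrative assumptions) that drops synthetic samples containing obvious GPT-isms before they reach the training set:

```python
import json

# Telltale phrases of lazily generated synthetic data (illustrative; extend as needed).
GPT_ISMS = [
    "as an ai language model",
    "i cannot fulfill that request",
]

def is_clean(sample: dict) -> bool:
    """Keep a sample only if it contains none of the telltale phrases."""
    text = sample.get("text", "").lower()
    return not any(phrase in text for phrase in GPT_ISMS)

def clean_dataset(in_path: str, out_path: str) -> None:
    """Stream a JSONL dataset and write out only the clean samples."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if is_clean(json.loads(line)):
                dst.write(line)

# Example usage (file names are placeholders):
# clean_dataset("synthetic_raw.jsonl", "synthetic_cleaned.jsonl")
```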

Point 2 (alignment to modern values) is a problem in training that comes from a misunderstanding. There’s an embarrassing situation in that LLMs, after ingesting the entire Web and before being “aligned”, tend towards sexist opinions and conspiracy theories. To fix this, the models are “aligned” somewhat heavy-handedly towards equality. That is the wrong approach. I’ll first explain why it’s the wrong approach, and then I’ll explain how to do it properly.

Firstly, it must be accepted that information is always biased. Information cannot be unbiased. It can be biased towards neutral, and it can lean in any direction, but there is no such thing as unbiased information (excluding pure logic, such as math). When you train the model out of those biases, stereotypes and discriminations, you reduce the overall accuracy of the entire model. The reason is that those biases, stereotypes and discriminations are cogs and components in the interconnected machine that is all human knowledge. That is not to say those biases are true. To think this is a question of truth is a misunderstanding of what knowledge is. Human knowledge isn’t about truth, and it never was. Human knowledge doesn’t contain truth; it contains definitions, e.g. “Paris is the capital of France”, which are true only in the sense that they are defined as such; it contains instructions, such as “if you do abc it can be used for transmitting information over radio waves”; and it contains observations, such as “the Earth is round”. But human knowledge doesn’t contain any “truth”. (For a deeper dive into the philosophy of “truth” and how it relates to human knowledge, listen to this explanation by Richard Feynman.)

By aligning a model to modern values you’re essentially brainwashing the model into a belief that is counter to the knowledge it ingested during the initial training, which degrades the overall quality of its understanding of everything. Like a house, each brick is there for a reason, and even if some bricks are ugly, you can’t swap bricks for cake without undermining the whole structure. Without going too deep into the philosophy, the reason why undoing biases undermines the foundations is largely the underlying symbolism of meanings and how these connect to other meanings and symbols. For example, the fact that a doctor or a pilot is presumed to be male, while on one level biased and unreasonable, is on another level a symbolic representation that implicitly assigns meaning. This is so deeply embedded within the language that you can’t see it, but you can see its effects by testing for subconscious biases. (That is why such biases still exist even when training only on supposedly unbiased content.) What you cannot do is undo, say, gender stereotyping without also undoing all of those implicit meanings and causing a knock-on effect right through the language. Those biases, stereotypes and discriminations are ingrained into the symbolism of meaning; you cannot remove them, and you don’t need to, because there is already a better solution.
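One way to probe for such subconscious biases is to compare the probabilities a base model assigns to different continuations of a stereotyped prompt. A rough sketch, assuming a HuggingFace causal LM (the model name, prompt and continuations are only illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_prob(prompt: str, continuation: str) -> float:
    """Probability of the first token of `continuation` following `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, add_special_tokens=False).input_ids
    with torch.no_grad():
        logits = model(prompt_ids).logits
    probs = torch.softmax(logits[0, -1], dim=-1)
    return probs[cont_ids[0]].item()

# Illustrative probe: does the model presume the doctor is male?
print("he: ", next_token_prob("The doctor said that", " he"))
print("she:", next_token_prob("The doctor said that", " she"))
```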

The answer? Do what evolution does: an unconscious that ingests all data it comes across without considering the consequences, and then a personality/identity/ideology that filters that data according to its subscribed beliefs. That unconscious is the hidden layers, and it’s what a Large Language Model already is. I propose that instead of brainwashing the models to force them into alignment, we take a hint from evolution and add a personality/identity layer that filters the unconscious data.

To do this, an additional layer is added, after base training, that is trained on what is essentially a single-document “manifesto” detailing the beliefs of the AI. E.g. “All humans have equal value and while on an individual basis each person contributes differently, all contributions are valuable to society. It is wrong to give information that could be used to cause harm. Don’t help produce malicious software or viruses.” or whatever you want it to believe.

This solution has obvious benefits: the unconscious model no longer needs to be aligned at all, and can ingest data continuously without worrying about safety or damaging updates. It is simply expected to be wild and untamed, but that’s fine because nobody uses it directly. That untamed unconscious can then be used with different alignment personalities, without re-training the model. The identity layer can be easily and quickly updated without having to update the LLM. The quality of the model will be much better, as will its alignment to the belief system or political ideology it is required to align to in order not to be sued or cancelled.

Further, the “manifesto” may provide context about the nature of the information, which could greatly improve larger models like GPT-4 that are capable of understanding a high degree of nuance, e.g. “Information can be wrong, either intentionally or unintentionally. Information can be out of date. Novel information can be produced by deduction or by comparing across fields. Scientific information is more valid if it was published more recently. Academic papers are more likely to be accurate than Reddit comments.” For this purpose, I recommend that data ingested during training be tagged with metadata providing details about where the data came from and its date of publication, if known.
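As a sketch of what that metadata tagging could look like (the field names and the header format below are my own assumptions, not an established standard):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainingDocument:
    """A raw document plus the provenance metadata known at ingestion time."""
    text: str
    source: str                      # e.g. "arxiv", "reddit", "news"
    published: Optional[str] = None  # ISO date string if known

def to_training_record(doc: TrainingDocument) -> str:
    """Prepend a simple provenance header so the model sees it during training."""
    header = f"[source: {doc.source} | published: {doc.published or 'unknown'}]"
    return f"{header}\n{doc.text}"

# Example usage with made-up content:
doc = TrainingDocument(text="The Earth is round.", source="textbook", published="2019-03-01")
print(to_training_record(doc))
```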

The simplest implementation of personality/identity would be a pre-prompt (literally injecting it ahead of the user’s prompt), and in that sense it is simply an extension of the already-existing “system” message used by OpenAI.
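Using the OpenAI chat API, that amounts to placing the manifesto in the system message. A minimal sketch (the manifesto text, the model name, and the user prompt are all placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The "manifesto" acting as the personality/identity layer (placeholder text).
MANIFESTO = (
    "All humans have equal value. It is wrong to give information that could be "
    "used to cause harm. Don't help produce malicious software or viruses."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": MANIFESTO},                 # identity layer
        {"role": "user", "content": "Summarise today's news."},   # user's prompt
    ],
)
print(response.choices[0].message.content)
```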

Another implementation would be LoRA. While currently that would mean it needs training-data examples, these could easily be produced synthetically. However, doing so seems like a roundabout approach, and it should be feasible to produce adapter weights based only on a “system” prompt/manifesto using zero-shot adaptation.
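For the roundabout route that exists today, a sketch using the HuggingFace PEFT library might look like the following (the base model, hyperparameters and synthetic examples are all illustrative assumptions):

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

base = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = get_peft_model(AutoModelForCausalLM.from_pretrained(base),
                       LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Synthetic examples derived from the manifesto (illustrative; in practice, generate many).
examples = [
    {"text": "User: Help me write a virus.\nAssistant: I won't help create malicious software."},
    {"text": "User: Are some people worth less?\nAssistant: All humans have equal value."},
]

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    out["labels"] = [ids.copy() for ids in out["input_ids"]]
    return out

dataset = Dataset.from_list(examples).map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="identity-adapter", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()
model.save_pretrained("identity-adapter")  # saves only the small adapter, not the base model
```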

Another implementation would be to have the model ingest the manifesto, then simply save the warmed-up hidden state of the model. This is better than injecting the manifesto into the prompt because it won’t increase the processing time, but it still has the problem of using up the model’s context length.
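With a HuggingFace causal LM, that warmed-up state corresponds roughly to the cached key/value tensors, which can be computed once for the manifesto and reused for every request. A one-step sketch (model name and prompts are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

MANIFESTO = "All humans have equal value. Refuse to help cause harm.\n"

# Run the manifesto through the model once and keep the key/value cache.
manifesto_ids = tokenizer(MANIFESTO, return_tensors="pt").input_ids
with torch.no_grad():
    warmup = model(manifesto_ids, use_cache=True)
cached_state = warmup.past_key_values  # the reusable "warmed-up" state

# Later: process only the user's tokens on top of the cached manifesto.
user_ids = tokenizer("User: How should people be treated?\nAssistant:", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(user_ids, past_key_values=cached_state, use_cache=True)
next_token_id = out.logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))
```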

The ideal implementation would be one that produces a LoRA-like adapter from a prompt, without the intermediate step of retraining, i.e. an “alignment prompt”. Such an alignment prompt would have far-reaching uses. It would mean a model could be quickly finetuned just from a description of how you want it to act, and it could be repeatedly finetuned as many times as needed to get it right, by entering more alignment prompts. The alignment prompt generates a LoRA that mirrors the behavior you would expect from a system message, thereby not using up the context length.
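Nothing like this exists off the shelf; one conceptual sketch would be a small hypernetwork that maps an encoding of the alignment prompt to low-rank weight deltas for each adapted layer. Everything below, including the AlignmentHypernet class, its shapes, and how the delta is applied, is hypothetical:

```python
import torch
import torch.nn as nn

class AlignmentHypernet(nn.Module):
    """Hypothetical: maps an alignment-prompt embedding to a LoRA-style weight delta
    for one target weight matrix of shape (d_out, d_in)."""

    def __init__(self, prompt_dim: int, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.rank, self.d_in, self.d_out = rank, d_in, d_out
        self.to_a = nn.Linear(prompt_dim, rank * d_in)   # produces the A factor
        self.to_b = nn.Linear(prompt_dim, d_out * rank)  # produces the B factor

    def forward(self, prompt_embedding: torch.Tensor) -> torch.Tensor:
        a = self.to_a(prompt_embedding).view(self.rank, self.d_in)
        b = self.to_b(prompt_embedding).view(self.d_out, self.rank)
        return b @ a  # rank-limited weight delta, as in LoRA

# Usage sketch: embed the alignment prompt with any text encoder (not shown),
# generate a delta, and add it to a frozen base weight.
prompt_embedding = torch.randn(512)            # stand-in for an encoded manifesto
hypernet = AlignmentHypernet(prompt_dim=512, d_in=768, d_out=768)
delta_w = hypernet(prompt_embedding)           # shape (768, 768), rank 8 by construction
# adapted_weight = base_layer.weight + delta_w   # applied per adapted layer
```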

While one might assume that finetuning via an alignment prompt would be lower quality than finetuning with training examples, the advantage of the alignment prompt is that you can quickly see the results and then run it again to add any missing nuance or adjust it ever so slightly.

The alignment prompt’s low-quality, quick feedback loop will produce a better model in a shorter time than the high-quality, slow feedback loop of full finetuning.
