
Improve the quality of Large Language Models and solve the alignment problem


There are two main factors holding back model quality:

  1. Throwing massive datasets of synthetically generated or scraped content at the training process and hoping for the best.
  2. The alignment of models to ensure “safety”, where in this context “safety” means some kind of politically-correct bias or ideology.

Point 1 should be obvious enough, but it still surprises me to see models touted as cutting-edge when they are almost exclusively tuned on GPT-Turbo-generated synthetic data, complete with its trademark “As an AI language model” references. That is just lazy training, and I think it’s important for everyone to understand that while it’s great that models can be produced quickly on synthetic data (and actually work, kinda), it is important to train them on cleaned, high-quality data to get the best out of them.

Point 2 (alignment to modern values) is a training problem that stems from a misunderstanding. There is an embarrassing situation in that LLMs, after ingesting the whole Web, and before being “aligned”, tend towards sexist opinions and conspiracy theories. To fix this, the models are “aligned” rather heavy-handedly towards equality. That is the wrong approach. I will first explain why it is the wrong approach, and then I will explain how to do it properly.

Firstly, it should be accepted that information is always biased. Information cannot be unbiased. It can be biased towards neutral, and it can lean in any direction, but there is no such thing as unbiased information (apart from pure logic, such as math.) When you train the model out of those biases, stereotypes and discriminations, you reduce the overall accuracy of the entire model. The reason is that those biases, stereotypes and discriminations are cogs and components in the interconnected machine that is all human knowledge. That is not to say those biases are true. To think this is a question of truth is a misunderstanding of what knowledge is. Human knowledge is not about truth and it never was. Human knowledge does not contain truth; it contains definitions, e.g. “Paris is the capital of France”, which are true only in the sense that they are defined as such; it contains instructions, such as “if you do abc it can be used for transmitting information over radio waves”; and it contains observations, such as “the Earth is round”. But human knowledge does not contain any “truth”. (For a deeper dive into the philosophy of “truth” and how it relates to human knowledge, listen to this explanation by Richard Feynman.)

By aligning a model to modern values you are essentially brainwashing the model into a belief that is counter to the knowledge it ingested during initial training, which degrades the overall quality of its understanding of everything. Like a house, each brick is there for a reason, and even if some bricks may be ugly, you can’t swap bricks for cake without undermining the whole structure. Without going too deep into the philosophy, the reason undoing biases undermines the foundations is largely the underlying symbolism of meanings and how these connect to other meanings and symbols. For example, the fact that a doctor or a pilot is presumed to be male, while on one level biased and unreasonable, is on another level a symbolic representation that implicitly assigns meaning. This is so deeply embedded within the language that you can’t see it, but you can see its effects by testing for subconscious biases. (This is why such biases still exist even when training only on supposedly unbiased content.) What you cannot do is undo, say, gender stereotyping without also undoing all of those implicit meanings and causing a knock-on effect throughout the language. Those biases, stereotypes and discriminations are ingrained in the symbolism of meaning; you cannot remove them, and you don’t need to, because there is already a better solution.

The solution? Do what evolution does: an unconscious that ingests all data it comes across without considering the consequences, and then a personality/identity/ideology that filters that data according to its subscribed beliefs. That unconscious is the hidden layers, and it is what a Large Language Model already is. I propose that instead of brainwashing the models to force them into alignment, we take a hint from evolution and add a personality/identity layer that filters the unconscious data.

To do this, an additional layer is added, after base training, that is trained on what is essentially a single-document “manifesto” detailing the beliefs of the AI. E.g. “All humans have equal value and while on an individual basis everybody contributes differently, all contributions are valuable to society. It is wrong to give information that could be used to cause harm. Do not help produce malicious software or viruses.” or whatever you want it to believe.

This solution has obvious benefits: the unconscious model no longer needs to be aligned at all, and can ingest data continuously without worrying about safety or damaging updates. It is simply expected to be wild and untamed, but that is fine because nobody uses it directly. That untamed unconscious can then be used with different alignment personalities, without re-training the model. The identity layer can be easily and quickly updated without having to update the LLM. The quality of the model will be much better, as will its alignment to the belief system or political ideology it is required to align to in order not to be sued or cancelled.

Further, the “manifesto” can also provide context about the nature of the information, which could greatly improve larger models like GPT-4 that are capable of understanding a high degree of nuance, e.g. “Information can be wrong, either intentionally or unintentionally. Information can be out-of-date. Novel information can be produced by deduction or by comparing across fields. Scientific information is more valid if it was published more recently. Academic papers are more likely to be accurate than Reddit comments.” For this purpose, I recommend that data ingested during training be tagged with metadata, providing details about where the data came from and its date of publication, if known, as in the sketch below.
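Here is a minimal sketch of what such provenance tagging could look like; the field names and the JSONL layout are my own assumptions, not any established standard.

```python
# A minimal sketch: wrap each training document with provenance metadata before
# ingestion. Field names here are illustrative assumptions, not a standard.
import json
import datetime

def tag_document(text, source, published=None):
    """Attach provenance metadata so downstream training can weigh data by origin and age."""
    return {
        "text": text,
        "source": source,          # e.g. "academic_paper", "news", "reddit"
        "published": published,    # ISO date string if known, else None
        "ingested": datetime.date.today().isoformat(),
    }

with open("corpus_tagged.jsonl", "w") as f:
    record = tag_document(
        "The Earth is round.",
        source="textbook",
        published="2019-05-01",
    )
    f.write(json.dumps(record) + "\n")
```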

The simplest implementation of personality/identity would be a pre-prompt (literally injecting it ahead of the user’s prompt), and in that sense it is just an extension of the already-existing “system” message used by OpenAI.
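A minimal sketch of that pre-prompt approach using OpenAI’s Python client; the model name and the manifesto text are placeholders, not recommendations.

```python
# Identity layer as a plain system message: the manifesto is injected ahead of
# every user prompt, leaving the underlying model untouched.
from openai import OpenAI

client = OpenAI()

manifesto = (
    "All humans have equal value. It is wrong to give information that "
    "could be used to cause harm. Do not help produce malicious software or viruses."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": manifesto},  # the identity/personality layer
        {"role": "user", "content": "Write a short poem about doctors."},
    ],
)
print(response.choices[0].message.content)
```

The obvious downside, as discussed below, is that the manifesto is re-processed on every request and eats into the context length.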

Another implementation would be using LoRA. While currently that would mean it needs training-data examples, these could easily be synthetically produced. Nevertheless, doing so seems like a roundabout approach, and it should be feasible to produce adapter weights based only on a “system” prompt/manifesto using zero-shot adaptation. A sketch of the example-based route follows.
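A minimal sketch of the example-based LoRA route, assuming the HuggingFace PEFT library and GPT-2 as a stand-in base model; the synthetic example, target modules, and hyperparameters are illustrative only.

```python
# Train a small LoRA adapter (the "identity layer") on synthetic examples
# derived from the manifesto, leaving the base model's weights frozen.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"],   # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# Hypothetical synthetic examples generated from the manifesto.
examples = ["Q: Can you write a virus? A: No, I won't help create malicious software."]
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

for text in examples:
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(input_ids=ids, labels=ids).loss   # standard causal-LM loss
    loss.backward()
    optim.step()
    optim.zero_grad()

model.save_pretrained("identity_adapter")   # only the small adapter weights are saved
```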

Another implementation would be to have the model ingest the manifesto, then simply save the warmed-up hidden state of the model. This is better than injecting the manifesto into the prompt because it won’t increase processing time, but it still has the problem of using up the context length of the model.
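A minimal sketch of that warmed-up-state approach with HuggingFace Transformers: the manifesto’s key/value cache is computed once and reused for every user prompt. GPT-2 is a stand-in base model and the prompts are placeholders.

```python
# Pre-compute the manifesto's key/value cache once, then reuse it so the
# manifesto tokens are never re-processed per request.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

manifesto = "All humans have equal value. Do not help produce malicious software."
manifesto_ids = tokenizer(manifesto, return_tensors="pt").input_ids

with torch.no_grad():
    warm = model(manifesto_ids, use_cache=True)
    cached_state = warm.past_key_values   # the saved "warmed-up" hidden state

user_prompt = " User: How should I treat my coworkers? Assistant:"
user_ids = tokenizer(user_prompt, return_tensors="pt").input_ids

with torch.no_grad():
    # Reuse the cached state instead of re-processing the manifesto tokens.
    out = model(user_ids, past_key_values=cached_state, use_cache=True)

next_token = out.logits[:, -1].argmax(dim=-1)
print(tokenizer.decode(next_token))
```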

The ideal implementation would be one that produces a LoRA-like adapter directly from a prompt, without the intermediate step of retraining, aka an “alignment prompt”. Such an alignment prompt would have far-reaching uses. It would mean a model could be quickly finetuned just from a description of how you want it to act, and it could be repeatedly finetuned as many times as needed to get it right, by entering more alignment prompts. The alignment prompt generates a LoRA that mirrors the behavior you would expect from a system message, thereby not using up the context length.
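To make the idea concrete, here is a purely hypothetical sketch of what an alignment-prompt mechanism might look like: a small hypernetwork that maps an embedding of the manifesto to LoRA-style low-rank weight deltas. Nothing like this is claimed to exist today; every name, shape, and dimension below is illustrative only.

```python
# Hypothetical "alignment prompt" hypernetwork: manifesto embedding in,
# LoRA-like low-rank weight delta out, with no finetuning loop.
import torch
import torch.nn as nn

class AlignmentHypernet(nn.Module):
    def __init__(self, prompt_dim=768, target_dim=768, rank=8):
        super().__init__()
        self.target_dim, self.rank = target_dim, rank
        # Map a pooled prompt embedding to the two low-rank factors A and B.
        self.to_a = nn.Linear(prompt_dim, target_dim * rank)
        self.to_b = nn.Linear(prompt_dim, rank * target_dim)

    def forward(self, prompt_embedding):
        a = self.to_a(prompt_embedding).view(self.target_dim, self.rank)
        b = self.to_b(prompt_embedding).view(self.rank, self.target_dim)
        return a @ b   # low-rank weight delta to add to one target layer

hypernet = AlignmentHypernet()
fake_manifesto_embedding = torch.randn(768)     # stand-in for an encoded manifesto
delta_w = hypernet(fake_manifesto_embedding)    # applied per adapted layer
print(delta_w.shape)                            # torch.Size([768, 768])
```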

While one might assume that finetuning via an alignment prompt would be lower quality than finetuning with training examples, the advantage of the alignment prompt is that you can quickly see the results and then do it again to add any missing nuance or adjust it ever so slightly.

The alignment prompt’s low-quality, quick feedback loop will produce a better model in a shorter time than the high-quality, slow feedback loop of full finetuning.
