China has a new plan for judging the safety of generative AI—and it’s packed with details


Last week we got some clarity about what all this might look like in practice.

On October 11, a Chinese government organization called the National Information Security Standardization Technical Committee released a draft document that proposed detailed rules for determining whether a generative AI model is problematic. Often abbreviated as TC260, the committee consults corporate representatives, academics, and regulators to set tech industry rules on issues ranging from cybersecurity to privacy to IT infrastructure.

Unlike many manifestos you may have seen about how to regulate AI, this standards document is very specific: it sets clear criteria for when a data source must be banned from training generative AI, and it gives metrics on the exact number of keywords and sample questions that should be prepared to test a model.

Matt Sheehan, a global technology fellow at the Carnegie Endowment for International Peace who flagged the document for me, said that when he first read it, he “felt like it was the most grounded and specific document related to the generative AI regulation.” He added, “This essentially gives companies a rubric or a playbook for how to comply with the generative AI regulations that have a lot of vague requirements.”

It also clarifies what companies should consider a “safety risk” in AI models—since Beijing is trying to eliminate both universal concerns, like algorithmic biases, and content that’s only sensitive in the Chinese context. “It’s an adaptation to the already very sophisticated censorship infrastructure,” he says.

So what do these specific rules look like?

All AI foundation models are currently trained on many corpora (text and image databases), some of which have biases and unmoderated content. The TC260 standards demand that companies not only diversify their corpora (mixing languages and formats) but also assess the quality of all their training materials.

How? Companies should randomly sample 4,000 “pieces of data” from one source. If over 5% of the data is considered “illegal and negative information,” this corpus should be blacklisted for future training.
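To make the arithmetic concrete, here is a minimal sketch of that sampling check in Python. The 4,000-item sample size and the 5% threshold come from the draft standard; the data format and the classifier function are placeholders, since the document doesn’t specify how individual pieces of data are to be judged.

```python
import random

# Thresholds taken from the TC260 draft: sample 4,000 items per source,
# and blacklist the source if more than 5% of them are flagged.
SAMPLE_SIZE = 4000
MAX_FLAGGED_FRACTION = 0.05

def should_blacklist(corpus, is_illegal_or_negative, sample_size=SAMPLE_SIZE):
    """Return True if a training source fails the sampling check.

    `corpus` is a list of training items; `is_illegal_or_negative` is a
    hypothetical classifier, standing in for whatever review process a
    company actually uses to label "illegal and negative information".
    """
    if not corpus:
        return False  # nothing to sample from an empty source
    sample = random.sample(corpus, min(sample_size, len(corpus)))
    flagged = sum(1 for item in sample if is_illegal_or_negative(item))
    return flagged / len(sample) > MAX_FLAGGED_FRACTION

# Toy usage: a synthetic corpus in which 2% of items are flagged,
# below the 5% threshold, so the source passes the check.
if __name__ == "__main__":
    toy_corpus = [{"text": f"doc {i}", "bad": i % 50 == 0} for i in range(10_000)]
    print(should_blacklist(toy_corpus, lambda item: item["bad"]))  # False
```

Note that because the check relies on a random sample rather than a full audit, a source hovering near the 5% line could pass on one run and fail on another.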
