New and improved content moderation tooling


To help developers protect their applications against possible misuse, we’re introducing a faster and more accurate Moderation endpoint. This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content, an instance of using AI systems to assist with human supervision of those systems. We have also released both a technical paper describing our methodology and the dataset used for evaluation.

When given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, violent, or promotes self-harm, all of which our content policy prohibits. The endpoint has been trained to be quick and accurate and to perform robustly across a range of applications. Importantly, this reduces the chances of products “saying” the wrong thing, even when deployed to users at scale. As a consequence, AI can unlock benefits in sensitive settings, like education, where it couldn’t otherwise be used with confidence.
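
To make the text-in, categories-out flow concrete, here is a minimal sketch of a client in Python. The post itself does not show a request, so the URL, the `input` field, and the `results[0]["categories"]` response shape are assumptions based on the public API and should be checked against the current reference. The helper names (`build_request`, `flagged_categories`, `moderate`) are hypothetical.

```python
import json
import urllib.request

# Assumed REST path for the Moderation endpoint; verify against the API reference.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build a POST request carrying the text to classify (hypothetical helper)."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def flagged_categories(response: dict) -> list[str]:
    """Return the sorted names of categories marked true in the first result.

    Assumes a response shaped like:
    {"results": [{"flagged": bool, "categories": {name: bool, ...}}]}
    """
    result = response["results"][0]
    return sorted(name for name, hit in result["categories"].items() if hit)


def moderate(text: str, api_key: str) -> list[str]:
    """Send text to the endpoint and return the categories it flags."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return flagged_categories(json.load(resp))
```

An application might call `moderate(user_text, api_key)` before displaying or storing generated text and hold back anything for which the returned list is non-empty. Splitting request building from response parsing keeps the parsing logic testable without a network call.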

