
The inside story of how ChatGPT was built, from the people who made it

Sandhini Agarwal: We have a lot of next steps. I definitely think how viral ChatGPT has gotten has made a lot of issues that we knew existed really bubble up and become critical, things we want to solve as soon as possible. Like, we know the model is still very biased. And yes, ChatGPT is very good at refusing bad requests, but it’s also quite easy to write prompts that make it not refuse what we wanted it to refuse.

Liam Fedus: It’s been exciting to watch the diverse and creative applications from users, but we’re always focused on areas to improve upon. We think that through an iterative process where we deploy, get feedback, and refine, we can produce the most aligned and capable technology. As our technology evolves, new issues inevitably emerge.

Sandhini Agarwal: In the weeks after launch, we looked at some of the most terrible examples that people had found, the worst things people were seeing in the wild. We kind of assessed each of them and talked about how we should fix it.

Jan Leike: Sometimes it’s something that’s gone viral on Twitter, but we have some people who actually reach out quietly.

Sandhini Agarwal: A lot of the things we found were jailbreaks, which is definitely a problem we need to fix. But because users have to try these convoluted methods to get the model to say something bad, it isn’t like this was something that we completely missed, or something that was very surprising to us. Still, that’s something we’re actively working on right now. When we find jailbreaks, we add them to our training and testing data. All of the data that we’re seeing feeds into a future model.

Jan Leike: Every time we have a better model, we want to put it out and test it. We’re very optimistic that some targeted adversarial training can improve the situation with jailbreaking a lot. It’s not clear whether these problems will go away entirely, but we think we can make a lot of the jailbreaking much harder. Again, it’s not like we didn’t know that jailbreaking was possible before the release. I think it’s very difficult to really anticipate what the real safety problems are going to be with these systems once you’ve deployed them. So we’re putting a lot of emphasis on monitoring what people are using the system for, seeing what happens, and then reacting to that. This is not to say that we shouldn’t proactively mitigate safety problems when we do anticipate them. But yeah, it is very hard to foresee everything that will actually happen when a system hits the real world.
