ChatGPT: Dialogue Language Model Optimization
We developed a model called ChatGPT that interacts in a conversational manner. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.
We are excited to introduce ChatGPT to get users’ feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free. Try it now at chat.openai.com.
Samples
In the following sample, ChatGPT asks clarifying questions to debug code.
Methods
We trained this model via Reinforcement Learning from Human Feedback (RLHF), using the same approaches as InstructGPT, but with minor variations in the data collection setup. We trained an initial model with supervised fine-tuning: human AI trainers provided dialogues in which they played both sides, the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their replies. We combined this new dialogue dataset with the InstructGPT dataset, which we transformed into a dialogue format.
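The sketch below illustrates what this supervised fine-tuning step can look like in practice; it is an assumption-laden illustration, not OpenAI's actual training code. The base model (gpt2 as a stand-in), the dialogue formatting, and the hyperparameters are all placeholders.

```python
# Minimal sketch of supervised fine-tuning on trainer-written dialogues:
# flatten each conversation into text and train a causal LM on next-token prediction.
# Model name, dialogue format, and hyperparameters below are assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # stand-in for the real base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Each example is one trainer-written conversation, with the trainer playing both sides.
dialogues = [
    "User: How do I reverse a list in Python?\nAssistant: Use my_list[::-1] or my_list.reverse().",
    "User: What is 2 + 2?\nAssistant: 2 + 2 equals 4.",
]

def collate(batch):
    enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
    labels = enc["input_ids"].clone()                  # next-token prediction targets
    labels[enc["attention_mask"] == 0] = -100          # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(dialogues, batch_size=2, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    loss = model(**batch).loss                         # standard causal LM cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```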
To build a reward model for reinforcement learning, we needed to collect comparison data, consisting of two or more model responses ranked by quality. We gathered this data by taking conversations that AI trainers had with the chatbot.
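As a rough illustration of how such comparison data can train a reward model, the sketch below scores a preferred and a rejected reply to the same prompt and applies a pairwise ranking loss, in the style of the InstructGPT RLHF recipe. The toy scorer architecture and the random token batches are assumptions for illustration only.

```python
# Sketch (assumptions throughout, not OpenAI's implementation): a reward model
# scores two candidate replies, and a pairwise loss pushes the trainer-preferred
# reply's score above the rejected one's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Tiny stand-in scorer: embeds tokens, mean-pools, and outputs a scalar reward."""
    def __init__(self, vocab_size=50257, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids):                       # token_ids: (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)      # mean-pool over the sequence
        return self.score(pooled).squeeze(-1)           # one scalar reward per sequence

reward_model = RewardModel()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# Dummy comparison batch: tokenized preferred vs. rejected replies to the same prompts.
chosen = torch.randint(0, 50257, (4, 32))
rejected = torch.randint(0, 50257, (4, 32))

r_chosen, r_rejected = reward_model(chosen), reward_model(rejected)
# Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected).
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
```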
ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the 3.5 series here. ChatGPT and GPT-3.5 were trained on an Azure AI supercomputing infrastructure.
Limitations
1. ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.
2. ChatGPT is sensitive to tweaks to the input phrasing and to attempting the same prompt multiple times. For example, given one phrasing of a question, the model can claim not to know the answer, but given a slight rephrase, it can answer correctly.
3. The model is frequently very verbose and overuses certain phrases, such as repeating that it is a language model trained by OpenAI. These problems arise from biases in the training data (trainers prefer longer responses that appear more thorough) and well-known over-optimization issues.
4. When a user submits an ambiguous query, the model should ideally ask clarifying questions. Instead, our current models frequently guess what the user meant.
5. While we have taken steps to prevent the model from responding to inappropriate requests, it will occasionally respond to harmful instructions or display biased behavior. We're using the Moderation API to warn against and block certain types of unsafe content, although we expect some false negatives and positives for the time being. We are eager to gather user feedback to support our ongoing work in this area.