Large Language Models that Follow Instructions
Learn about the new version of GPT-3 that follows instructions, and see a demo application that uses it in action. This could be a game changer for how we interact with machines.
TL;DR: Build software using large language models that follow your instructions.
Outline
Large Language Models that follow instructions
InstructGPT model in action with a working example
Extensions and variations
What’s next?
Hi all,
Following our series on building language applications, we will cover an emerging way of using large language models (LLMs) in your applications.
In the previous posts, we covered what large language models (LLMs) like GPT-3 are and how to use them when building AI software applications.
In a nutshell, LLMs (like GPT-3) are neural networks trained to predict the next word over a large corpus of text (terabytes) crawled from the internet. Once trained, they can be used through a simple text-in, text-out interface.
That is, we provide a few examples (sometimes 2-3 are sufficient) as a so-called prompt, and GPT-3 uses them to perform a given task.
Prompt Example:
Convert movie titles into emoji.
Back to the Future: 👨👴🚗🕒
Batman: 🤵🦇
Transformers: 🚗🤖
Star Wars:
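To make the text-in, text-out idea concrete, here is a minimal Python sketch that assembles the emoji prompt above from input/output pairs. The helper `build_few_shot_prompt` is hypothetical, not part of any library. The resulting string is what you would send as the prompt to a completion model such as text-davinci-002, which would then, ideally, continue the last line with emoji for "Star Wars".

```python
# A minimal sketch of assembling a few-shot prompt for a text-in, text-out LLM.
# build_few_shot_prompt is a hypothetical helper, not part of any library.


def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Join an optional task line, input/output pairs, and the new input.

    The prompt ends with "<query>:" so that the model's most likely
    continuation is the output for the query.
    """
    lines = []
    if instruction:
        lines.append(instruction)
    for source, target in examples:
        lines.append(f"{source}: {target}")
    lines.append(f"{query}:")  # left open for the model to complete
    return "\n".join(lines)


examples = [
    ("Back to the Future", "👨👴🚗🕒"),
    ("Batman", "🤵🦇"),
    ("Transformers", "🚗🤖"),
]
prompt = build_few_shot_prompt("Convert movie titles into emoji.", examples, "Star Wars")
print(prompt)
```

Swapping the examples or the first line changes the task without retraining anything, which is why classical prompt engineering mostly comes down to editing strings like this one.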
Prompting has become the de facto way of using large language models in production (see writing assistants such as Jasper and Copy.ai). Engineers iterate on the examples in the prompt until GPT-3 produces satisfactory output.
What classical prompting lacks is a description of, and instructions for, the task. GPT-3 has to infer the task you want it to perform from the input/output pairs alone.
Wouldn’t it be nice to provide some instructions to GPT-3 and ask it to follow these instructions in addition to providing some examples? Enter InstructGPT and other instruction-based large language models.
Large Language Models that follow instructions
What do instruction-following large language models (LLMs) offer developers? How can developers use them to build software better and faster?
New capabilities unlocked with LLMs that follow instructions:
Perform a task by explaining to the model what you would like it to generate.
Example: “Complete the paragraph summarizing what has been written before.”
In addition to relying on examples, the large language model can now use the clues from the instructions to solve the task.
Chain-of-thought prompting is one way to instruct the model: you show it step-by-step reasoning that leads to the final output (we will cover it in more detail later in the post).
Example: “Q: Take the last letters of the words in "Elon Musk" and concatenate them. A: The last letter of "Elon" is "n". The last letter of "Musk" is "k". Concatenating them is "nk". The answer is nk.”
For some tasks, the ground-truth output is ambiguous. Providing instructions lets us work around this.
For example, take news summarization. What makes a good summary of a news article is ambiguous and varies from person to person. With the new instruction-based LLMs, we can simply ask “Generate a summary for the following news article” and get the desired output.
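Instructions and chain-of-thought examples can be combined in a single prompt. The Python sketch below prepends an explicit instruction to a chain-of-thought exemplar generated for the "Elon Musk" example, then leaves a new question open for the model to answer with its own reasoning. The function names, the instruction wording, and the query "Ada Lovelace" are illustrative assumptions, not a prompt format prescribed by OpenAI.

```python
# Sketch: an explicit instruction plus a chain-of-thought exemplar for the
# last-letter concatenation task. All helper names here are hypothetical.


def cot_question(name: str) -> str:
    """The question wording used in the example above."""
    return f'Take the last letters of the words in "{name}" and concatenate them.'


def last_letter_reasoning(name: str) -> str:
    """Spell out the step-by-step reasoning (and the answer) for one worked example."""
    words = name.split()
    steps = [f'The last letter of "{w}" is "{w[-1]}".' for w in words]
    answer = "".join(w[-1] for w in words)
    steps.append(f'Concatenating them is "{answer}".')
    steps.append(f"The answer is {answer}.")
    return " ".join(steps)


def build_cot_prompt(instruction: str, exemplars: list[str], query: str) -> str:
    """Instruction, then worked Q/A exemplars, then an open question for the model."""
    parts = [instruction]
    for name in exemplars:
        parts.append(f"Q: {cot_question(name)} A: {last_letter_reasoning(name)}")
    # Ending with "A:" invites the model to produce its own reasoning steps.
    parts.append(f"Q: {cot_question(query)} A:")
    return "\n\n".join(parts)


prompt = build_cot_prompt(
    "Answer the question. Reason step by step, then state the final answer.",
    ["Elon Musk"],
    "Ada Lovelace",
)
print(prompt)
```

If the model imitates the exemplar, it should name the last letters of "Ada" and "Lovelace" before answering "ae"; that is the behavior chain-of-thought prompting aims for, although any individual completion may still be wrong.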
InstructGPT is a new iteration of OpenAI's original GPT-3 model. At the beginning of 2022, OpenAI researchers released a paper showing that GPT-3 can be trained to follow instructions using human feedback.
OpenAI researchers used the following three steps to make the original GPT-3 model follow instructions:
Hire human labelers to write demonstrations: instructions paired with appropriate responses. Fine-tune GPT-3 on this collected dataset with supervised learning.
Train a reward model that scores responses from the fine-tuned GPT-3 according to human preference, using labelers' rankings of several responses to the same instruction.
Further fine-tune GPT-3 with reinforcement learning (PPO) on an additional set of instructions, using the trained reward model to provide the reward signal.
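Steps 2 and 3 become more concrete when you look at the quantities they optimize. The InstructGPT paper trains the reward model with a pairwise ranking loss: for a preferred response y_w and a less preferred response y_l to the same instruction x, the loss is -log(σ(r(x, y_w) - r(x, y_l))), averaged over all K-choose-2 pairs among K ranked responses. In the RL step, the reward model's score is combined with a KL penalty that keeps the tuned policy close to the supervised model. The Python sketch below computes both for plain floats. It is an illustration under those assumptions (real training works on batched tensors and per-token log-probabilities), the function names are hypothetical, `beta=0.02` is only an illustrative value, and the paper's full objective also includes a pretraining-data term that is omitted here.

```python
import itertools
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def ranking_loss(rewards_best_to_worst: list[float]) -> float:
    """Reward-model loss for one instruction with K >= 2 ranked responses.

    Rewards are ordered from most to least preferred, so in every pair
    produced by itertools.combinations the first element is the winner.
    -log(sigmoid(r_w - r_l)) equals softplus(r_l - r_w).
    """
    pairs = list(itertools.combinations(rewards_best_to_worst, 2))
    return sum(softplus(r_l - r_w) for r_w, r_l in pairs) / len(pairs)


def kl_shaped_reward(rm_score: float, logp_policy: float, logp_sft: float,
                     beta: float = 0.02) -> float:
    """Reward for the RL step: reward-model score minus a KL-style penalty.

    logp_policy and logp_sft are log-probabilities of the same response under
    the policy being tuned and the supervised (SFT) model; beta is illustrative.
    """
    return rm_score - beta * (logp_policy - logp_sft)


# A reward model that agrees with the human ranking gets a lower loss.
print(ranking_loss([2.0, 1.0, 0.0]), ranking_loss([0.0, 1.0, 2.0]))
print(kl_shaped_reward(1.0, logp_policy=-1.0, logp_sft=-2.0, beta=0.5))
```

The KL term penalizes responses that the tuned model finds much more likely than the supervised model does, which discourages the policy from drifting toward outputs that merely exploit the reward model.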
While it is not entirely clear which model is behind text-davinci-002 in the OpenAI console, recent remarks from OpenAI staff and its performance on instruction-based tasks lead me to believe that text-davinci-002 is GPT-3 tuned to follow human instructions, with some secret sauce not described in the paper.
InstructGPT model in action with a working example
Now let’s put the InstructGPT model into action!
Follow along as I show you how to build software with InstructGPT that helps e-commerce owners increase their outreach and write more engaging ad copy for every social media network, even if they have little to no copywriting experience.
This prototype can be developed further into a full SaaS product, or used as a portfolio project to help you stand out when applying for jobs as an NLP engineer.