The AI Edition

Large Language Models that Follow Instructions

Learn about the new version of GPT-3 that follows instructions and see the demo of the application that uses it in action. This could be a game changer for how we interact with machines.

Nov 01, 2022

TL;DR: Build software using large language models that follow your instructions.

Outline

  • Large Language Models that follow instructions

  • InstructGPT model in action with a working example

  • Extensions and variations

  • What’s next?


Hi all,

Following our series on building language applications, this post covers an emerging way of using large language models (LLMs) in your application.

In the previous posts, we covered what large language models like GPT-3 are and how to use them when building AI software applications.

Previous post: "Building AI products using Large Language Models" (Human Language Technology), the first in the series of guest posts written by Shubham Saboo.

In a nutshell, LLMs (like GPT-3) are neural networks that have been trained to predict the next word on a large corpus of text (terabytes of it) crawled from the internet. Once trained, these LLMs can be used through a simple text-in, text-out interface.

That is, we provide a few examples (sometimes 2-3 are sufficient) as a so-called prompt, and GPT-3 uses them to perform a certain task.

Prompt Example:

Convert movie titles into emoji.

Back to the Future: 👨👴🚗🕒

Batman: 🤵🦇

Transformers: 🚗🤖

Star Wars:

Prompting has been the de facto way of using large language models in production (see writing assistant software like Jasper and Copy.ai). Engineers play around with different examples in the prompt until they get a satisfactory output from GPT-3.
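As a rough illustration, the emoji prompt above can be assembled from a list of input/output pairs. This is only a sketch: the helper name `build_few_shot_prompt` and its arguments are made up for this example, and the resulting string would still need to be sent to a model of your choice.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: task line, solved examples, then the open query."""
    lines = [task, ""]
    for source, target in examples:
        lines.append(f"{source}: {target}")
    # The final line is left unfinished so the model completes it.
    lines.append(f"{query}:")
    return "\n".join(lines)


examples = [
    ("Back to the Future", "👨👴🚗🕒"),
    ("Batman", "🤵🦇"),
    ("Transformers", "🚗🤖"),
]
prompt = build_few_shot_prompt("Convert movie titles into emoji.", examples, "Star Wars")
print(prompt)
```

Swapping examples in and out of the `examples` list is the "playing around" step: the prompt text changes, the rest of the code stays the same.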

What’s missing in classical prompting is a narrative and a set of instructions behind the task. GPT-3 has to figure out the task you want it to perform from input/output pairs alone.

Wouldn’t it be nice to provide some instructions to GPT-3 and ask it to follow these instructions in addition to providing some examples? Enter InstructGPT and other instruction-based large language models.

Large Language Models that follow instructions

What do the large language models (LLMs) that follow instructions offer to the developer? How can the developer utilize them to build software better and faster?

New capabilities unlocked with LLMs that follow instructions:

  • Perform a task by explaining to the model what you would like it to generate.

    • Example: “Complete the paragraph summarizing what has been written before”.

  • In addition to relying on examples, the large language model can now use the clues from the instructions to solve the task.

    • Chain-of-thought is one way to provide instructions: the prompt walks the model through step-by-step reasoning that leads to the final output (we will talk about it more later in the post).

    • Example: “Q: Take the last letters of the words in "Elon Musk" and concatenate them. A: The last letter of "Elon" is "n". The last letter of "Musk" is "k". Concatenating them is "nk". The answer is nk.”

  • For certain tasks, the ground-truth output is ambiguous. Providing instructions lets us work around that.

    • For example, take the news summarization task. The definition of a good summary of a news article can be ambiguous and varies from person to person. With the new instruction-based LLMs, we can simply ask “generate a summary for the following news article” and get the desired output.
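The capabilities above can be combined in one prompt: a plain-language instruction, one worked example with its reasoning spelled out, and the new question. The sketch below uses the last-letter task from the chain-of-thought example. All function names and the wording of the instruction are hypothetical and chosen for illustration only.

```python
def last_letter_concat(phrase):
    """Ground truth for the toy task: join the last letter of each word."""
    return "".join(word[-1] for word in phrase.split())


def worked_example_for(phrase):
    """Write out step-by-step reasoning in the style of the example above."""
    steps = " ".join(
        f'The last letter of "{w}" is "{w[-1]}".' for w in phrase.split()
    )
    answer = last_letter_concat(phrase)
    return (
        f'Q: Take the last letters of the words in "{phrase}" and concatenate them.\n'
        f'A: {steps} Concatenating them is "{answer}". The answer is {answer}.'
    )


def build_cot_prompt(instruction, worked_example, question):
    """Instruction first, then one worked example, then the new question."""
    return f"{instruction}\n\n{worked_example}\n\nQ: {question}\nA:"


prompt = build_cot_prompt(
    "Answer the question. Explain your reasoning step by step before giving the answer.",
    worked_example_for("Elon Musk"),
    'Take the last letters of the words in "Bill Gates" and concatenate them.',
)
print(prompt)
```

Because the toy task has a computable answer, `last_letter_concat` can also be used to check whatever the model returns for the new question.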

InstructGPT is a new iteration of the original GPT-3 model by OpenAI. At the beginning of 2022, OpenAI researchers released a paper showing that GPT-3 can be trained to follow instructions using human feedback.

Figure: Three steps outlining how the InstructGPT model is trained (source: Aligning Language Models to Follow Instructions).

OpenAI researchers used the following three steps to make the original GPT-3 model follow instructions:

  1. Hire human labelers to provide examples of instructions with the appropriate response. Fine-tune GPT-3 on the collected dataset.

    Figure: The set of use cases that OpenAI used to collect the instruction/response dataset.
  2. Train a reward model that ranks different responses from the GPT-3 fine-tuned in step 1 according to human preference.

  3. Further fine-tune GPT-3 on an additional set of instructions, using the trained reward model as a critic.
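To make step 2 more concrete, here is a toy sketch of how a reward model can be trained from pairwise human preferences. The post itself does not state the training objective. The sketch assumes a logistic loss on the difference of two reward scores, which is a common choice for preference data and the kind of objective the InstructGPT paper describes. The scores below are invented numbers, not outputs of a real model.

```python
import math


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)): small when the preferred response scores higher."""
    margin = reward_preferred - reward_rejected
    # log(1 + exp(-margin)) is the same quantity, split by sign to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def rank_responses(responses, reward_fn):
    """Order candidate responses from most to least preferred according to reward_fn."""
    return sorted(responses, key=reward_fn, reverse=True)


# Invented reward scores standing in for a trained reward model.
toy_scores = {"helpful answer": 2.0, "off-topic answer": -1.0}

loss_good = pairwise_preference_loss(toy_scores["helpful answer"], toy_scores["off-topic answer"])
loss_bad = pairwise_preference_loss(toy_scores["off-topic answer"], toy_scores["helpful answer"])
ranked = rank_responses(list(toy_scores), toy_scores.get)
print(loss_good < loss_bad, ranked)
```

In step 3, the same scoring function plays the role of the critic: responses that the reward model scores higher are the ones the fine-tuning pushes the language model toward.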

It is not exactly clear what type of large language model is behind the current text-davinci-002 in the OpenAI console. Still, recent remarks from an OpenAI team member and the model's performance on instruction-based tasks make me believe that text-davinci-002 is a GPT-3 tuned to follow human instructions, with some secret sauce not released in the paper.

Jan Leike @janleike
@sarahwiegreffe I agree. While OpenAI doesn't like talking about exact model sizes / parameter counts anymore, documentation should definitely be better. text-davinci-002 isn't the model from the InstructGPT paper. The closest to the paper is text-davinciplus-002.
Jan Leike @janleike
PSA: If you want to compare InstructGPT to a base model in your research, the closest comparison is "text-davinciplus-002" with "davinci" (you might need to request access to the former). It's not a super clean comparison, because we haven't deployed the exact paper models.
6:50 PM ∙ Oct 24, 2022
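One simple way to see the difference for yourself is to send the same prompt to a base model and to an instruction-tuned one and read the outputs side by side. The sketch below assumes a hypothetical `complete(model, prompt)` callable that wraps whatever API client you use. The stand-in `fake_complete` only echoes its input so the example runs offline, and the model names are the ones mentioned in this post, which may not be available to you.

```python
def compare_models(prompt, complete, models=("davinci", "text-davinci-002")):
    """Send the same prompt to each model and collect the outputs side by side."""
    return {model: complete(model, prompt) for model in models}


def fake_complete(model, prompt):
    """Stand-in for a real API call so the sketch runs offline."""
    return f"[{model}] completion for: {prompt!r}"


results = compare_models(
    "Generate a summary for the following news article: ...",
    fake_complete,
)
for model, output in results.items():
    print(model, "->", output)
```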

InstructGPT model in action with a working example

Now let’s put the InstructGPT model into action!

Come follow me as I show you how to build software using InstructGPT that can help e-commerce owners increase their outreach and write more engaging ad copy for all social media networks, even if they have little to no copywriting experience.

This prototype can be further improved and developed into a full SaaS, or it can be used as a tool to help you stand out when applying for jobs as an NLP engineer.
