Building AI products using Large Language Models
How to effectively and efficiently build AI products using Large Language Models
This is the first post in the series of guest posts written by Shubham Saboo.
Thanks, Shubham, for contributing to the Human Language Technology newsletter.
Outline
Intro
But what are Large Language Models exactly?
What was building AI applications before LLMs like?
How have Large Language Models changed the NLP game?
Prompt Engineering
Intro
Language is a complex and ever-changing system. Conventionally, AI models have been too small or too simple to generate fluent text that fits a given context.
Large Language Models (LLMs) were developed to capture the complexity of language more accurately while generating text as naturally as humans do. LLMs have gained popularity in recent years, mainly due to the success of deep learning and the availability of large amounts of unlabeled data on the web.
But what are Large Language Models exactly?
You’re probably already familiar with language models from the autocomplete feature on your phone. For instance, if you type “good”, autocomplete might come up with suggestions like “morning” or “luck.” Natural language processing applications such as autocomplete rely heavily on such language models.
Large Language Models (LLMs) are all about predicting the next word correctly. See the following examples 👇
Ex-1: Spiderman was bitten by a radioactive spider which gave him his _____ (powers)
Ex-2: Spiderman was bitten by a radioactive spider which gave him his superpowers. He is a brave and courageous hero who is always ready to face any _____ (danger)
The bigger the context, the better you get at predicting the next word!
LLMs are neural networks that have been trained to predict the next word on a large corpus of text (terabytes of it) crawled from the internet. Thanks to the size and scale of the data, these models learn the relationships between words in sentences very well. This understanding of language is then used for various tasks such as text generation, machine translation, and question answering.
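To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is not how an LLM works internally: it is a toy model that only counts which word follows which in a tiny, made-up corpus, whereas an LLM is a neural network trained on far more text and conditions on a much longer context. The corpus and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus: str) -> dict:
    """Count, for every word in the corpus, which words follow it."""
    words = corpus.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next(model: dict, word: str, k: int = 2) -> list:
    """Return up to k of the most frequent followers of `word` (empty if unseen)."""
    followers = model.get(word.lower(), Counter())
    return [candidate for candidate, _ in followers.most_common(k)]

# Tiny made-up corpus, standing in for the terabytes of text an LLM sees.
corpus = (
    "good morning team . good morning all . good luck today . "
    "good morning again . good luck tomorrow . have a good day"
)
model = train_bigram_model(corpus)
suggestions = predict_next(model, "good")
print(suggestions)  # ['morning', 'luck']
```

Like the phone autocomplete, the toy model suggests “morning” and “luck” after “good” simply because those words followed it most often in the text it has seen.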
What was building AI applications before LLMs like?
Before Large Language Models (LLMs) existed, building AI applications was a very different and time-consuming process. Labeled data was scarce, so AI systems were much less effective at understanding and generating natural language. Without pre-trained large language models to build on, the small models of the time did not capture the relationships between the words in sentences very well. Developers had to start from scratch for every AI application, and the results often performed poorly.
To build a successful natural language processing (NLP) application, developers had to overcome a lot of hurdles. A lot of time was spent understanding the domain of the problem and collecting and cleaning the data. Developers would then train a small model themselves, which typically did not perform well enough to be used in the real world. After multiple iterations of collecting data and training the model, developers would eventually deploy and host it themselves. As you can see, getting an NLP model to work involved many steps and took a lot of time, and at the end of the day, success wasn’t guaranteed.
How have Large Language Models changed the NLP game?
One of the most exciting aspects of LLMs is that they can perform domain-specific NLP tasks in a zero-shot way. That is, without any labeled data collected for your task, an LLM can often predict the correct label out of the box. Sometimes LLMs even perform better than smaller models that were trained specifically for that task using only domain-specific data.
Due to the size and scale of large language models, training them from scratch is out of reach for the large majority of people. Instead, access to these models is offered through a simple-to-use text-in, text-out application programming interface (API).
Granting access to LLMs via an API made it relatively easy to build applications for various use cases, which lowered the barrier to entry for anyone, irrespective of background.
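The shape of a text-in, text-out interface can be sketched in a few lines of Python. Everything below is hypothetical: `TextCompletionAPI`, `CannedModel`, and `finish_sentence` are names invented for this illustration, not any provider's real API, and the stand-in "model" only looks up a canned answer so the example runs offline. A real client would send the prompt to a hosted model over the network instead.

```python
from typing import Protocol

class TextCompletionAPI(Protocol):
    """Hypothetical interface: one prompt string in, one completion string out."""
    def complete(self, prompt: str) -> str: ...

class CannedModel:
    """Offline stand-in for a hosted LLM; it only returns canned completions."""
    def __init__(self, completions: dict) -> None:
        self._completions = completions

    def complete(self, prompt: str) -> str:
        # Unknown prompts get an empty completion in this stub.
        return self._completions.get(prompt.strip(), "")

def finish_sentence(api: TextCompletionAPI, text: str) -> str:
    """Application code: send text in, append the text that comes back."""
    return text + api.complete(text)

stub = CannedModel({
    "Spiderman was bitten by a radioactive spider which gave him his": " powers",
})
sentence = finish_sentence(
    stub, "Spiderman was bitten by a radioactive spider which gave him his"
)
print(sentence)
```

Because the application code depends only on `complete(prompt) -> str`, the stub could later be swapped for a client that calls a real hosted model without changing `finish_sentence`.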
Here are a few companies that offer access to Large Language Models (LLMs) in the form of an API. They also come with a user interface to play and experiment with your application in the web browser itself.
List of companies offering Large Language Models with API access:
Example of the web UI to play with Large Language Models
In the early phase of LLM development, OpenAI’s GPT-3 emerged as the most innovative AI model, followed by a wave of API-based large language models. Its flexibility to perform general tasks with near-human efficiency was unmatched by any AI model developed before it, which is what makes it so exciting!
Now, let’s understand the core pillar behind LLMs that allows us to communicate with them in the language they understand.
Prompt Engineering
We use prompts as input to language models to perform different tasks, and we get responses as output. Prompts allow us to communicate with sophisticated language models in plain English, making it really easy for anyone to build AI applications.
Let’s look at an example of a prompt and a response:
The prompt is the input text given to the language model.
Ex - Berlin is the capital of ______
The response is the output text generated by the language model to the input prompt.
Ex - Berlin is the capital of Germany.
The art and science of giving clear input text (instructions) to a language model such that it generates desired output is called Prompt Engineering. It is the key to unlocking the true potential of LLMs.
There is a direct relation between the prompt you provide and the quality of the completion you get. The structure and arrangement of your words heavily influence the output.
“The secret to writing good prompts is understanding what the model knows about the world.”
Your job is to get the model to use the information it already has to generate useful results. In the game of charades (a word-guessing game), the performer gives the other players just enough information to figure out the secret word. Similarly, with language models, we give the model just enough context (in the form of a training prompt) to figure out patterns and perform the given task.
Here is a golden rule on how to go about prompt engineering:
As a rule of thumb, while designing the training prompt you should aim towards getting a zero-shot response from the model.
If that isn’t possible, move forward with a few examples rather than providing it with an entire dataset.
The standard flow for training prompt design should look like:
Zero-Shot (no examples) → Few Shots (few examples) → Corpus-based Priming (all examples from dataset).
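The first two steps of this flow can be sketched as plain string construction in Python. The sentiment task, the example sentences, and the `Text:` / `Label:` layout are assumptions chosen for illustration; the right layout depends on your task and on the model you use.

```python
def build_prompt(instruction: str, query: str, examples=()) -> str:
    """Assemble a prompt: instruction, optional worked examples, then the query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    # The prompt ends on an open "Label:" for the model to complete.
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)

instruction = "Classify the sentiment of the text as Positive or Negative."
query = "The battery died after two days."

# Step 1: zero-shot, the instruction and the query only.
zero_shot = build_prompt(instruction, query)

# Step 2: few-shot, the same prompt plus a handful of labeled examples.
few_shot = build_prompt(
    instruction,
    query,
    examples=[
        ("I love this phone.", "Positive"),
        ("The screen cracked in a week.", "Negative"),
    ],
)

print(zero_shot)
print("---")
print(few_shot)
```

Try the zero-shot prompt first, and reach for the few-shot version only if the responses are not good enough. Both prompts come from the same function, so moving from one step to the next only means passing in examples.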
GPT-3 is one of the most capable and easily accessible language models, and today it powers a host of businesses.
Let’s now look at how you can get started with OpenAI GPT-3 using the simple-to-use playground and then take your use case from playground to production.
We will cover the hands-on GPT-3 tutorial in the next post.