How ChatGPT and Large Language Models Work. Simply Explained
AI & Tech


2024-02-15 10 min read

This in-depth article was written by Francesco Galvani, CEO of Deep Marketing, branding strategy instructor, science communicator and -- most importantly -- a developer of neural networks and artificial intelligence systems for marketing and finance since 2003. Enjoy the read!

What is machine learning

Imagine the human brain, with its billions of interconnected neurons communicating with each other. Now imagine creating a similar "artificial brain" made of software and hardware. In simple terms, that is what modern AI systems like GPT-4 and ChatGPT are.

Instead of neurons, these systems have what are called "nodes." And instead of synapses, they have mathematical "connections" between nodes. Just as our brain learns through experience, these artificial neural networks also learn by analyzing enormous amounts of data.

The more data they process, the "smarter" the system becomes and the more capable it is of generating text, answering questions, translating languages, and much more. This is a form of machine learning.

To understand how GPT-4 and ChatGPT work, we first need to understand what "machine learning" is. It is a field of artificial intelligence where computers learn and evolve through experience, just like human beings.

For example, suppose we want to train a computer to recognize cats and dogs in images. We provide it with thousands of images of cats and dogs, each correctly labeled ("this is a cat," "this is a dog"). By examining this data, the system slowly learns to recognize patterns and distinguishing features of cats and dogs, until it is able to categorize new images it has never seen before but that share many morphological elements with those it was trained on.

As it receives more data and feedback (a process called "supervised learning"), the system continues to improve its accuracy. And with enough data and computing power, these software systems can match and even surpass human performance in specific tasks.
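To make this concrete, here is a minimal sketch of supervised learning using the scikit-learn library. The two numeric "features" per animal are invented stand-ins for real image pixels, purely for illustration.

```python
# A minimal supervised-learning sketch with scikit-learn.
# The numeric "features" below are invented stand-ins for real image pixels.
from sklearn.linear_model import LogisticRegression

# Toy training data: [ear_pointiness, snout_length], labels: 0 = cat, 1 = dog
X_train = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]]
y_train = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(X_train, y_train)           # learn patterns from the labeled examples

# Classify an example the model has never seen before
print(model.predict([[0.85, 0.25]]))  # -> [0], i.e. "cat"
```

The more labeled examples we feed in, the better the decision boundary the model learns, which is exactly the improvement loop described above.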

Artificial neural networks

Modern artificial intelligence systems are based on artificial neural networks, inspired (albeit in a highly simplified and biologically unrealistic way) by the human brain. Neural networks are composed of interconnected nodes organized in "layers".

Each layer contains many artificial neurons -- our nodes. The layers are generally stacked on top of each other (although more complex architectures exist).

Each node receives input from other nodes, usually from the layer below, performs simple mathematical calculations on those inputs, and produces an output. By connecting many nodes across different layers, you get a network capable of modeling very complex relationships between inputs and outputs.

For example, the inputs could be pixels from images of cats and dogs. As this data passes through the artificial neural network, patterns are extracted and the final outputs identify the image as "cat" or "dog."
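Here is roughly what a single node, and a tiny layer of nodes, computes. The weights and inputs are made up; the point is the pattern: a weighted sum, plus a bias, passed through a simple non-linearity.

```python
import numpy as np

# One artificial "node": weighted sum of its inputs plus a bias,
# passed through a simple non-linearity (ReLU).
def node(inputs, weights, bias):
    return max(0.0, float(np.dot(inputs, weights) + bias))

# A tiny "layer" is just several nodes looking at the same inputs.
inputs  = np.array([0.5, 0.1, 0.9])          # e.g. three pixel intensities
layer_w = np.array([[0.2, -0.4, 0.7],        # weights of node 1
                    [0.6,  0.1, -0.3]])      # weights of node 2
layer_b = np.array([0.05, -0.1])

outputs = np.maximum(0.0, layer_w @ inputs + layer_b)
print(outputs)   # these outputs become the inputs of the next layer
```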


Training neural networks

The key to building truly "intelligent" neural networks is training them with enormous amounts of relevant data. When the networks are deep, that is, made of many stacked layers, this computationally intensive approach is known as "deep learning." The more data we provide, the more sophisticated patterns and rules the system can learn.

Researchers collect massive text collections called "corpora" (singular "corpus") to train these systems. For example, to train GPT-4 and ChatGPT, corpora containing billions of web pages, books, articles, and pieces of user-generated content were used.

The neural network's internal parameters are gradually adjusted to improve its performance, reducing errors during the supervised learning process we discussed earlier.
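A hedged sketch of that adjustment process: one parameter, one training example, and the classic gradient-descent update that nudges the parameter until the prediction stops being wrong.

```python
# Gradient-descent sketch: nudge one parameter w so that the
# prediction w * x moves closer to the labeled target y.
x, y = 2.0, 10.0          # one training example (input, correct answer)
w = 0.5                   # initial parameter value
learning_rate = 0.05

for step in range(200):
    prediction = w * x
    error = prediction - y            # how wrong we are
    gradient = 2 * error * x          # derivative of the squared error w.r.t. w
    w -= learning_rate * gradient     # adjust the parameter to reduce the error

print(round(w, 3))   # converges towards 5.0, since 5.0 * 2.0 == 10.0
```

A real network repeats this same idea for billions of parameters at once, using backpropagation to compute all the gradients efficiently.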

Language models -- Large Language Models

A particularly powerful type of neural network is the "language model" (or large language model -- LLM), designed to process and generate human text and dialogue. Language models can translate languages, answer questions, write coherent texts and, when combined with image-generation models, even produce artificial images from textual descriptions.

Language models are trained on gigantic text corpora in different languages to learn the complex rules and relationships in human language.

For instance, they learn that certain words and phrases are more likely to follow other specific words and phrases. They also learn the structure and grammar of different sentences and paragraphs.
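For illustration only, here is the crudest possible version of that idea: counting which words follow which in a tiny made-up corpus. Real language models learn far subtler statistics, but the intuition is the same.

```python
from collections import Counter

# A crude illustration of "which word tends to follow which":
# count word pairs (bigrams) in a tiny invented corpus.
corpus = "the cat sat on the mat the cat slept on the sofa".split()
bigrams = Counter(zip(corpus, corpus[1:]))

# What usually comes after "the"?
after_the = {b: n for (a, b), n in bigrams.items() if a == "the"}
total = sum(after_the.values())
for word, n in after_the.items():
    print(word, round(n / total, 2))   # "cat" comes out as the most likely follower
```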

As the model is trained on more text, its understanding of language becomes deeper and more sophisticated. It can then generate more natural, human-like text, as well as better understand human language.

The two state-of-the-art models today, in our opinion, are Claude, developed by Anthropic, and ChatGPT by OpenAI.


What is GPT-4

GPT-4 is the latest version of the Generative Pre-trained Transformer model developed by OpenAI, an AI company founded by well-known figures, including Elon Musk and Sam Altman.

As the name suggests, GPT-4 is the evolution of GPT-3 and continues to push the boundaries of what generative AI can do. It is trained on an enormous text corpus that includes books, articles, web pages, and much more.

GPT stands for "Generative Pre-trained Transformer." Generative means it can generate new, coherent text. Pre-trained means it has already been trained on a vast amount of general data, so adapting it to a specific application requires little or no additional training (a step known as fine-tuning). And Transformer refers to the underlying computational architecture. We will get to that shortly.

So in a nutshell, GPT-4 is a massive pre-trained AI model that can generate extremely realistic text from simple prompts provided by users and, through companion models such as DALL·E, even produce artificial images.

It can translate languages, answer questions, summarize long texts, correct grammatical and typing errors, write creative stories, and more. In some text and image generation tasks, it approaches human-level capabilities.
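If you want to try the generate-from-a-prompt idea yourself, a rough sketch with the open-source Hugging Face transformers library looks like this. GPT-4 is only reachable through OpenAI's API, so the example uses the small open "gpt2" model as a stand-in.

```python
# Requires the Hugging Face "transformers" library (pip install transformers).
# GPT-4 itself is only available through OpenAI's API, so this sketch uses
# the small open "gpt2" model to show the same generate-from-a-prompt idea.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Neural networks learn by", max_new_tokens=20)
print(result[0]["generated_text"])
```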

ChatGPT is derived from GPT-4 but optimized for conversational chat. It was developed by OpenAI to have natural, human-like discussions.

It can answer questions in detail, acknowledge when it does not know something, correct inaccurate information, and even refuse to provide harmful content. All in an interactive chat format.

ChatGPT is refined through a process called "reinforcement learning from human feedback" (RLHF): human reviewers rank candidate responses, a reward model learns those preferences, and the chat model is then tuned to produce the kinds of responses people rate as high quality.
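One ingredient of that process, sketched very loosely below, is the reward model trained on pairwise human preferences: when it scores the human-preferred answer higher, the loss is small; when it disagrees with the reviewers, the loss is large. The numbers here are invented.

```python
import math

# One ingredient of RLHF: a reward model learns from human preferences.
# Humans pick the better of two answers; the model is trained so that
# the preferred answer gets the higher reward score.
def preference_loss(reward_chosen, reward_rejected):
    # Standard pairwise loss: small when the chosen answer scores higher.
    return -math.log(1 / (1 + math.exp(-(reward_chosen - reward_rejected))))

print(round(preference_loss(2.0, -1.0), 3))  # preferred answer scores higher -> low loss
print(round(preference_loss(-1.0, 2.0), 3))  # model disagrees with humans -> high loss
```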

This means it can maintain longer and more in-depth conversations compared to previous chatbots. Within a single conversation it can also pick up new information from the user and build on it, although what it learns this way is not retained once the chat ends.

How ChatGPT's Transformer architecture works

In the past, artificial intelligence systems that processed natural language relied on traditional approaches such as recurrent neural networks. These models had limitations in understanding long and complex sentences and contexts. Then, in 2017, a revolutionary new type of model called the "transformer" was introduced.

Transformers introduced a new approach based on attention mechanisms, which allow them to better analyze the relationships between words in a sentence. Transformers were a huge step forward and paved the way for the modern large language models we use today.

What is attention and how it works

But what exactly is the "attention mechanism" that makes transformers so powerful?

You can think of attention as a "gaze" that the language model uses to focus on the most relevant parts of the text. Just as we humans do when reading, the model decides which words or phrases to pay attention to in order to better understand the overall meaning.

From a technical standpoint, attention mechanisms assign an "attention score" to each word in the sentence. Words with higher scores are "taken into account" more thoroughly by the model to extract relationships and meanings. This approach is much more effective than previous ones because it allows the model to truly focus on the key parts of the text, just like we humans do!
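For the curious, here is a minimal sketch of the scaled dot-product attention used in transformers, with a three-word "sentence" of made-up two-dimensional vectors. Real models use far larger vectors and separate query, key, and value projections; this keeps only the core computation.

```python
import numpy as np

# Minimal scaled dot-product attention over a 3-word "sentence".
# Each row is a word vector; queries, keys and values are kept identical
# here just to keep the sketch short.
words = np.array([[1.0, 0.0],    # "cats"
                  [0.9, 0.1],    # "purr"
                  [0.0, 1.0]])   # "loudly"

Q = K = V = words
scores = Q @ K.T / np.sqrt(K.shape[1])            # attention scores per word pair
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax
output = weights @ V                              # each word becomes a weighted mix of the others

print(np.round(weights, 2))   # "cats" attends mostly to itself and to "purr"
```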

Interestingly, the power of an LLM (large language model) system is tied to the context window in which the attention mechanisms can exercise their power: the more context the system can process, the more intelligent it will appear!

Internal structure of a language model

Now that we generally understand attention mechanisms, let's take a closer look at the internal architecture of these language models.

First, the input text data (for example, a sentence) is converted into numerical vectors through a process called "embedding" -- or "word embeddings." Each word is transformed into a vector of numbers that mathematically captures its meaning.
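A toy example of word embeddings, with invented three-dimensional vectors (real models use hundreds or thousands of dimensions): words with similar meanings end up with similar vectors, which we can check with cosine similarity.

```python
import numpy as np

# Toy word embeddings: each word becomes a small vector of numbers.
# Real models learn vectors with hundreds or thousands of dimensions.
embeddings = {
    "cat": np.array([0.8, 0.1, 0.3]),
    "dog": np.array([0.7, 0.2, 0.4]),
    "car": np.array([0.1, 0.9, 0.0]),
}

def similarity(a, b):
    # Cosine similarity: close to 1 means similar meaning.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(similarity(embeddings["cat"], embeddings["dog"]), 2))  # high
print(round(similarity(embeddings["cat"], embeddings["car"]), 2))  # lower
```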

These vectors are then processed through a series of processing layers, as we saw in the previous chapters. Each layer applies attention operations and creates progressively more abstract and contextual representations of the input text.

The final layers of the model ultimately generate an output vector that represents the entire sentence and can be used for various purposes, such as classification, translation, or generating new text.
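Putting the pieces together, here is a deliberately simplified sketch of that pipeline: a few word vectors flow through some toy layers, get pooled into a single sentence vector, and a final "head" turns that vector into a score. All the numbers are random; only the shape of the computation matters.

```python
import numpy as np

# End-to-end sketch: word vectors -> a few processing layers -> one
# sentence vector that a final head can score (e.g. for classification).
np.random.seed(0)
sentence = np.random.rand(4, 8)              # 4 word embeddings, 8 dimensions each

hidden = sentence
for layer in range(3):                       # three toy processing layers
    W = np.random.rand(8, 8) * 0.1
    hidden = np.maximum(0.0, hidden @ W)     # each layer refines the representation

sentence_vector = hidden.mean(axis=0)        # pool the word vectors into one vector
classifier_w = np.random.rand(8)
score = sentence_vector @ classifier_w       # e.g. a sentiment score
print(round(float(score), 3))
```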


Where it all began

For example, one of the first and most famous language models based on transformers is BERT (not "BART"), developed by Google in 2018. BERT achieved stunning results in tasks such as sentence completion and identifying relationships between keywords.

Subsequent models like GPT-3 and PaLM were trained on enormous amounts of data and contain billions of parameters. They can generate incredibly human-like text and are even capable of carrying on simple conversations.

What is the difference between neural networks and the human brain?

There are some important differences between artificial neural networks and the human brain. Biological neurons are far more complex than the simple nodes described above, and the brain rewires its connections continuously, while a trained model's parameters stay fixed until it is retrained. The brain also learns from remarkably little data and runs on roughly 20 watts, whereas large language models need enormous training corpora and power-hungry data centers. Finally, artificial networks excel at the tasks they were trained for, but they lack the general, embodied understanding of the world that humans acquire through lived experience.

The future of GPT-4, ChatGPT, and beyond

So, what does the future hold for these artificial intelligence systems and how might they evolve?

First, they will become much more powerful. Moore's Law observes that the computing power available at a given cost roughly doubles every two years, and hardware dedicated to AI is improving even faster. If models keep scaling with it, systems at the end of the decade could have orders of magnitude more parameters than today's.
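As a back-of-the-envelope check on that doubling rhythm, assuming compute doubles roughly every two years:

```python
# Back-of-the-envelope doubling arithmetic: if available compute doubles
# roughly every two years, how much more is there after N years?
for years in (2, 6, 10):
    print(years, "years ->", 2 ** (years / 2), "x more compute")
```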

This could lead to much more versatile AI systems with near-human performance across many different tasks. They could even surpass humans in specific areas such as text and artificial image generation.

But increasing computational power alone is not enough. New algorithms and architectures will be needed to effectively leverage all this extra power. For example, the Transformer architecture has already revolutionized language performance compared to previous models.

Another area of improvement will be the integration of multiple data modalities beyond just text, such as images, audio, and video. This will give AI systems a broader awareness of the world, letting them connect what they read with what they see and hear.

Finally, the key will be ensuring that these systems are truly safe, controllable, and ethical as they become more powerful.

