Reference¶
Papers¶
The original GPT-3 paper Language Models are Few-Shot Learners: the definitive resource for understanding the new techniques applied by OpenAI.
The GPT-2 paper Language Models are Unsupervised Multitask Learners: the GPT-3 model remains largely the same as its GPT-2 counterpart.
Attention Is All You Need: introduced the encoder-decoder transformer, the trickiest part of the GPT model.
Implementation¶
The Annotated GPT-2 is an annotated version of the GPT-2 paper with plenty of PyTorch code.
This GitHub repo is a PyTorch implementation of GPT-2 by Hugging Face.
minGPT is a minimal PyTorch re-implementation.
Yet another GPT-2 implementation in PyTorch.
The Annotated Transformer explains in code how the transformer is implemented, and is endorsed by the author of “The Annotated GPT-2”.
The PyTorch tutorial on training a sequence-to-sequence model with the nn.Transformer module; a minimal usage sketch follows below.
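As a quick orientation for that tutorial, the sketch below calls nn.Transformer directly on toy tensors; the model dimensions and sequence lengths are illustrative assumptions, not values taken from the tutorial.

```python
import torch
import torch.nn as nn

# Minimal sketch: an encoder-decoder transformer with toy dimensions.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 32, 512)  # (source length, batch size, d_model)
tgt = torch.rand(20, 32, 512)  # (target length, batch size, d_model)

out = model(src, tgt)          # (target length, batch size, d_model)
print(out.shape)               # torch.Size([20, 32, 512])
```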
APIs¶
OpenAI API¶
The documentation [1] of the official OpenAI library covers the following capabilities (a minimal completion call is sketched after this list):
Text Completion
Edit / Correct Inputs
Similarity Comparison
Classification
Text Comprehension
Embedding
Fine-tuning
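The snippet below is a minimal sketch of the first capability, text completion, assuming the legacy openai Python package (pre-1.0 interface); the engine name, prompt, and parameters are placeholders for illustration only.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

# Text completion: engine and sampling parameters are illustrative assumptions.
response = openai.Completion.create(
    engine="davinci",
    prompt="Translate 'bonjour' into English:",
    max_tokens=5,
    temperature=0,
)
print(response["choices"][0]["text"])
```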
Transformers¶
The Transformers library from Hugging Face provides APIs to download and train pre-trained models, including GPT-2 and GPT Neo.
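A minimal sketch of that API using the pipeline helper; the "gpt2" checkpoint name is used here only as an example of a downloadable pre-trained model.

```python
from transformers import pipeline

# Download a pre-trained GPT-2 checkpoint and generate text from a prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("The transformer architecture", max_length=30, num_return_sequences=1))
```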
Fine-Tuning¶
This 150 KB text file contains transcripts of Elon Musk's podcast appearances.
This post shows how to retrain the GPT-2 model.
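The sketch below outlines one way to retrain GPT-2 on such a plain-text file with the Hugging Face Trainer; the file path "transcripts.txt", the training arguments, and the choice of TextDataset (deprecated in newer library versions) are assumptions, not the exact recipe from the post.

```python
from transformers import (GPT2LMHeadModel, GPT2Tokenizer, TextDataset,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical path to the transcript corpus, split into fixed-size blocks.
dataset = TextDataset(tokenizer=tokenizer, file_path="transcripts.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(output_dir="gpt2-finetuned",
                         num_train_epochs=1,
                         per_device_train_batch_size=2)

Trainer(model=model, args=args, data_collator=collator,
        train_dataset=dataset).train()
```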
Other Sources¶
The post The GPT-3 Architecture, on a Napkin explains the GPT-3 architecture in as much detail as possible, and is extremely useful.
This article is an entry point to several GPT-3-related resources, including application tutorials.
Alberto Romero’s Medium
Jay Alammar’s blog
GPT-J