Training and Predicting¶
1. input: "The quick brown fox jumps", label: "over"
2. input: "The quick brown fox jumps over", label: "the"
3. input: "The quick brown fox jumps over the", label: "lazy"
4. input: "The quick brown fox jumps over the lazy", label: "dog"
For both pre-training and fine-tuning:
unsupervised training
model: autoregressive, a stack of transformer decoder blocks
objective: predict the next word (maximize the probability of the next word)
the input includes both the sequence of tokens and the attention mask
training proceeds sentence by sentence, with each sentence followed by an EOS token (a minimal sketch of this setup is shown after this list)
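A minimal sketch of this setup with the transformers package, assuming GPT-2 as the base model (the model name and the example sentence are illustrative only):

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# A decoder-only, autoregressive model: a stack of transformer decoder blocks.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Each training sentence is terminated with an EOS token.
text = "The quick brown fox jumps over the lazy dog" + tokenizer.eos_token

# The encoding carries both the token ids and the attention mask.
enc = tokenizer(text, return_tensors="pt")

# For causal language modeling, the labels are the input ids themselves;
# the model shifts them internally so that position t is trained to predict
# token t+1 (e.g. "... fox jumps" -> "over", "... jumps over" -> "the").
outputs = model(
    input_ids=enc["input_ids"],
    attention_mask=enc["attention_mask"],
    labels=enc["input_ids"],
)
print(outputs.loss)  # maximizing next-word probability = minimizing this loss
```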
Pre-Training¶
dataset: a total of 300 billion tokens
Meta-Learning¶
Meta-learning refers to the inner-loop / outer-loop structure of the learning process; in-context learning corresponds to the inner loop of meta-learning.
sequence 1 inner loop (arithmetic)
5+8=13, 7+2=9, ..., 9+8=17
sequence 2 inner loop (correct words)
gaot=>goat, sakne=>snake, dcku=>duck
...
(the end of the outer loop)
Each inner loop represents a task that trains a specific set of skills, and repeated sub-tasks can be embedded within a single sequence (a sketch of such packed sequences follows this list)
Meta-learning develops a broad set of skills and pattern recognition abilities at training time within the same model
With these abilities, the model can rapidly adapt to the desired task at inference time
Some of the results are still far inferior to those of fine-tuned models
The efficiency of meta-learning improves with model scale
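A minimal sketch of how such inner-loop sequences could be packed, assuming plain Python string formatting (the task generators, word list, and separators are illustrative, not taken from the paper):

```python
import random

def arithmetic_sequence(n):
    """Inner-loop task 1: repeated addition sub-tasks packed into one sequence."""
    pairs = [(random.randint(0, 9), random.randint(0, 9)) for _ in range(n)]
    return ", ".join(f"{a}+{b}={a + b}" for a, b in pairs)

def unscramble_sequence(words):
    """Inner-loop task 2: repeated word-unscrambling sub-tasks."""
    def scramble(word):
        letters = list(word)
        random.shuffle(letters)
        return "".join(letters)
    return ", ".join(f"{scramble(w)}=>{w}" for w in words)

# Each sequence is one inner loop; the outer loop (SGD over the whole corpus)
# sees many such sequences, each embedding repeated sub-tasks of one skill.
sequences = [
    arithmetic_sequence(5),                          # e.g. "5+8=13, 7+2=9, ..."
    unscramble_sequence(["goat", "snake", "duck"]),  # e.g. "gaot=>goat, ..."
]
for seq in sequences:
    print(seq)
```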
Questions¶
Does in-context learning mean few-shot learning?
Examples¶
The demo training code can be found here; it is implemented with the transformers package.
Sentiment Classification¶
Tweet: I hate it when my phone battery dies. Sentiment: Negative
Tweet: My day has been great so far. Sentiment: Positive
Tweet: This is the link to the article. Sentiment: Neutral
Tweet: This new music video was incredible. Sentiment:
Answer: Positive
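A minimal sketch of submitting this few-shot prompt with the transformers text-generation pipeline, assuming GPT-2 as a stand-in model (a model this small will often get the answer wrong; the sketch only shows how the in-context examples are packed into a single prompt):

```python
from transformers import pipeline

# Any causal LM could be dropped in here; "gpt2" is used only for illustration.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Tweet: I hate it when my phone battery dies. Sentiment: Negative\n"
    "Tweet: My day has been great so far. Sentiment: Positive\n"
    "Tweet: This is the link to the article. Sentiment: Neutral\n"
    "Tweet: This new music video was incredible. Sentiment:"
)

# The model simply continues the prompt; in-context learning means the
# few-shot examples steer the completion without any weight updates.
out = generator(prompt, max_new_tokens=3, do_sample=False)
print(out[0]["generated_text"][len(prompt):])
```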
Translate a Sentence¶
Me: Le singe est dans l'arbre Her: The monkey is in the tree
Me: la plume de ma tante est sur la table Her: My aunt's pen is on the table
Me: j'aime bien le jambon Her: I like the chair
Me: Qu'est-ce que c'est que ca? Her: What do you mean?
Me: Comment tu t'appeles? Her: I am called Bob
Me: Où est le garçon? Her: Where is the boy?
Me: Qui est le president des Etats-Unis? Her:
Answer: Who is the president of the United States?
Netflix Movie Classification¶
Description: When Lebanon's Civil War deprives Zozo of his family, he's
left with grief and little means as he escapes to Sweden in search of
his grandparents.
Type: Dramas, International Movies
Description: A scrappy but poor boy worms his way into a tycoon's
dysfunctional family, while facing his fear of music and the truth about
his past.
Type: Dramas, International Movies, Music & Musicals
Description: In this documentary, South African rapper Nasty C hits the
stage and streets of Tokyo, introducing himself to the city's sights,
sounds and culture.
Type: Documentaries, International Movies, Music & Musicals
Description: Dessert wizard Adriano Zumbo looks for the next “Willy
Wonka” in this tense competition that finds skilled amateurs competing for
a $100,000 prize.
Type: International TV Shows, Reality TV
Description: This documentary delves into the mystique behind the
blues-rock trio and explores how the enigmatic band created their iconic
look and sound.
Type:
Answer: Documentaries, International Movies, Music & Musicals
Fine-Tuning¶
***** Running training *****
Num examples = 7004
Num Epochs = 1
Instantaneous batch size per device = 2
Total train batch size (w. parallel, distributed & accumulation) = 2
Gradient Accumulation steps = 1
Total optimization steps = 3502
{'loss': 0.4458, 'learning_rate': 3.677248677248677e-05, 'epoch': 0.29}
{'loss': 0.3704, 'learning_rate': 2.2075249853027632e-05, 'epoch': 0.57}
{'loss': 0.3546, 'learning_rate': 7.37801293356849e-06, 'epoch': 0.86}
100%|█████████████████████████████████████████████████████████████████████████████████| 3502/3502 [4:53:35<00:00, 4.90s/it]
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 17615.3528, 'train_samples_per_second': 0.398, 'train_steps_per_second': 0.199, 'train_loss': 0.38377865323470295, 'epoch': 1.0}
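A minimal sketch of a Trainer setup that would produce a log like the one above, assuming GPT-2 and a placeholder dataset (only the epoch count and per-device batch size are taken from the log; the model name, corpus, and remaining hyperparameters are illustrative):

```python
from datasets import Dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Placeholder corpus; the run in the log used 7004 examples.
texts = ["Description: a placeholder movie synopsis. Type: Dramas" + tokenizer.eos_token]
dataset = Dataset.from_dict({"text": texts})
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="gpt2-finetuned",
    num_train_epochs=1,             # matches "Num Epochs = 1" in the log
    per_device_train_batch_size=2,  # matches "Instantaneous batch size per device = 2"
    learning_rate=5e-5,             # default; the log shows a decaying schedule
    logging_steps=1000,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```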
Back to GPT.