Representation 1 - Pre-training for Fine-Tuning (9/19/2023)
Content:
- Simple overview of multi-task learning
- Sentence embeddings
- BERT and variants (see the fine-tuning sketch after this list)
- Other language modeling objectives
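The topics above all feed into the same "pre-train, then fine-tune" recipe that this lecture centers on. The snippet below is a minimal sketch of that recipe (illustrative only, not the course's sample code), assuming the Hugging Face transformers library, the bert-base-uncased checkpoint, and a toy two-label sentiment batch.

```python
# Minimal "pre-train then fine-tune" sketch: load a pre-trained BERT encoder
# and attach a freshly initialized classification head.
# Checkpoint name, label count, and the toy batch are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # classification head is randomly initialized
)

# One gradient step on a toy labeled batch (real fine-tuning loops over a dataset).
batch = tokenizer(
    ["the movie was great", "the movie was terrible"],
    padding=True,
    return_tensors="pt",
)
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)  # returns cross-entropy loss and logits
outputs.loss.backward()
optimizer.step()
```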
Reading Material:
- Highly Recommended Reading: Illustrated BERT (Alammar 2019)
- Reference: Language Model Transfer (Dai and Le 2015)
- Reference: ELMo: Deep Contextualized Word Representations (Peters et al. 2018)
- Reference: Sentence-BERT (Reimers and Gurevych 2019)
- Reference: BERT: Bidirectional Transformers (Devlin et al. 2018)
- Reference: RoBERTa: Robustly Optimized BERT (Liu et al. 2019)
- Reference: XLNet: Autoregressive Training w/ Permutation Objectives (Yang et al. 2019)
- Reference: ELECTRA: Pre-training Text Encoders as Discriminators (Clark et al. 2020)
- Reference: T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al. 2019)
- Reference: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al. 2019)
- Reference: Don't Stop Pretraining: Adapt Language Models to Domains and Tasks (Gururangan et al. 2020)
- Reference: Should we be Pre-training? (Dery et al. 2021)
- Reference: Automating Auxiliary Learning (Dery et al. 2022)
Slides: Pre-training Slides
Sample Code: Pre-training Code Examples
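For a sense of what the pre-training objectives in the readings look like in practice, here is a minimal masked language modeling sketch (illustrative only, separate from the linked code examples); it assumes the Hugging Face transformers library and the bert-base-uncased checkpoint.

```python
# Masked language modeling in a few lines: mask a token and let a pre-trained
# BERT score the reconstruction. Checkpoint and example sentence are
# illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "Pre-training learns representations that transfer to [MASK] tasks."
inputs = tokenizer(text, return_tensors="pt")

# Labels mirror the inputs; positions that are not [MASK] are ignored (-100).
labels = inputs["input_ids"].clone()
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100

outputs = model(**inputs, labels=labels)  # cross-entropy over masked positions
print(outputs.loss.item())

# Top prediction for the masked slot.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = outputs.logits[0, mask_pos].argmax(-1)
print(tokenizer.decode(predicted_id.item()))
```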