Learning 2 - Structured Learning Algorithms (11/17/2023)
Content:
- Reinforcement Learning (see the REINFORCE sketch at the end of this section)
- Minimum Risk Training (see the MRT sketch at the end of this section)
- The Structured Perceptron (see the perceptron/max-margin sketch at the end of this section)
- Structured Max-margin Objectives (covered in the same perceptron/max-margin sketch)
- Simple Remedies to Exposure Bias (see the scheduled-sampling sketch at the end of this section)
- Required Reading: Deep Reinforcement Learning Tutorial (Karpathy 2016)
- Reference: Goldberg Book Chapter 19-19.3
- Reference: Course in Machine Learning Chapter 17 (Daume)
- Reference: Reinforcement Learning Textbook (Sutton and Barto 2016)
- Reference: REINFORCE (Williams 1992)
- Reference: Co-training (Blum and Mitchell 1998)
- Reference: Revisiting Self-training (He et al. 2020)
- Reference: Adding Baselines (Dayan 1990)
- Reference: Sequence-level Training for RNNs (Ranzato et al. 2016)
- Reference: Minimum Risk Training for Neural Machine Translation (Shen et al. 2016)
- Reference: Proximal Policy Optimization Algorithms (Schulman et al. 2017)
- Reference: Direct Preference Optimization (Rafailov et al. 2023)
- Reference: Quark: Controllable Text Generation with Reinforced Unlearning (Lu et al. 2022)
- Reference: Experience Replay (Lin 1993)
- Reference: Neural Q Learning (Tesauro 1995)
- Reference: Intrinsic Reward (Schmidhuber 1991)
- Reference: Intrinsic Reward for Atari (Bellemare et al. 2016)
- Reference: Reinforcement Learning for Dialog (Young et al. 2013)
- Reference: End-to-end Neural Task-based Dialog (Williams and Zweig 2016)
- Reference: Neural Chat Dialog (Li et al. 2016)
- Reference: User Simulation for Learning in Dialog (Schatzmann et al. 2007)
- Reference: RL for Mapping Instructions to actions (Branavan et al. 2009)
- Reference: Deep RL for Mapping Instructions to Actions (Misra et al. 2017)
- Reference: RL for Text-based Games (Narasimhan et al. 2015)
- Reference: Incremental Prediction in MT (Grissom et al. 2014)
- Reference: Incremental Neural MT (Gu et al. 2017)
- Reference: RL for Information Retrieval (Narasimhan et al. 2016)
- Reference: RL for Query Reformulation (Nogueira and Cho 2017)
- Reference: RL for Coarse-to-fine Question Answering (Choi et al. 2017)
- Reference: RL for Learning Neural Network Structure (Zoph and Le 2016)
- Reference: Conditional Random Fields (Lafferty et al. 2001)
- Reference: Structured Perceptron (Collins 2002)
- Reference: Structured Hinge Loss (Taskar et al. 2005)
- Reference: SEARN (Daume et al. 2006)
- Reference: DAgger (Ross et al. 2011)
- Reference: Dynamic Oracles (Goldberg and Nivre 2013)
- Reference: Training Neural Parsers with Dynamic Oracles (Ballesteros et al. 2016)
- Reference: Word Dropout (Gal and Ghahramani 2015)
- Reference: RAML (Norouzi et al. 2016)
- Reference: Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing (Fried and Klein 2018)
- Reference: Learning to summarize from human feedback (Stiennon et al. 2020)
- Reference: WebGPT: Browser-assisted question answering with human feedback (Nakano et al. 2021)
- Reference: Scaling Laws for Reward Model Overoptimization (Gao et al. 2023)
Slides: Structured Prediction Slides
Sample Code: Structured Prediction Code Examples
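
To make the policy-gradient idea concrete, here is a minimal REINFORCE-with-baseline sketch (Williams 1992; baseline in the spirit of Dayan 1990) on a made-up toy task. The target sequence, vocabulary size, and learning rates are illustrative assumptions, not the lecture's sample code.

```python
# Minimal REINFORCE-with-baseline sketch on a toy sequence task.
# Hypothetical setup: the "policy" picks one of two tokens at each of three
# positions, and the reward is the number of positions matching a fixed target.
import math, random

random.seed(0)
VOCAB, LENGTH = 2, 3
TARGET = [1, 0, 1]
logits = [[0.0, 0.0] for _ in range(LENGTH)]  # one softmax per position
baseline, lr = 0.0, 0.1

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

for step in range(2000):
    # Sample a sequence from the current policy.
    seq = [sample(softmax(logits[pos])) for pos in range(LENGTH)]
    reward = sum(1.0 for a, t in zip(seq, TARGET) if a == t)
    advantage = reward - baseline           # baseline reduces gradient variance
    baseline += 0.05 * (reward - baseline)  # running-average baseline
    # REINFORCE update: d/d logit of log p(a) is (one-hot of a) - softmax probs
    for pos, a in enumerate(seq):
        probs = softmax(logits[pos])
        for v in range(VOCAB):
            grad = (1.0 if v == a else 0.0) - probs[v]
            logits[pos][v] += lr * advantage * grad

print("learned argmax sequence:",
      [max(range(VOCAB), key=lambda v: logits[p][v]) for p in range(LENGTH)])
```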
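
A small sketch of the minimum risk training objective in the style of Shen et al. (2016): the expected cost over a sampled candidate set under a renormalized model distribution. The scores, costs, and the alpha value below are placeholder numbers, not outputs of a real MT system.

```python
# Minimum risk training sketch: expected cost over a sampled candidate set,
# using a distribution renormalized over the subset (Shen et al. 2016 style).
import numpy as np

scores = np.array([2.0, 1.0, 0.5])  # model log-scores for 3 sampled hypotheses
costs  = np.array([0.1, 0.4, 0.9])  # e.g. 1 - sentence-BLEU (hypothetical values)
alpha  = 0.5                        # sharpness of the renormalized distribution

def mrt_loss_and_grad(scores, costs, alpha):
    # q(y) proportional to exp(alpha * score(y)), normalized over the subset
    a = alpha * scores
    q = np.exp(a - a.max())
    q /= q.sum()
    risk = float(np.dot(q, costs))        # expected cost under q
    # d risk / d score_i = alpha * q_i * (cost_i - risk)
    grad = alpha * q * (costs - risk)
    return risk, grad

risk, grad = mrt_loss_and_grad(scores, costs, alpha)
print(f"expected risk: {risk:.3f}")
print("gradient wrt scores:", np.round(grad, 3))
```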
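
A toy structured perceptron (Collins 2002) for sequence tagging; flipping `cost_augmented` to `True` turns the same loop into a cost-augmented, structured hinge-style update (Taskar et al. 2005). The feature map, data, and exhaustive decode are illustrative simplifications, not the lecture's code.

```python
# Structured perceptron sketch on a toy tagging task, with an optional
# cost-augmented (loss-augmented) decode for a max-margin style update.
from itertools import product
from collections import defaultdict

TAGS = ["N", "V"]

def features(words, tags):
    """Emission and transition indicator features for a (sentence, tag seq) pair."""
    feats = defaultdict(float)
    prev = "<s>"
    for w, t in zip(words, tags):
        feats[("emit", w, t)] += 1.0
        feats[("trans", prev, t)] += 1.0
        prev = t
    return feats

def score(w, feats):
    return sum(w[k] * v for k, v in feats.items())

def decode(w, words, gold=None, cost_augmented=False):
    """Exhaustive argmax over tag sequences (fine for toy-sized outputs).
    cost_augmented=True adds Hamming cost, as in structured hinge training."""
    best, best_score = None, float("-inf")
    for tags in product(TAGS, repeat=len(words)):
        s = score(w, features(words, tags))
        if cost_augmented and gold is not None:
            s += sum(1.0 for a, b in zip(tags, gold) if a != b)
        if s > best_score:
            best, best_score = list(tags), s
    return best

data = [(["dogs", "bark"], ["N", "V"]),
        (["cats", "sleep"], ["N", "V"])]
w = defaultdict(float)
for epoch in range(5):
    for words, gold in data:
        pred = decode(w, words, gold, cost_augmented=False)  # True -> hinge-style
        if pred != gold:
            for k, v in features(words, gold).items():
                w[k] += v
            for k, v in features(words, pred).items():
                w[k] -= v

print(decode(w, ["dogs", "bark"]))  # expected: ['N', 'V']
```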
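
One commonly cited simple remedy to exposure bias is to sometimes feed the decoder its own prediction during training rather than the gold token (scheduled sampling, close in spirit to DAgger/dynamic-oracle training). This PyTorch sketch shows only the input-feeding choice inside a decoder loop; the model sizes, random data, and sampling probability are placeholder assumptions.

```python
# Scheduled-sampling sketch: with probability sample_prob, feed the decoder
# its own previous prediction instead of the gold token.
import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim, T, batch = 11, 8, 16, 5, 4
embed = nn.Embedding(vocab_size, emb_dim)
cell = nn.GRUCell(emb_dim, hid_dim)
out = nn.Linear(hid_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(
    list(embed.parameters()) + list(cell.parameters()) + list(out.parameters()),
    lr=1e-3)

gold = torch.randint(1, vocab_size, (batch, T))  # toy target sequences
sample_prob = 0.25                               # fraction of model-fed steps

h = torch.zeros(batch, hid_dim)
inp = torch.zeros(batch, dtype=torch.long)       # <s> token, id 0
loss = 0.0
for t in range(T):
    h = cell(embed(inp), h)
    logits = out(h)
    loss = loss + loss_fn(logits, gold[:, t])
    # Next input: gold token (teacher forcing) or the model's own prediction.
    use_model = torch.rand(batch) < sample_prob
    pred = logits.argmax(dim=-1)
    inp = torch.where(use_model, pred, gold[:, t])

loss = loss / T
opt.zero_grad()
loss.backward()
opt.step()
print("scheduled-sampling loss:", float(loss))
```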