Guest Lecture by Zora Wang and Nikitha Rao - Code Generation (11/9/2023)
- Lexical-based evaluation (contrast with the execution-based pass@k sketch at the end of this entry)
- Domain divergence
- Test creation
- Functional complexity
- Aligning code models
- Highly Recommended Reading: Evaluating Large Language Models Trained on Code
- Recommended Reading: A Systematic Evaluation of Large Language Models of Code
- Recommended Reading: PLUR: A unifying, graph-based view of program learning, understanding, and repair
- Recommended Reading: Code Llama: Open Foundation Models for Code
- Recommended Reading: StarCoder: may the source be with you!
- Recommended Reading: WizardCoder: Empowering Code Large Language Models with Evol-Instruct
- Recommended Reading: CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
- Recommended Reading: Execution-based Code Generation using Deep Reinforcement Learning
- Recommended Reading: Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation
- Recommended Reading: Addressing Compiler Errors: Stack Overflow or Large Language Models?
- Recommended Reading: Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
- Reference: CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
- Reference: Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow
- Reference: MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
- Reference: CodeBERT: A Pre-Trained Model for Programming and Natural Languages
- Reference: CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
- Reference: Measuring Coding Challenge Competence With APPS
- Reference: Program Synthesis with Large Language Models
- Reference: DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
- Reference: Natural Language to Code Generation in Interactive Data Science Notebooks
- Reference: Execution-Based Evaluation for Open-Domain Code Generation
- Reference: ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
- Reference: RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
- Reference: SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- Reference: CodeT5+: Open Code Large Language Models for Code Understanding and Generation
- Reference: OctoPack: Instruction Tuning Code Large Language Models
- Reference: CodeT: Code Generation with Generated Tests
- Reference: RLTF: Reinforcement Learning from Unit Test Feedback
- Reference: Teaching Large Language Models to Self-Debug
- Reference: LEVER: Learning to Verify Language-to-Code Generation with Execution
- Reference: CAT-LM: Training Language Models on Aligned Code And Tests
- Reference: Scaling Instruction-Finetuned Language Models
- Reference: Training language models to follow instructions with human feedback
Slides: Code Slides 1
Slides: Code Slides 2
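
Several of the readings above (e.g., Evaluating Large Language Models Trained on Code, CodeT, and Is Your Code Generated by ChatGPT Really Correct?) score models by executing generated programs against unit tests rather than by lexical overlap with a reference solution. The snippet below is a minimal sketch of that idea, not any paper's actual harness: a subprocess-based test runner plus the unbiased pass@k estimator from Chen et al. (2021). The toy `add` problem and the two model samples are invented for illustration, and a real harness would add sandboxing and resource limits.

```python
import math
import os
import subprocess
import sys
import tempfile

def passes_tests(candidate_code: str, test_code: str, timeout_s: float = 5.0) -> bool:
    """Execution-based check: run a generated solution against assert-style
    tests in a separate interpreter process. Illustrative only; real harnesses
    add sandboxing and resource limits."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code + "\n")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from Chen et al. (2021):
    1 - C(n - c, k) / C(n, k), where n samples were drawn per problem
    and c of them passed all tests."""
    if n - c < k:
        return 1.0  # too few failing samples for all k draws to fail
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

if __name__ == "__main__":
    # Toy problem with two hypothetical model samples (one correct, one buggy).
    tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
    samples = [
        "def add(a, b):\n    return a + b",   # correct
        "def add(a, b):\n    return a - b",   # buggy
    ]
    c = sum(passes_tests(s, tests) for s in samples)
    print(f"{c}/{len(samples)} samples passed; pass@1 = {pass_at_k(len(samples), c, 1):.2f}")
```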