You are cordially invited to my thesis defense on April 10th.
Title: On Training, Inference, and Sample Efficiencies of Language Models
Date: 04/10/2023
Time: 12:00 pm - 1:00 pm ET
Location: Zoom
Meeting URL: https://gatech.zoom.us/j/99505946703?pwd=ZlgvYTZjQjJLbnplTUtxSHJvelJtdz09
Meeting ID: 995 0594 6703
Passcode: 644944
Simiao Zuo
Machine Learning PhD Student
School of Industrial and Systems Engineering
Georgia Institute of Technology
Committee
1. Dr. Tuo Zhao (Advisor, ISyE, Georgia Tech)
2. Dr. Chao Zhang (CSE, Georgia Tech)
3. Dr. Yajun Mei (ISyE, Georgia Tech)
4. Dr. Anqi Wu (CSE, Georgia Tech)
5. Dr. Xiaodong Liu (Microsoft Research, Microsoft)
Abstract
Large language models have demonstrated strong performance on a variety of natural language processing tasks, such as machine translation, natural language understanding, and natural language generation. Despite these recent developments, language models still face critical challenges, particularly in the cost of training and inference and in the amount of labeled data they require. In this thesis, we investigate efficient training and inference algorithms for language models, and we also study the sample efficiency of training them.
In Chapter 2, we improve the training efficiency of sparsely activated models by proposing a novel Mixture-of-Experts architecture. In Chapter 3, we propose state-space-augmented Transformer models, which facilitate efficient modeling of long sequences. In Chapter 4, we target the inference efficiency of pre-trained language models: we propose a knowledge distillation algorithm that adapts a pre-trained model into a Mixture-of-Experts model. In Chapter 5, we design a label-efficient self-training algorithm that integrates differentiable teacher models into the conventional teacher-student self-training framework.