You are cordially invited to my thesis defense on April 10th.

 

Title: On Training, Inference, and Sample Efficiencies of Language Models

 

Date: 04/10/2023 

Time: 12:00 pm - 1:00 pm ET 

Location: Zoom

Meeting URL: https://gatech.zoom.us/j/99505946703?pwd=ZlgvYTZjQjJLbnplTUtxSHJvelJtdz09

Meeting ID: 995 0594 6703

Passcode: 644944

 

Simiao Zuo

Machine Learning PhD Student

School of Industrial and Systems Engineering
Georgia Institute of Technology

 

Committee

1. Dr. Tuo Zhao (Advisor, ISyE, Georgia Tech)

2. Dr. Chao Zhang (CSE, Georgia Tech)

3. Dr. Yajun Mei (ISyE, Georgia Tech)

4. Dr. Anqi Wu (CSE, Georgia Tech)

5. Dr. Xiaodong Liu (Microsoft Research, Microsoft)

 

Abstract

Large language models have demonstrated strong performance in various natural language processing tasks such as machine translation, natural language understanding, and natural language generation. However, despite recent developments, language models still face critical challenges. In this thesis, we investigate efficient training and inference algorithms, as well as the sample efficiency of training language models.

 

In Chapter 2, we improve the training efficiency of sparsely activated models by proposing a novel Mixture-of-Experts architecture. In Chapter 3, we propose state-space-augmented Transformer models, which facilitate efficient modeling of long sequences. In Chapter 4, we target the inference efficiency of pre-trained language models: we propose a knowledge distillation algorithm that adapts a pre-trained model into a Mixture-of-Experts model. In Chapter 5, we design a label-efficient self-training algorithm that integrates differentiable teacher models into the conventional teacher-student self-training framework.