Thesis Title: Efficient Robust Algorithms for Linear Discriminant Analysis and Sequential Matching Problems
Thesis Committee:
Dr. Yajun Mei (advisor), Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Tuo Zhao, Industrial and Systems Engineering, Georgia Institute of Technology
Dr. David Goldsman, Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Enlu Zhou, Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Ethan X. Fang, Department of Biostatistics & Bioinformatics, Duke University
Dr. Ofer Sadan, School of Medicine, Emory University
Date and Time: May 9 at 10:00am
In-Person Location: Groseclose 403
Meeting Link: https://gatech.zoom.us/j/94665025570?pwd=NThkWkoyTWxQSnJTMFFzSW9nRjREZz09
Abstract:
Data science, machine learning, and statistics are useful to assist making data-driven decisions in many modern applications, and some challenges rise in analyzing real-world datasets, such as high dimensionality, robustness, and computational efficiency. This dissertation investigates three specific topics in statistical machine learning: (1) pivotal method for high-dimensional linear discriminant analysis (LDA); (2) robust algorithm for LDA under data contamination; and (3) efficient algorithm for sequential assignment with unknown utility.
In Chapter 1, we propose a pivotal method for high-dimensional linear discriminant analysis (LDA) that enjoys tuning-insensitive property. We term our method PivotAl liNear Discriminant Analysis (PANDA). Our method conducts parameter estimation under a pivotal estimation framework and only needs to solve a single convex optimization problem when both means and variances are unknown for both classes of training data. Theoretically, our method achieves comparable convergence rates as existing methods in terms of both estimation error and misclassification rate.
In Chapter 2, we propose a computationally efficient robust algorithm for LDA under data contamination, where a fraction of sample data might be corrupted by some adversary. Our main ideas are as follows. We first identify the outliers in each class and robustly estimate the mean, and then apply our developed PANDA method for uncontaminated data to estimate the discriminant direction in LDA with data contamination. Theoretical properties of the proposed algorithm are established in terms of both the error in estimating the optimal projection vector and the misclassification rate.
In Chapter 3, we develop an efficient algorithm for sequential assignment with unknown utility, with the objective of nearly maximizing the overall utility for each time. Our proposed algorithm is to use stochastic binary bandit feedback to adaptively estimate the unknown utilities through the logistic regression, and then to combine the Upper Confidence Bound (UCB) algorithm in the multi-armed bandit problem with the Hungarian algorithm in the assignment problem. We derive the theoretical bounds of our algorithm for both the estimation error and the total regret, and numerical studies are also conducted to illustrate the usefulness of our algorithm.
We conclude the dissertation in Chapter 4, where we summarize our contributions, and also highlight several potential research topics for future investigation.