Title: Statistical Theory for Neural Network-Based Learning

 

Committee:

Dr. Xiaoming Huo (advisor), School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Yao Xie, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Shihao Yang, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Mayya Zhilova, School of Mathematics, Georgia Institute of Technology

Dr. Yajun Mei, School of Global Public Health, New York University

 

Date and Time: Wednesday, August 28th, 09:00 AM - 10:30 AM EDT

Location: Groseclose 303

Virtual Link: Microsoft Teams 

Meeting ID: 253 390 754 819

Passcode: YtgR27

Abstract:

Data have become an indispensable part of our lives, made possible by the ease with which we can now collect, curate, display, and manipulate them. However, the power and usefulness of data are only realized when they guide us toward better decisions. Statistical decision theory is a framework developed precisely for this purpose, and in this thesis we adopt its viewpoint to study the problem of pattern recognition and classification. In particular, we focus on understanding the performance of neural network-based learning for classification problems.

 

In Chapter 2, we provide the technical preparations needed for the statements and proofs of the results in the rest of the thesis, along with a discussion of related work. We conclude with a discussion of the relationship between regression and classification.
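
To illustrate this relationship (in generic notation that is only a sketch, not the thesis's exact statement), recall the standard plug-in bound: if \eta(x) = P(Y = 1 \mid X = x) denotes the regression function, \hat{\eta} is any estimate of it, and \hat{f}(x) = \mathbf{1}\{\hat{\eta}(x) \ge 1/2\} is the induced plug-in classifier, then

    R(\hat{f}) - R^* \le 2\, \mathbb{E}\,|\hat{\eta}(X) - \eta(X)|,

where R(f) = P(f(X) \ne Y) is the classification risk and R^* is the Bayes risk. Accurate regression estimates therefore yield classifiers with small excess risk.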

 

In Chapter 3, we show that random classifiers based on neural networks of finite width and depth are consistent for a very general class of distributions. Consistency is a highly desirable property of a sequence of classifiers: it guarantees that the classification risk converges to the smallest possible risk. This result improves on the current literature by extending the known consistency of shallow, underparametrized neural networks with sigmoid activations to wide and deep ReLU neural networks without complexity constraints.
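
Informally (again in illustrative notation rather than the thesis's exact statement), writing \hat{f}_n for the classifier built from n training samples, consistency means

    \mathbb{E}[R(\hat{f}_n)] \longrightarrow R^* \quad \text{as } n \to \infty,

so that with enough data the learned classifier performs, in expectation, as well as the best possible classifier for the underlying distribution.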

 

In Chapter 4, we give several convergence-rate guarantees for the excess classification risk under a semiparametric model of distributions indexed by Borel probability measures and by regression functions belonging to L2 function classes with finite Kolmogorov-Donoho optimal exponents. Furthermore, we give explicit characterizations of the distributional regimes in which neural network classifiers are minimax optimal.
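
For orientation (illustrative notation only), the excess classification risk of a classifier \hat{f}_n is \mathcal{E}(\hat{f}_n) = \mathbb{E}[R(\hat{f}_n)] - R^*, and a convergence-rate guarantee over a class of distributions \mathcal{P} takes the form

    \sup_{P \in \mathcal{P}} \mathcal{E}(\hat{f}_n) \le C\, n^{-\alpha}.

Minimax optimality means the matching lower bound \inf_{\tilde{f}_n} \sup_{P \in \mathcal{P}} \mathcal{E}(\tilde{f}_n) \ge c\, n^{-\alpha} also holds, so no classifier can converge faster uniformly over \mathcal{P}. The exponent \alpha here stands in for the rates derived in the thesis, not their exact form.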

 

In Chapter 5, we show that, for a semiparametric model of distributions characterized by regression functions that locally belong to the Barron approximation space, neural network classifiers achieve a minimax optimal rate of convergence up to a logarithmic factor.