Title: Learning Representations for Sensor-Based Human Activity Recognition for Challenging Application Scenarios

Committee: 

Dr. Ploetz, Advisor

Dr. Essa, Co-Advisor

Dr. Inan, Chair

Dr. Starner

Dr. Lane

Abstract: The objective of the proposed research is to develop self-supervised techniques for learning representations of wearable sensor data to recognize human activities, and to demonstrate their effectiveness on a diverse set of downstream applications. Driven by the ubiquity of sensors such as accelerometers and gyroscopes on commodity wearables, including smartwatches and smartphones, human activity recognition is at the forefront of diverse applications such as health monitoring and fitness tracking. The conventional approach to developing these systems utilized statistical descriptors of sensor data as features, in conjunction with classifiers. The advent of deep learning combined feature extraction and classification, delivering state-of-the-art performance on small-scale datasets typically collected in laboratory conditions. However, the limited scale and diversity of these datasets constrain the application of complex architectures, owing to their reliance on large quantities of labeled data. To tackle this `sparse labeled dataset' problem, this dissertation proposes to apply the pretrain-then-finetune paradigm to wearable sensing, where large-scale unlabeled data is first utilized to learn representations that are subsequently tuned to the specific activities under study. I first show how unsupervised learning using autoencoders can be more effective than end-to-end training when considering factors important for wearable sensing, thereby establishing the usefulness of leveraging unlabeled data. Aiming to improve representation learning further, I introduce two self-supervised pretext tasks that are effective feature learners, along with an assessment of the state of the field via a large-scale empirical study.
Further, I propose that learning representations that capture the structure inherent to human movements, i.e., the ability to break down complex movements into their constituent parts, is beneficial both for recognizing activities and for a deeper analysis of the underlying movements. At its core, this dissertation demonstrates the promise of the pretrain-then-finetune paradigm, articulating its advantages and shortcomings relative to traditional features and supervised deep learning.
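To make the pretrain-then-finetune paradigm concrete, the following is a minimal sketch of the two-stage recipe the abstract describes: an autoencoder is first trained on unlabeled sensor windows, and its encoder is then reused (frozen) under a small classifier head fine-tuned on a labeled subset. All data, dimensions, and training details here are hypothetical placeholders for illustration, not the actual datasets, architectures, or pretext tasks studied in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: flattened windows of tri-axial accelerometer readings.
X_unlab = rng.normal(size=(512, 30))      # large unlabeled pool
X_lab = rng.normal(size=(32, 30))         # small labeled subset
y_lab = (X_lab[:, 0] > 0).astype(float)   # toy binary activity labels

d_hidden = 8  # dimensionality of the learned representation

# --- Stage 1: unsupervised pretraining with a linear autoencoder ---
W_enc = rng.normal(scale=0.1, size=(30, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, 30))
lr = 1e-2
for _ in range(200):
    Z = X_unlab @ W_enc                   # encode
    err = Z @ W_dec - X_unlab             # reconstruction error
    W_dec -= lr * (Z.T @ err) / len(X_unlab)
    W_enc -= lr * (X_unlab.T @ (err @ W_dec.T)) / len(X_unlab)

# --- Stage 2: fine-tune a logistic head on the frozen representation ---
w, b = np.zeros(d_hidden), 0.0
Z_lab = X_lab @ W_enc                     # frozen encoder features
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Z_lab @ w + b)))
    grad = p - y_lab                      # gradient of logistic loss
    w -= 0.1 * (Z_lab.T @ grad) / len(X_lab)
    b -= 0.1 * grad.mean()

acc = (((1.0 / (1.0 + np.exp(-(Z_lab @ w + b)))) > 0.5) == y_lab).mean()
print(f"train accuracy on labeled subset: {acc:.2f}")
```

The key property illustrated is that only the small head (here `w`, `b`) is updated with labels, while the representation is learned entirely from unlabeled data; the dissertation's contribution lies in replacing the reconstruction objective of Stage 1 with more effective self-supervised pretext tasks.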