Title: Robust Learning Frameworks and Algorithms for Scalable Data Systems
Date: Friday, April 21, 2023
Time: 13:00 - 15:00 (EDT)
Zoom: https://gatech.zoom.us/j/4596258486
Ka Ho Chow
PhD Candidate
School of Computer Science
College of Computing
Georgia Institute of Technology
Committee
=========
Dr. Ling Liu (Advisor, School of Computer Science, Georgia Institute of Technology)
Dr. Margaret Loper (Georgia Tech Research Institute)
Dr. Shamkant Navathe (School of Computer Science, Georgia Institute of Technology)
Dr. Calton Pu (School of Computer Science, Georgia Institute of Technology)
Dr. Lakshmish Ramaswamy (Department of Computer Science, University of Georgia)
Abstract
========
The data explosion and advances in machine learning have transformed modern cognitive computing systems. While deep learning has blossomed in business, science, and engineering, it is known to be vulnerable to adversarial manipulation and can be exploited as a tool for privacy intrusion. This dissertation research is dedicated to advancing robust learning algorithms and scalable frameworks for next-generation trustworthy and responsible intelligent data systems.
The first contribution is to develop risk assessment frameworks for the in-depth investigation of security threats in deep learning-driven visual recognition systems, covering vulnerabilities in both the model inference phase and the distributed model training phase. We identify potential risks unique to object detection systems arising from their multi-task learning nature and introduce TOG, a suite of optimization algorithms that generate deceptive queries to fool well-trained object detection models. TOG targets the different loss functions used in object recognition to deceive the victim model into misbehaving either randomly or purposefully, guided by domain-knowledge-driven semantics. Similarly, we take a holistic approach to understanding the data poisoning vulnerability in distributed model training. We introduce perception poisoning, which misleads the learning process of the global object detection model in federated learning by selectively poisoning various combinations of objectness, bounding boxes, and class labels. These innovations offer practitioners comprehensive frameworks for risk management and help researchers identify root causes for designing mitigation strategies.
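To make the attack surface concrete, the following minimal sketch illustrates the general recipe behind untargeted deceptive queries of this kind: iteratively perturb the input image, within an imperceptibility budget, so that the detector's own multi-task training loss grows. It assumes a torchvision-style PyTorch detector that returns a dictionary of loss components when called with images and targets; the function name, step sizes, and iteration count are illustrative placeholders rather than TOG's exact formulation.

    # Minimal sketch of an untargeted deceptive query against an object
    # detector: perturb the image within an L-infinity budget so that the
    # victim's own multi-task training loss increases. Assumes a
    # torchvision-style detector returning a dict of loss components
    # (objectness, bounding-box, classification) given images and targets.
    # All hyperparameters here are illustrative, not TOG's exact settings.
    import torch

    def deceptive_query(detector, image, targets, eps=8/255, step=1/255, iters=10):
        detector.train()  # torchvision detectors expose losses in train mode
        x_adv = image.clone().detach()
        for _ in range(iters):
            x_adv.requires_grad_(True)
            loss_dict = detector([x_adv], [targets])  # per-task losses
            loss = sum(loss_dict.values())            # total multi-task loss
            grad, = torch.autograd.grad(loss, x_adv)
            with torch.no_grad():
                x_adv = x_adv + step * grad.sign()                # ascend the loss
                x_adv = image + (x_adv - image).clamp(-eps, eps)  # stay in budget
                x_adv = x_adv.clamp(0, 1)                         # valid image
        return x_adv.detach()

Targeted variants follow the same loop but descend toward attacker-chosen objectness, bounding-box, or class-label targets instead of ascending the total loss, which is how purposeful, semantics-driven misbehavior can be induced.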
The second contribution is to develop risk mitigation frameworks for building reliable systems with robustness guarantees against adversarial manipulation and for enabling privacy-preserving photo sharing against unauthorized face recognition. Deceptive queries at the model inference phase can be detrimental to the integrity of numerous existing intelligent systems, and they can be transferred across different models to launch black-box attacks. To counter such a severe threat, we present ODEN, a diversity-driven model fusion framework for robust object detection. It employs a team of models carefully constructed by our optimization algorithms and focal diversity methodology to conduct robust fusion through a three-stage technique. ODEN effectively mitigates TOG and other state-of-the-art attacks while also enhancing accuracy in the benign scenario.

Defending against perception poisoning during the distributed training phase is challenging: only a small population of clients participates in each round, and malicious clients can contribute gradients inconsistently to obfuscate their identity. We introduce STDLens, a poisoning-resilient federated learning framework that uses a spatial-temporal forensic methodology built on robust statistics to identify and remove malicious clients in a timely manner. Even under various adaptive attacks, the STDLens-protected system exhibits no observable performance degradation.

Beyond these security threats, we develop defense mechanisms against unauthorized entities that scrape personal data online to conduct privacy-intrusive learning. Governments, private companies, or even individuals can scrape the web, collect facial images, and build a face database to fuel a face recognition system that identifies people without their consent. We introduce methodologies for people to remove their facial signatures from photos before sharing them online. Although the signature-removed photos look similar to their unprotected counterparts, privacy intruders cannot infer meaningful information from them for face recognition.
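The following sketch conveys the flavor of such signature removal: nudge the photo, within a small pixel budget, until a face-embedding model can no longer match it to the subject's original identity. The embedder, budget, and optimization schedule are illustrative assumptions, not the dissertation's actual methodology.

    # Simplified sketch of facial signature removal before photo sharing:
    # push the photo's face embedding away from the subject's identity
    # while keeping the pixel change imperceptibly small. The embedder
    # and all hyperparameters are illustrative placeholders.
    import torch
    import torch.nn.functional as F

    def remove_signature(embedder, photo, eps=4/255, step=0.5/255, iters=50):
        embedder.eval()
        with torch.no_grad():
            anchor = embedder(photo.unsqueeze(0))  # original identity embedding
        x = photo.clone().detach()
        for _ in range(iters):
            x.requires_grad_(True)
            similarity = F.cosine_similarity(embedder(x.unsqueeze(0)), anchor).mean()
            grad, = torch.autograd.grad(similarity, x)
            with torch.no_grad():
                x = x - step * grad.sign()                # push identity away
                x = photo + (x - photo).clamp(-eps, eps)  # keep photo similar
                x = x.clamp(0, 1)
        return x.detach()

Because the perturbation is bounded in the pixel domain, the protected photo stays visually close to the original, in the spirit of the requirement above, while its identity signal is degraded for the matching model.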
The third contribution of this dissertation research is to develop machine learning-enhanced algorithms that strengthen the reliability and scalability of microservice applications in hybrid clouds. Cyberattacks such as ransomware have been on the rise, and rapid recovery from such attacks with minimal data loss is crucial for business continuity. We introduce DeepRest, an algorithm that estimates the resources a microservice application is expected to consume in serving the traffic received from its end users. It enables the verification of resource usage by comparing the expected consumption with the actual measurement, without any assumption of workload periodicity; any statistically unjustifiable resource usage can be flagged as a potential threat. Our extensive studies confirm the effective detection of representative ransomware and crypto-jacking attacks. As a dual-purpose solution, DeepRest is also the first to support resource estimation for application traffic not yet seen (e.g., ten times more users purchasing products during a special sale). While this enables precise scaling in advance, the expected resource usage can exceed the capacity of the current private computing infrastructure. We therefore propose an application-aware hybrid cloud migration planner that spans the microservice application across private and public clouds, providing virtually unlimited resources while remaining cost-effective and performance-optimized, with the least disruption to the regular operation of the application during the migration process.
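As a toy illustration of traffic-justified resource verification, the sketch below fits a simple model mapping per-endpoint request counts to expected CPU usage on benign history, then flags measurements that the observed traffic cannot statistically justify. The linear estimator and three-sigma threshold are stand-ins for DeepRest's learned estimator, chosen only to make the verification logic concrete.

    # Toy sketch of traffic-justified resource verification: learn the
    # mapping from per-endpoint request counts to expected CPU usage on
    # benign history, then flag usage the traffic cannot justify (e.g.,
    # ransomware or crypto-jacking burning CPU with no matching user
    # requests). The linear model and three-sigma rule are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    requests = rng.poisson(lam=[50, 20, 5], size=(500, 3))  # traffic history
    cpu = requests @ np.array([0.8, 1.5, 4.0]) + rng.normal(0, 2, 500)

    coef, *_ = np.linalg.lstsq(requests, cpu, rcond=None)   # expected-usage model
    threshold = 3 * (cpu - requests @ coef).std()

    def is_suspicious(traffic, measured_cpu):
        expected = traffic @ coef                           # traffic-justified usage
        return measured_cpu - expected > threshold

    # CPU load far above what the observed traffic justifies is flagged.
    print(is_suspicious(np.array([50, 20, 5]), measured_cpu=120.0))  # True

In this toy setting, crypto-jacking surfaces as CPU consumption with no corresponding user traffic, so the residual test flags it; the actual dissertation algorithm replaces the linear fit with a learned estimator that needs no workload-periodicity assumption.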