Title: Leveraging Machine Learning for Enhancing Program Performance and Programmer Productivity
Date: Monday, June 26, 2023
Time: 2:00pm - 4:00pm ET
Location: Klaus 3402 and Virtual (https://gatech.zoom.us/j/94129272487)
Fangke Ye
Ph.D. Student
School of Computer Science
Georgia Institute of Technology
Committee:
Dr. Vivek Sarkar (Advisor) – School of Computer Science, Georgia Institute of Technology
Dr. Jisheng Zhao – School of Computer Science, Georgia Institute of Technology
Dr. Santosh Pande – School of Computer Science, Georgia Institute of Technology
Dr. Qirun Zhang – School of Computer Science, Georgia Institute of Technology
Abstract:
As hardware performance continues to improve through increasing complexity and diversification, software struggles to keep up and to fully realize these performance gains. Only a handful of expert programmers can harness the full potential of modern hardware using hardware-exposed low-level programming primitives. Meanwhile, widely adopted high-level, dynamically typed programming languages such as Python and JavaScript offer high productivity but suffer from low performance because they lack the static type information needed for compiler optimizations. It is therefore increasingly difficult to develop high-performance programs that exploit the capabilities of evolving hardware while maintaining high programmer productivity for mainstream developers.
This thesis proposes the use of machine learning to simultaneously enhance both programmer productivity and program performance. First, we show how a graph-based deep learning type inference method can infer types in JavaScript to improve productivity and performance; our approach employs multiple graph neural network models and a novel type flow graph representation to infer types in dynamically typed languages without manual annotations. Then, we demonstrate a new approach to concrete type inference for Python programs that enables ahead-of-time code optimization for dynamically typed languages by combining machine learning and SMT solving, without requiring programmers to provide any type annotations. Finally, we present a neural-network-based system that computes code semantic similarity for C/C++ code, with the goal of identifying semantically equivalent high-performance code for a given low-performance input; this approach incorporates a context-aware semantics structure and an extensible neural code similarity scoring algorithm.
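
To illustrate the kind of problem the second contribution addresses, the sketch below shows an unannotated Python function and the concrete type information an inference pass might recover from its call sites. This is an illustrative example only, not code or notation from the thesis; the function name `dot` and the call site are hypothetical.

```python
# Illustrative sketch: a function written without annotations, as a
# Python programmer typically would.
def dot(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

# From call sites such as the one below, a concrete-type inference pass
# could conclude:
#
#     dot(xs: list[float], ys: list[float]) -> float
#
# With those types fixed ahead of time, a compiler can specialize the loop
# to unboxed floating-point arithmetic instead of generic dynamic dispatch.
if __name__ == "__main__":
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # observed: lists of floats
```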