NSF PPoSS Large Award
The HARP lab has won a Large (~$5M) award from the National Science Foundation (NSF) under the Principles and Practice of Scalable Systems (PPoSS) program. (This award follows a small PPoSS planning grant we were awarded in 2022.) In this project, we will develop full-stack implementation approaches for next-generation AI-based programming languages that perform deduction (forward-chaining reasoning) at massive scale on a variety of heterogeneous backends (GPUs, supercomputers, cloud-based clusters, etc.). A long-standing dream of computer science is high-level, declarative programming, in which a user simply specifies the problem to be solved while the work of materializing a solution, and optimizing that work for the available hardware, is done automatically. Our work will develop the techniques needed to realize this vision for specific types of reasoning and a specific ensemble of target reasoning tasks, including genomic analysis, smart querying of medical literature, neuro-symbolic AI with stochastic reasoning, graph analytics, security analysis, and general program analysis and software verification tasks. Scaling up these complex reasoning tasks will involve structuring application-level insights naturally via a language-based approach, with both semantic- and implementation-level innovations, that sits atop a stack of high-performance-computing techniques designed to implement this high-level behavior in ways individually optimized for modern heterogeneous systems.
We will be collaborating with researchers at Ohio State University, Washington State University, the University of Texas, Syracuse University, the University of Illinois at Chicago, and UAB's Precision Medicine Institute, and investigating five principal applications to evaluate our full-stack approach. The concept tree above illustrates our proposal: our language-based approach, which develops a structured Datalog with logical constraints and probabilistic reasoning, forms the trunk of the tree; the underlying techniques are the roots; and the application foci are the branches. This NSF Large award will support around ten PhD students and seven faculty for five years, working to tackle the implementation of complex deductive-analytic workloads (precision medicine, program analysis, monotonic aggregation, probabilistic reasoning) at unprecedented scale. We’re currently looking for students!
Of particular interest to me, given my broader research program, is our investigation of techniques for using structured data and data-parallel deduction in a forward-chaining reasoning system. My original motivation in developing our prototype Structured Datalog system ("Slog") was to enable high-level and scalable reasoning about code for the purposes of verification and auditing of complex software systems. This technology uniquely permits us to turn high-level and understandable specifications of programming-language behavior (i.e., a rules-based semantics) into massively parallel simulations of code that can be run at scale (i.e., program analyses), including on large, flexible cloud platforms such as Amazon EC2, Google Cloud, or Microsoft Azure, and on supercomputers such as UAB Cheaha and even larger US Department of Energy machines, such as ALCF's Theta.
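For readers new to this style of deduction, here is a small single-threaded sketch of the core idea, written in Python purely for illustration. It is not Slog's syntax or implementation, and the function name `transitive_closure` and the sample facts are hypothetical. It evaluates the classic two-rule Datalog reachability program by forward chaining, using the standard semi-naive strategy: each round joins only the facts discovered in the previous round against the edge relation, and evaluation stops when a round derives nothing new. Systems like ours aim to run this kind of fixpoint loop in a data-parallel fashion across many cores or nodes.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive forward chaining for the Datalog program:

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    """
    # Index edge facts by source node so each join step is a dictionary lookup.
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)

    path = set(edges)   # every path fact derived so far
    delta = set(edges)  # only the facts that were new in the previous round
    while delta:
        derived = set()
        # Join the newly discovered path(x, y) facts with edge(y, z).
        for x, y in delta:
            for z in succ.get(y, ()):
                derived.add((x, z))
        # Keep only genuinely new facts; an empty delta means a fixpoint.
        delta = derived - path
        path |= delta
    return path


if __name__ == "__main__":
    facts = [("a", "b"), ("b", "c"), ("c", "d")]
    for fact in sorted(transitive_closure(facts)):
        print(fact)
```

Restricting each round to the newly derived facts avoids re-deriving everything from scratch on every iteration, which is one reason this evaluation strategy is a common starting point for scalable Datalog engines.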