Neural Code Comprehension: A Learnable Representation of Code Semantics
- 👤 Speaker: Tal Ben-Nun, ETH Zurich
- 📅 Date & Time: Thursday 28 February 2019, 13:00 - 14:00
- 📍 Venue: Auditorium, Microsoft Research Ltd, 21 Station Road, Cambridge, CB1 2FB
Abstract
In the era of “Big Code”, research is being conducted into automating the understanding of computer programs. Most current work builds on recently successful techniques from Natural Language Processing and Deep Learning, attempting to process the code directly or through syntactic representations (e.g., ASTs and AST paths). However, to comprehend program semantics robustly, structural features of code have to be taken into account as well, including function calls, branching, and interchangeable order of statements. In this talk, I will present a novel processing technique for learning code semantics with Machine Learning, and show how it applies to a variety of program analysis tasks. In particular, we stipulate that a robust distributional hypothesis of code applies to both human- and machine-generated programs. Following this hypothesis, we define an embedding space, inst2vec, based on an Intermediate Representation (IR) of the code that is independent of the source programming language. We provide a novel definition of contextual flow for this IR, leveraging both the underlying data flow and control flow of the program. We then analyze the embeddings quantitatively using analogies and clustering, and evaluate the learned representation on three different high-level tasks. We show that even without fine-tuning, a single Recurrent Neural Network (RNN) architecture and fixed inst2vec embeddings outperform specialized approaches for performance prediction (compute device mapping and optimal thread coarsening) and for algorithm classification from raw code (104 classes), where we set a new state of the art.
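To make the idea of a flow-based context concrete, here is a minimal Python sketch, not the speaker's implementation. It assumes a tiny hand-written set of LLVM-like statements and a hypothetical edge list marking data or control dependencies between them; the names `STATEMENTS`, `EDGES`, and `context_pairs` are illustrative only. The context of a statement is taken to be every statement within a fixed number of dependency hops, and the resulting (target, context) pairs are what a skip-gram-style embedding model would train on.

```python
from collections import deque

# Hypothetical IR statements with identifiers and constants abstracted away.
STATEMENTS = {
    0: "<%ID> = load i32, i32* <%ID>",
    1: "<%ID> = add i32 <%ID>, <INT>",
    2: "<%ID> = icmp slt i32 <%ID>, <INT>",
    3: "br i1 <%ID>, label <%ID>, label <%ID>",
    4: "store i32 <%ID>, i32* <%ID>",
}

# Assumed dependency edges between statements (treated as undirected here):
# 0-1, 1-2, 2-3, 1-4 stand for data flow; 3-4 stands for control flow.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4), (3, 4)]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def context_pairs(edges, radius=1):
    """Return (target, context) pairs for statements within `radius` hops."""
    adj = build_adjacency(edges)
    pairs = []
    for target in sorted(adj):
        seen = {target}
        queue = deque([(target, 0)])
        while queue:  # breadth-first search bounded by `radius`
            node, dist = queue.popleft()
            if dist == radius:
                continue
            for neighbour in sorted(adj[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    pairs.append((target, neighbour))
                    queue.append((neighbour, dist + 1))
    return pairs


if __name__ == "__main__":
    for target, context in context_pairs(EDGES, radius=1):
        print(f"{STATEMENTS[target]!r} -> {STATEMENTS[context]!r}")
```

In this toy graph the `store` is a direct neighbour of the `add` because it consumes its result, regardless of how many lines separate them in the textual listing. This is the sense in which a flow-based context is insensitive to the interchangeable order of statements.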
Series
This talk is part of the Frontiers in Artificial Intelligence Series.
Included in Lists
- All Talks (aka the CURE list)
- Auditorium, Microsoft Research Ltd, 21 Station Road, Cambridge, CB1 2FB
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge talks
- Chris Davis' list
- Datalog for Program Analysis: Beyond the Free Lunch
- Frontiers in Artificial Intelligence Series
- Guy Emerson's list
- Interested Talks
- Microsoft Research Cambridge, public talks
- ndk22's list
- ob366-ai4er
- Optics for the Cloud
- personal list
- PMRFPS's
- rp587
- School of Technology
- Trust & Technology Initiative - interesting events
- yk449
Note: Ex-directory lists are not shown.