Log in

Cambridge users (raven) details

Other users details

No account? details

Information on

Subscribing to talks details

Finding a talk details

Adding a talk details

Disseminating talks details

Help and Documentation details

Neural Code Comprehension: A Learnable Representation of Code Semantics

Add to your list(s) Download to your calendar using vCal

Tal Ben-Nun, ETH Zurich
Thursday 28 February 2019, 13:00-14:00
Auditorium, Microsoft Research Ltd, 21 Station Road, Cambridge, CB1 2FB.

If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins.

Please note, this event may be recorded. Microsoft will own the copyright of any recording and reserves the right to distribute it as required.

In the era of “Big Code”, research is being conducted into automating the understanding of computer programs. Most of the current works base on techniques from Natural Language Processing and Deep Learning, which have been successful recently, attempting to process the code directly or using syntactic representations (e.g., ASTs and AST paths). However, to comprehend program semantics robustly, structural features of code have to be taken into account as well, including function calls, branching, and interchangeable order of statements. In this talk, I will present a novel processing technique to use Machine Learning for code semantics, and show how it applies to a variety of program analysis tasks. In particular, we stipulate that a robust distributional hypothesis of code applies to both human- and machine-generated programs. Following this hypothesis, we define an embedding space, inst2vec, based on an Intermediate Representation (IR) of the code that is independent of the source programming language. We provide a novel definition of contextual flow for this IR, leveraging both the underlying data- and control-flow of the program. We then analyze the embeddings quantitatively using analogies and clustering, and evaluate the learned representation on three different high-level tasks. We show that even without fine-tuning, a single Recurrent Neural Network (RNN) architecture and fixed inst2vec embeddings outperform specialized approaches for performance prediction (compute device mapping, optimal thread coarsening); and algorithm classification from raw code (104 classes), where we set a new state-of-the-art.

This talk is part of the Frontiers in Artificial Intelligence Series series.

This talk is included in these lists:

Note that ex-directory lists are not shown.

Log in

Information on

Neural Code Comprehension: A Learnable Representation of Code Semantics

This talk is included in these lists:

Other lists

Other talks