Neural Code Comprehension: A Learnable Representation of Code Semantics
If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins.

Please note, this event may be recorded. Microsoft will own the copyright of any recording and reserves the right to distribute it as required.

In the era of “Big Code”, research is being conducted into automating the understanding of computer programs. Most current work is based on recently successful techniques from Natural Language Processing and Deep Learning, and attempts to process the code directly or through syntactic representations (e.g., ASTs and AST paths). However, to comprehend program semantics robustly, structural features of code have to be taken into account as well, including function calls, branching, and interchangeable order of statements.

In this talk, I will present a novel processing technique to use Machine Learning for code semantics, and show how it applies to a variety of program analysis tasks. In particular, we stipulate that a robust distributional hypothesis of code applies to both human- and machine-generated programs. Following this hypothesis, we define an embedding space, inst2vec, based on an Intermediate Representation (IR) of the code that is independent of the source programming language. We provide a novel definition of contextual flow for this IR, leveraging both the underlying data- and control-flow of the program. We then analyze the embeddings quantitatively using analogies and clustering, and evaluate the learned representation on three different high-level tasks. We show that even without fine-tuning, a single Recurrent Neural Network (RNN) architecture and fixed inst2vec embeddings outperform specialized approaches for performance prediction (compute device mapping, optimal thread coarsening) and for algorithm classification from raw code (104 classes), where we set a new state of the art.

This talk is part of the Frontiers in Artificial Intelligence Series.