Funded by ERC grant BIGCODE - #680358

Startups

DeepCode
DeepCode offers the first AI-based code review system

Statistical Engines

JSNice
JSNice de-obfuscates JavaScript programs. It is a popular system in the JavaScript community, used by tens of thousands of programmers worldwide.
Nice2Predict
Efficient and scalable open-source framework for structured prediction, enabling one to build new statistical engines more quickly.
DeGuard
Based on Nice2Predict, DeGuard reverses the process of layout obfuscation done by Android obfuscation systems. It enables security analyses, including code inspection and predicting libraries.
DEBIN
Based on Nice2Predict, DEBIN recovers debug information (e.g., names and types) of stripped binaries, helpful for various analysis tasks like decompilation, malware inspection and similarity.

Datasets and Models

150k Python Dataset
Dataset consisting of 150'000 Python ASTs
150k JavaScript Dataset
Dataset consisting of 150'000 JavaScript files and their parsed ASTs
Probabilistic models
Synthesized programs for probabilistic models (on the above datasets)
JSNice artifact
JSNice artifact that contains an engine, trained model and evaluation dataset
JSNice dataset
List of GitHub repositories used to train JSNice

Publications

2023

Large Language Models for Code: Security Hardening and Adversarial Testing
Jingxuan He, Martin Vechev
ACM CCS 2023 Distinguished Paper Award

2022

On Distribution Shift in Learning-based Bug Detectors
Jingxuan He, Luca Beurer-Kellner, Martin Vechev
ICML 2022

2021

Learning to Explore Paths for Symbolic Execution
Jingxuan He, Gishor Sivanrupan, Petar Tsankov, Martin Vechev
ACM CCS 2021
TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer
Berkay Berabi, Jingxuan He, Veselin Raychev, Martin Vechev
ICML 2021
Learning to Find Naming Issues with Big Code and Small Supervision
Jingxuan He, Cheng-Chun Lee, Veselin Raychev, Martin Vechev
PLDI 2021
Robustness Certification with Generative Models
Matthew Mirman, Alexander Hägele, Timon Gehr, Pavol Bielik, Martin Vechev
PLDI 2021

2020

Learning Fast and Precise Numerical Analysis
Jingxuan He, Gagandeep Singh, Markus Püschel, Martin Vechev
PLDI 2020
Guiding Program Synthesis by Learning to Generate Examples
Larissa Laich, Pavol Bielik, Martin Vechev
ICLR 2020
Adversarial Robustness for Code
Pavol Bielik, Martin Vechev
ICML 2020

2019

Learning to Infer User Interface Attributes from Images
Philippe Schlattner, Pavol Bielik, Martin Vechev
ArXiv 2019
Learning to Fuzz from Symbolic Execution with Application to Smart Contracts
Jingxuan He, Mislav Balunović, Nodar Ambroladze, Petar Tsankov, Martin Vechev
ACM CCS 2019
Unsupervised Learning of API Aliasing Specifications
Jan Eberhardt, Samuel Steffen, Veselin Raychev, Martin Vechev
PLDI 2019
Scalable Taint Specification Inference with Big Code
Victor Chibotaru, Benjamin Bichsel, Veselin Raychev, Martin Vechev
PLDI 2019

2018

Robust Relational Layout Synthesis from Examples for Android
Pavol Bielik, Marc Fischer, Martin Vechev
ACM OOPSLA 2018
DEBIN: Predicting Debug Information in Stripped Binaries
Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Raychev, Martin Vechev
ACM CCS 2018
Inferring Crypto API Rules from Code Changes
Rumen Paletov, Petar Tsankov, Veselin Raychev, Martin Vechev
PLDI 2018

2017

Learning a Static Analyzer from Data
Pavol Bielik, Veselin Raychev, Martin Vechev
CAV 2017
Program Synthesis for Character Level Language Modeling
Pavol Bielik, Veselin Raychev, Martin Vechev
ICLR 2017

2016

Probabilistic Model for Code with Decision Trees
Veselin Raychev, Pavol Bielik, Martin Vechev
ACM OOPSLA 2016
Statistical Deobfuscation of Android Applications
Benjamin Bichsel, Veselin Raychev, Petar Tsankov, Martin Vechev
ACM CCS 2016

2015

Predicting Program Properties from "Big Code"
Veselin Raychev, Martin Vechev, Andreas Krause
ACM POPL 2015
Programming with Big Code: Lessons, Techniques and Applications
Pavol Bielik, Veselin Raychev, Martin Vechev
SNAPL 2015

2014

Code Completion with Statistical Language Models
Veselin Raychev, Martin Vechev, Eran Yahav
ACM PLDI 2014
Phrase-Based Statistical Translation of Programming Languages
Svetoslav Karaivanov, Veselin Raychev, Martin Vechev
Onward! 2014

Talks

Learning to Analyze Programs at Scale
Machine Learning for Programming Workshop, FLOC 2018
Learning a static analyzer from data
Computer Aided Verification 2017
Programming Languages and Machine Learning
Neural Abstract Machines & Program Induction (NIPS'16 workshop)
Machine Learning for Programming
Invited Talk at ML4PL'15
Machine Learning for Programming
Invited Talk at MIT ExCAPE'15 Summer School
Machine Learning for Programming
Invited Talk at TCE'15 Conference
Programming Tools based on Big Data and Conditional Random Fields
Zurich Machine Learning and Data Science Meet-up
Code Completion with Statistical Language Models
Talk given at University of Washington and Microsoft Research (by V. Raychev) and EPFL and ETH (by Martin Vechev)

Resources