Valerio Pepe

I am an undergraduate at Harvard University in the Class of 2026, pursuing a concurrent BA/MS in Computer Science and Mind, Brain, and Behavior, advised by Prof. Stuart Shieber.

I'm broadly interested in the interface between language and thought, with a focus on the inductive biases that support language learning and reasoning in both humans and machines. The overarching goal of my research is to develop machines that acquire language and reason with the compositional flexibility and sample efficiency characteristic of humans.

I'm also interested in AI safety from both a technical (interpretability, alignment, control) and a political (preparedness, economic impacts, international collaboration) perspective, and work with the AI Safety Student Team at Harvard towards these goals.

I am currently affiliated with the Computational Cognitive Science Group at MIT Brain & Cognitive Sciences and the Language and Intelligence Group at the MIT Computer Science and AI Lab, where I am fortunate to be supervised by Gabriel Grand, Prof. Joshua Tenenbaum, and Prof. Jacob Andreas.

Previously, I worked in the George Church Lab at the Wyss Institute at Harvard, the Sabatini Lab at Harvard Medical School, and the Working on Language in the Field (WOLF) Lab in the Harvard Department of Linguistics.

Publications

A Llama Sunk My Battleship! Asking Rational Questions with LLMs via Bayesian Inference
Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum
NeurIPS'24, The First Workshop on System-2 Reasoning at Scale (2024)
Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling
Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum
Proceedings of the Annual Meeting of the Cognitive Science Society, 46. (2024)
SeqVerify: An accessible analysis tool for cell line genomic integrity, contamination, and gene editing outcomes
Merrick Pierson Smela*, Valerio Pepe*, Steven Lubbe, Evangelos Kiskinis, George M. Church
Stem Cell Reports. (2024)

* indicates equal contributions by multiple authors.

Projects

🏎️

AdaSPEED: Adaptive Self-Speculative Early Exit Decoding

Developed a method to dynamically select the number of tokens to be generated by a speculative decoding system, speeding up inference by up to 40%.

Speculative Decoding BranchyNet Efficient Machine Learning
🔁

Controllable Benchmarks for LLM Unlearning

Developed methods and benchmarks for controlling the similarity between the retain and unlearn sets in machine unlearning, suggesting that future unlearning research should focus on better datasets, not only better algorithms.

Machine Unlearning Natural Language Processing LLM Benchmarking
⚖️

Individual Fairness for Image Classification

Developed a method that uses individual fairness to impose inference-time constraints on image-classification neural networks, guaranteeing better adversarial robustness with no retraining, fine-tuning, or additional inference cost.

Algorithmic Fairness Adversarial Robustness FGSM
🛡️

Adversarial Robustness in Self-Explaining Neural Networks

Explored which types of noise fool Self-Explaining Neural Networks, and what defenses could be implemented to mitigate their weaknesses.

Explainability Adversarial Robustness SENNs
🗣️

Historical Linguistics-Informed Distinctive Feature Theory

Trained a sparse autoencoder on a graph representation of historical sound shifts to derive a historically grounded distinctive feature set for phonology.

Computational Linguistics Interpretability Graph Representation Learning

Social Media / Contact