Buu Phan

I am a PhD student at the University of Toronto, where my research focuses on information theory and statistical machine learning. My advisor is Prof. Ashish Khisti. I spent Summer 2024 as a Research Scientist Intern at FAIR (Meta) with Dr. Karen Ullrich, working on tokenization (or string theory!?!?) and LLMs.

Previously, I was an AI Research Engineer at LG Electronics, working on applied ML for automated checkout and situated reasoning. Before that, I was a computer vision researcher at Algolux, where I worked with Felix Heide and Fahim Mannan on adversarial robustness and non-line-of-sight (NLOS) imaging for self-driving cars.

I obtained my Master’s degree from the University of Waterloo, where I was advised by Prof. Krzysztof Czarnecki and Dr. Rick Salay. My thesis was on Bayesian deep learning, distillation, and computer vision. I also spent some time working on anomaly detection and requirements engineering. I completed my BEng in Electrical Engineering at International University-VNU, advised by Prof. Huu Tue Huynh.

Email / CV / Google Scholar / Twitter / GitHub

Research

I'm interested in probability, optimization, and deep learning. Currently, I focus on topics at the intersection of information theory, sampling, and deep learning. My long-term research goal is to develop reliable AI.

	Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles. Buu Phan, Brandon Amos, Itai Gat, Marton Havasi, Matthew Muckley, Karen Ullrich. International Conference on Learning Representations (ICLR), 2025 (Singapore); preliminary version at the ICML 2024 Workshop on Theoretical Foundations of Foundation Models, 2024. Code. We provide an algorithm that converts any tokenized LLM into a token-free (byte-level) LLM. This naturally fixes the odd generations LLMs produce when the prompt ends mid-token or with whitespace. We also demonstrate its application to model ensembles in which each LLM has a distinct tokenization.
	Importance Matching Lemma for Lossy Compression with Side Information. Buu Phan, Ashish Khisti, Christos Louizos. International Conference on Artificial Intelligence and Statistics (AISTATS), 2024. Code. We provide a channel simulation method, based on the Poisson Matching Lemma (Li and Anantharam, 2021), that uses a fixed number of proposals. We also demonstrate its application to distributed compression and vertical federated learning, focusing on the setting where correlated side information is available only at the decoder.
	On the Choice of Perception Loss Function for Learned Video Compression. Sadaf Salehkalaibar, Buu Phan, Jun Chen, Wei Yu, Ashish Khisti. Conference on Neural Information Processing Systems (NeurIPS), 2023. We provide an information-theoretic analysis of perceptual video compression. Surprisingly, our results show that preserving temporal consistency comes at the cost of higher distortion!
	Rate-Distortion-Perception Tradeoff Based on the Conditional Perception Measure. Sadaf Salehkalaibar, Buu Phan, Ashish Khisti, Wei Yu. Biennial Symposium on Communications (BSC), 2023. We provide an information-theoretic analysis of the conditional perception measure for image compression.
	Adversarial Imaging Pipelines. Buu Phan, Felix Heide, Fahim Mannan. CVPR, 2021. Project page / Paper / Video. Contact me to reproduce the results on your camera. We propose a new kind of adversarial attack that targets the image signal processor (ISP) and camera optics. Our method produces real-world adversarial scenes without relying on image augmentation.
	Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar. Nicolas Scheiner, Florian Kraus, Fangyin Wei, Buu Phan, Fahim Mannan, Nils Appenrodt, Werner Ritter, Jürgen Dickmann, Klaus Dietmayer, Felix Heide. CVPR, 2020. Project page / Paper / Video / Code. We show how to scale up NLOS imaging to real-world scenes using radar. The method reliably detects and tracks NLOS pedestrians and cyclists.
	Detecting Out-of-Distribution Inputs in Deep Neural Networks Using an Early-Layer Output. Vahdat Abdelzad, Krzysztof Czarnecki, Rick Salay, Taylor Denouden, Sachin Vernekar, Buu Phan. arXiv, 2019. Paper / Code. We show that OOD images can be detected by applying a one-class SVM to early-layer embeddings. The method outperforms several baseline approaches on MNIST, CIFAR, and Tiny-ImageNet.
	Bayesian Uncertainty Quantification with Synthetic Data. Buu Phan, Samin Khan, Rick Salay, Krzysztof Czarnecki. Workshop on Artificial Intelligence Safety Engineering, SAFECOMP, 2019 (Best Paper). Paper. We empirically show that Bayesian deep networks capture some notions of uncertainty that are similar to humans' view of uncertainty. Contrary to expectation, we also find that the aleatoric uncertainty estimated by Bayesian deep models can decrease with more training data.
	ProcSy: Procedural Synthetic Dataset Generation Towards Influence Factor Studies Of Semantic Segmentation Networks. Samin Khan, Buu Phan, Rick Salay, Krzysztof Czarnecki. CVPR Workshop, 2019. Paper / Dataset. We provide a tool for generating synthetic self-driving datasets. The tool can simulate adverse environmental conditions such as rain and clouds. We also analyze distribution shift on this dataset.
	Analysis of Confident-Classifiers for Out-of-Distribution Detection. Sachin Vernekar, Ashish Gaurav, Taylor Denouden, Buu Phan, Vahdat Abdelzad, Rick Salay, Krzysztof Czarnecki. Safe ML Workshop, ICLR, 2019. Paper / Code. We show some theoretical disadvantages of using classifiers for out-of-distribution detection, and we show that training a classifier with an extra "unseen" class can alleviate the issue.
	Improving Reconstruction Autoencoder Out-of-Distribution Detection with Mahalanobis Distance. Taylor Denouden, Rick Salay, Krzysztof Czarnecki, Vahdat Abdelzad, Buu Phan, Sachin Vernekar. arXiv, 2018. Paper. Reconstruction autoencoders have been one of the go-to approaches for out-of-distribution detection: the reconstruction error is used as the OOD score. However, we observe that in many cases OOD images are reconstructed perfectly. This paper explains this phenomenon from a PCA perspective.
	Calibrating Uncertainties in Object Localization Task. Buu Phan, Rick Salay, Krzysztof Czarnecki, Vahdat Abdelzad, Taylor Denouden, Sachin Vernekar. NeurIPS Bayesian Deep Learning Workshop, 2018. Paper. We observe that Bayesian deep networks do not provide well-calibrated regression uncertainty. In this paper, we address this problem for the single-object localization task by adapting an existing technique for calibrating regression models.
	An Automated Vehicle Safety Concept Based on Runtime Restriction of the Operational Design Domain. Ian Colwell, Buu Phan, Shahwar Saleem, Rick Salay, Krzysztof Czarnecki. IEEE Intelligent Vehicles Symposium, 2018. Paper. We propose an approach and architectural design for achieving maximum functionality in the event of system failures.

Source code credit to Dr. Jon Barron