About me
Hi, I am an M.S. student studying computer science at Carnegie Mellon University (CMU). Before that, I graduated from The University of Hong Kong (HKU). My research interests lie in vision-language models (VLMs), with a particular focus on computer-use agents (CUAs) and multimodal GUI grounding. Previously, I was fortunate to work with Prof. Tao Yu at XLANG Lab on CUAs, and I interned as a machine learning engineer at Apple China.
I am currently seeking a 2026 summer SWE/MLE internship in the United States, as well as potential lab collaborations in Spring 2026. Feel free to reach out if your team needs interns, or simply to chat!
Publications
-
OpenCUA: Open Foundations for Computer-Use Agents
NeurIPS 2025 Spotlight (top 3%) & COLM 2025 Workshop Best Paper
Open Source Projects
-
SGLang RL - Slime Contributor
Implemented a multi-turn VLM rollout framework (PR #1141) with a custom rollout generation function that iterates "assistant generation → environment feedback → context appending". Designed a pluggable interface for interaction environments and optimized token-level training control with proper loss masking, achieving stable policy convergence on the Geo3k dataset with Qwen3-VL-2B-Instruct.
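For readers unfamiliar with multi-turn RL rollouts, the loop and loss masking described above can be illustrated with a minimal, framework-free sketch. This is not the code from PR #1141 and not slime's actual API: `InteractionEnv`, `Trajectory`, `multi_turn_rollout`, `GuessEnv`, and `toy_tokenize` are all hypothetical names, and the character-level tokenizer and scripted "policy" are stand-ins for a real VLM and inference engine. The key idea shown is that only assistant-generated tokens get `loss_mask = 1`, while prompt and environment-feedback tokens are kept in the context but masked out of the policy loss.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol, Tuple


class InteractionEnv(Protocol):
    """Hypothetical pluggable environment interface (illustrative only)."""

    def reset(self) -> str: ...

    def step(self, action: str) -> Tuple[str, float, bool]: ...


@dataclass
class Trajectory:
    tokens: List[int] = field(default_factory=list)
    loss_mask: List[int] = field(default_factory=list)  # 1 = trained on, 0 = masked out
    reward: float = 0.0


def multi_turn_rollout(
    generate: Callable[[List[int]], str],
    tokenize: Callable[[str], List[int]],
    env: InteractionEnv,
    max_turns: int = 4,
) -> Trajectory:
    traj = Trajectory()
    prompt_ids = tokenize(env.reset())
    traj.tokens += prompt_ids
    traj.loss_mask += [0] * len(prompt_ids)  # prompt is context, not policy output
    for _ in range(max_turns):
        # 1) Assistant generation: the only tokens that receive policy loss.
        action = generate(traj.tokens)
        action_ids = tokenize(action)
        traj.tokens += action_ids
        traj.loss_mask += [1] * len(action_ids)
        # 2) Environment feedback.
        feedback, reward, done = env.step(action)
        traj.reward += reward
        if done:
            break
        # 3) Context appending: feedback is conditioned on but masked from the loss.
        feedback_ids = tokenize(feedback)
        traj.tokens += feedback_ids
        traj.loss_mask += [0] * len(feedback_ids)
    return traj


# Toy usage (hypothetical environment and tokenizer).
class GuessEnv:
    """Episode ends with reward 1.0 once the policy answers '42'."""

    def reset(self) -> str:
        return "what is 6*7?"

    def step(self, action: str) -> Tuple[str, float, bool]:
        if action.strip() == "42":
            return "", 1.0, True
        return "try again", 0.0, False


def toy_tokenize(text: str) -> List[int]:
    return [ord(c) for c in text]  # character-level stand-in for a real tokenizer


scripted_answers = iter(["41", "42"])  # stand-in for a VLM policy
traj = multi_turn_rollout(lambda ctx: next(scripted_answers), toy_tokenize, GuessEnv())
```

Keeping the mask aligned token-for-token with the context is what lets the trainer compute the loss over assistant turns only. A real implementation would typically reuse the token IDs returned by the inference engine rather than re-tokenizing generated text, so that the trained tokens match exactly what the policy sampled.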
Experience
-
XLANG Lab
Research Assistant
Working on computer-use agents, GUI grounding, etc.
-
Apple Inc.
Machine Learning Engineer Intern
Working on a RAG-based Q&A system.
-
UC Berkeley CDSS
Research Assistant
Working on natural language processing and digital humanities.
-
Huawei Technologies
Software Engineering Intern
Back-end development