Shiyi Cao 曹诗怡

I am a third-year Ph.D. student in EECS at UC Berkeley, advised by Ion Stoica and Joseph Gonzalez, and affiliated with the Sky Computing Lab and BAIR.

Previously, I was fortunate to work with Zhihao Jia at CMU on accelerating distributed training. I obtained an M.S. in Computer Science at ETH, working at SPCL with Torsten Hoefler. Before graduate school, I had a great time at Shanghai Jiao Tong University, where I earned a bachelor's degree in Computer Science.

I am mainly interested in the automated optimization of ML computations on large-scale heterogeneous hardware.

01 News

Updates & highlights.

Feb '26
Excited to release K-Search for automated GPU kernel generation!
Oct '25
Excited to receive an Amazon AI Fellowship!
Aug '25
Two papers (Sky-T1 and S*) accepted to EMNLP '25 Findings!
May '25
Excited to announce the release of SkyRL, an RL framework for training real-world, long-horizon agents!
Oct '24
Two papers (MoE-Lightning and GraphPipe) accepted at ASPLOS '25!
Oct '23
Released S-LoRA, a scalable system for serving thousands of LoRA adapters concurrently!
Aug '23
Graduated from ETH and joined UC Berkeley EECS!

02 Publications

Projects I led or co-led.

K-Search framework overview: search tree with action selection, local refinement loop, and LLM-driven world model updates

K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model

Shiyi Cao, Ziming Mao, Joseph E. Gonzalez, Ion Stoica

arXiv preprint · 2026

GPU kernels · LLMs · World Model · Evolutionary search

Sky-T1 teaser: blue bird on a branch (NovaSky)

Sky-T1: Train your own O1 preview model within $450

Dacheng Li*, Shiyi Cao*, Tyler Griggs, Shu Liu, Xiangxi Mo, Eric Tang, Sumanth Hegde, Kourosh Hakhamaneshi, Shishir G Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

Findings of EMNLP 2025

Long chain-of-thought · SFT & LoRA · Sky-T1 · Open reasoning models

S* test-time scaling: LiveCodeBench performance baseline vs improvement across models

S*: Test-Time Scaling for Code Generation

Dacheng Li*, Shiyi Cao*, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica

Findings of EMNLP 2025

Test-time scaling · Code generation · Hybrid selection · LiveCodeBench

MoE-Lightning figure

MoE-Lightning: High-Throughput MoE Inference on Memory-Constrained GPUs

Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E Gonzalez, Matei Zaharia, Ion Stoica

ASPLOS 2025

Mixture of Experts · LLM batch inference · CPU offloading

GraphPipe figure

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Byungsoo Jeon*, Mengdi Wu*, Shiyi Cao*, Sunghyun Kim*, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

ASPLOS 2025

Distributed training · Pipeline parallelism

Fairness in serving LLMs figure

Fairness in Serving Large Language Models

Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E Gonzalez, Ion Stoica

OSDI 2024

LLM serving · Fair scheduling

S-LoRA figure

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Ying Sheng*, Shiyi Cao*, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

MLSys 2024

LLM inference · LoRA · Adapters · Memory management

In-network compute figure

Accelerating Data Serialization/Deserialization Protocols with In-Network Compute

Shiyi Cao, Salvatore Di Girolamo, Torsten Hoefler

Workshop on Exascale MPI, ExaMPI@SC 2022

SmartNICs · In-network compute · Data (de)serialization

AdaM figure

AdaM: An Adaptive Fine-Grained Scheme for Distributed Metadata Management

Shiyi Cao, Yuanning Gao, Xiaofeng Gao, Guihai Chen

International Conference on Parallel Processing (ICPP) 2019

Distributed systems · Metadata management · Reinforcement learning

03 Open-source Highlights

Open frameworks and models.

SGLang

Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue (Livia) Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

NeurIPS 2024

Structured generation and fast serving for large language models.

SkyRL

Shiyi Cao*, Sumanth Hegde*, Dacheng Li*, Tyler Griggs*, Shu Liu*, Eric Tang*, Jiayi Pan, Xingyao Wang, Akshay Malik, Kourosh Hakhamaneshi, Richard Liaw, Philipp Moritz, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

AgentX Competition 2025 · 1st Place (Research Track)

SkyRL-Agent-v0: the first open-source models trained with multi-turn RL on long-horizon SWE-Bench tasks.

04 Talks

Invited talks & visits.