Shiyi Cao 曹诗怡

I am a third-year Ph.D. student at UC Berkeley EECS, advised by Ion Stoica and Joseph Gonzalez, affiliated with Sky Computing Lab and BAIR.

Previously, I was fortunate to work with Zhihao Jia at CMU on accelerating distributed training. I obtained my M.S. in Computer Science at ETH Zurich, working at SPCL with Torsten Hoefler. Before graduate school, I had a great time at Shanghai Jiao Tong University, where I earned my bachelor's degree in Computer Science.

I am mainly interested in the automated optimization of ML computations on large-scale heterogeneous hardware.

01 News

Updates & highlights.

Feb '26
Excited to release K-Search for automated GPU kernel generation!
Oct '25
Excited to receive Amazon AI Fellowship!
Aug '25
Two papers (Sky-T1 and S*) were accepted to EMNLP '25 Findings!
May '25
Excited to announce the release of SkyRL, an RL framework for training real-world, long-horizon agents!
Oct '24
Two papers (MoE-Lightning and GraphPipe) were accepted at ASPLOS '25!
Oct '23
Released S-LoRA, a scalable system for serving thousands of LoRA adapters concurrently!
Aug '23
Graduated from ETH and joined UC Berkeley EECS!

02 Publications

Projects I led or co-led.

K-Search framework overview: search tree with action selection, local refinement loop, and LLM-driven world model updates

K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model

Shiyi Cao, Ziming Mao, Joseph E. Gonzalez, Ion Stoica

arXiv preprint · 2026

GPU kernels · LLMs · World Model · Evolutionary search

Sky-T1 teaser: blue bird on a branch (NovaSky)

Sky-T1: Train your own O1 preview model within $450

Dacheng Li*, Shiyi Cao*, Tyler Griggs, Shu Liu, Xiangxi Mo, Eric Tang, Sumanth Hegde, Kourosh Hakhamaneshi, Shishir G Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

Findings of EMNLP 2025

Long chain-of-thought · SFT & LoRA · Sky-T1 · Open reasoning models

S* test-time scaling: LiveCodeBench performance baseline vs improvement across models

S*: Test-Time Scaling for Code Generation

Dacheng Li*, Shiyi Cao*, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica

Findings of EMNLP 2025

Test-time scaling · Code generation · Hybrid selection · LiveCodeBench

MoE-Lightning figure

MoE-Lightning: High-Throughput MoE Inference on Memory-Constrained GPUs

Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E Gonzalez, Matei Zaharia, Ion Stoica

ASPLOS 2025

Mixture of Experts · LLM batch inference · CPU offloading

GraphPipe figure

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Byungsoo Jeon*, Mengdi Wu*, Shiyi Cao*, Sunghyun Kim*, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

ASPLOS 2025

Distributed training · Pipeline parallelism

Fairness in serving LLMs figure

Fairness in Serving Large Language Models

Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E Gonzalez, Ion Stoica

OSDI 2024

LLM serving · Fair scheduling

S-LoRA figure

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Ying Sheng*, Shiyi Cao*, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

MLSys 2024

LLM inference · LoRA · Adapters · Memory management

In-network compute figure

Accelerating Data Serialization/Deserialization Protocols with In-Network Compute

Shiyi Cao, Salvatore Di Girolamo, Torsten Hoefler

Workshop on Exascale MPI, ExaMPI@SC 2022

SmartNICs · In-network compute · Data (de)serialization

AdaM figure

AdaM: An Adaptive Fine-Grained Scheme for Distributed Metadata Management

Shiyi Cao, Yuanning Gao, Xiaofeng Gao, Guihai Chen

International Conference on Parallel Processing (ICPP) 2019

Distributed systems · Metadata management · Reinforcement learning

03 Talks

Invited talks & visits.