I am a first-year Ph.D. student at UC Berkeley EECS, advised by Ion Stoica
and Joseph Gonzalez, and affiliated with the Sky Computing Lab and BAIR.
Previously, I was fortunate to work with Zhihao Jia at CMU on accelerating
distributed training. I obtained my M.S. in Computer Science at ETH Zürich,
working at SPCL with Torsten Hoefler. Prior to joining grad school, I had a
great time at Shanghai Jiao Tong University, where I earned my Bachelor's
degree in Computer Science.
I am mainly interested in accelerating/optimizing computations (especially ML
workloads) on large-scale heterogeneous systems.
Started studying as a master's student in Computer Science at ETH Zürich.
Fairness in Serving Large Language Models (arXiv)
Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo,
Joseph E. Gonzalez, Ion Stoica
OSDI 2024.
LLM Serving; Fair Scheduling.
S-LoRA: Serving Thousands of Concurrent LoRA Adapters (arXiv, GitHub, Blog)
Ying Sheng*, Shiyi Cao*, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang,
Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer,
Joseph E. Gonzalez, Ion Stoica
MLSys 2024.
LLM Inference; LoRA; Adapters; Memory Management.
Accelerating Data Serialization/Deserialization Protocols with In-Network Compute (pdf, video)
Shiyi Cao, Salvatore Di Girolamo, Torsten Hoefler
Workshop on Exascale MPI, ExaMPI@SC, 2022. 
SmartNICs; In-Network Compute; Data (De)serialization.