Model freshness at scale
When scaling improves offline accuracy but harms online metrics, I look for "systems bottlenecks that become ML failure modes" (e.g., stale models from slow update pipelines).
I build efficient ML systems for large-scale production
Keeping models fresh under tight latency/cost constraints, and turning research ideas into deployable systems.
I'm interested in system–algorithm co-design that makes large-scale learning technically and economically feasible in production.
Safe and scalable data engineering for training: isolation without heavyweight process boundaries, and locality-aware execution near the data to reduce movement. Modern feature pipelines are long and increasingly multimodal, which pushes pipeline design toward safe, expressive composition (a toy sketch of locality-aware placement follows this list).
Cost-effective serving mechanisms (including LLM serving), where latency SLOs, memory bandwidth, and infrastructure costs interact with model behavior.
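To make the locality point above concrete, here is a minimal, hypothetical sketch, entirely my own illustration rather than code from any system I describe, of a placement rule that runs a feature transform on a worker that already holds the relevant data partition and only falls back to shipping data when no colocated worker exists.

```python
# Hypothetical sketch: locality-aware task placement for feature transforms.
# Names (Worker, place_task) are illustrative, not from a real system.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    partitions: set = field(default_factory=set)  # data partitions held locally
    load: int = 0                                 # number of tasks assigned

def place_task(partition: str, workers: list) -> Worker:
    """Prefer a worker that already holds the partition; otherwise pick the
    least-loaded worker and accept the cost of moving the data."""
    local = [w for w in workers if partition in w.partitions]
    chosen = min(local or workers, key=lambda w: w.load)
    chosen.load += 1
    return chosen

workers = [Worker("a", {"p0", "p1"}), Worker("b", {"p2"}), Worker("c", set())]
for part in ["p0", "p2", "p2", "p9"]:
    w = place_task(part, workers)
    moved = part not in w.partitions
    print(f"partition {part} -> worker {w.name} (data moved: {moved})")
```

Real pipelines layer isolation, batching, and failure handling on top of this; the sketch only shows why placement is the first lever for reducing data movement.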
Systems I've built to make large models practical in production: freshness, cost, reliability, and performance.
Low-latency model updates at extreme scale
Safe, scalable pipelines for training data
Compiler infrastructure contributions
Ekko is a production system for low-latency model updates in large-scale deep learning recommender systems. It's motivated by a common scaling failure: bigger models can improve offline metrics yet degrade online outcomes when update pipelines can't keep up.
When we scaled DLRMs, offline accuracy improved, but online engagement regressed. The root cause was stale models: pre-scaling infrastructure couldn't propagate updates quickly enough.
The question became: how can we keep update latency low as models scale to extreme size?
Across production iterations, Ekko-style mechanisms maintained second-level freshness (a reported 2.4 s model-update latency) while serving over a billion users daily.
After the rollout of Ekko-based recommendation infrastructure, a public WeChat engineering blog reported growth over the following six months (e.g., +40% DAU and +87% total VV), gains achieved alongside ongoing product iteration and operations work.
What I consider the core research contribution is showing that model-aware mechanisms can make second-level freshness feasible even at multi-terabyte scale.
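To give a flavor of what "model-aware" can mean, the sketch below is a deliberately simplified toy of my own, not Ekko's actual scheduler: parameter updates are disseminated in an order that combines staleness with a significance estimate, so the updates that matter most to serving propagate first under a limited bandwidth budget.

```python
# Toy sketch of model-aware update dissemination (illustrative only; not Ekko's design).
import heapq, time

class UpdateScheduler:
    """Send parameter updates in priority order under a per-round bandwidth budget.
    Priority combines staleness (age of the update) with a significance score,
    e.g. the magnitude of the parameter change relative to its current value."""

    def __init__(self):
        self._heap = []   # max-heap via negated priority
        self._seq = 0     # tie-breaker to keep heap ordering stable

    def submit(self, key, delta, significance, created_at=None):
        created_at = created_at if created_at is not None else time.time()
        self._seq += 1
        # Placeholder priority: older and more significant updates go first.
        age = time.time() - created_at
        priority = significance * (1.0 + age)
        heapq.heappush(self._heap, (-priority, self._seq, key, delta))

    def drain(self, budget_bytes, bytes_per_update=8):
        """Return the highest-priority updates that fit in this round's budget."""
        sent = []
        while self._heap and budget_bytes >= bytes_per_update:
            _, _, key, delta = heapq.heappop(self._heap)
            sent.append((key, delta))
            budget_bytes -= bytes_per_update
        return sent

sched = UpdateScheduler()
sched.submit("item_embedding/42", delta=0.9, significance=0.8)
sched.submit("item_embedding/7",  delta=0.1, significance=0.05)
sched.submit("user_embedding/3",  delta=0.4, significance=0.6)
print(sched.drain(budget_bytes=16))  # the two most significant updates go out first
```

In practice the hard parts are choosing the significance signal and enforcing freshness SLOs end to end; the toy only shows where model awareness enters the scheduling decision.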
For a full list, see my Google Scholar.
Chijun Sima*, Yao Fu*, Man-Kit Sit, Liyi Guo, Xuri Gong, Feng Lin, Junyu Wu, Yongsheng Li, Haidong Rong, Pierre-Louis Aublin, Luo Mai
Supervised by Luo Mai
Kejian Shi, Hongyang Qin, Chijun Sima, Sen Li, Lifeng Shen, Qianli Ma
Selected service, talks, and recognition.
Conference & workshop peer review
Invited presentations (Ekko)
Selected recognition