Available for collaboration

Chijun Sima

I build efficient ML systems for large-scale production

Keeping models fresh under tight latency/cost constraints, and turning research ideas into deployable systems.

Senior SDE · Tencent (WeChat) | OSDI '22 (co-first author) | LLVM developer (commit access) | B.Eng. CS · SCUT (Rank 1/28)
Guangzhou, China

Research focus

I'm interested in system–algorithm co-design that makes large-scale learning technically and economically feasible in production.

Model freshness at scale

When scaling improves offline accuracy but harms online metrics, I look for "systems bottlenecks that become ML failure modes" (e.g., stale models from slow update pipelines).

Low-cost data / feature pipelines

Safe and scalable data engineering for training: isolation without heavyweight process boundaries, and locality-aware execution near data sources to reduce data movement. Modern feature pipelines are long and increasingly multimodal, which pushes their design toward safe, expressive composition.
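To make the in-process isolation point concrete, here is a minimal sketch using the wasmtime Python bindings to run an untrusted per-record transform inside the host process. The WAT module and the `scale` function are hypothetical stand-ins, not any production UDF.

```python
# Minimal sketch: run an untrusted per-record transform inside an
# in-process WebAssembly sandbox (wasmtime), avoiding a separate
# worker process per UDF. The module below is purely illustrative.
from wasmtime import Engine, Store, Module, Instance, wat2wasm

WAT = """
(module
  (func (export "scale") (param i32) (result i32)
    local.get 0
    i32.const 3
    i32.mul))
"""

engine = Engine()
store = Store(engine)
module = Module(engine, wat2wasm(WAT))
instance = Instance(store, module, [])
scale = instance.exports(store)["scale"]

# The sandboxed UDF sees only what the host explicitly passes in.
print([scale(store, x) for x in (1, 2, 3)])  # [3, 6, 9]
```

The point of the design: the sandbox boundary is a function call rather than a process boundary, so per-record overhead stays low while the host still controls what the transform can touch.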

Efficient serving

Cost-effective serving mechanisms (including LLM serving), where latency SLOs, memory bandwidth, and infrastructure costs interact with model behavior.
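As a toy illustration of how latency SLOs shape serving decisions (the linear cost model and all constants are hypothetical, not any production policy), here is a batching rule that admits the largest batch still projected to meet the SLO:

```python
# Toy SLO-aware batching: estimated latency grows with batch size
# (fixed overhead + per-request cost); admit the largest batch that
# still meets the latency SLO. All constants are illustrative.
def max_batch_under_slo(queue_len: int, slo_ms: float,
                        fixed_ms: float = 2.0,
                        per_req_ms: float = 0.4) -> int:
    best = 0
    for b in range(1, queue_len + 1):
        if fixed_ms + per_req_ms * b <= slo_ms:
            best = b
        else:
            break
    return best

print(max_batch_under_slo(queue_len=64, slo_ms=10.0))  # -> 20
```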


Selected work

Systems I've built to make large models practical in production: freshness, cost, reliability, and performance.

Ekko (WeChat recsys)

Low-latency model updates at extreme scale

  • Compressed dissemination + model-aware scheduling (WAN -92%)
  • SLO-aware placement (machine cost -49%) + safe rollback
  • Second-level freshness (2.4s) for multi-terabyte models

Data / feature platform

Safe, scalable pipelines for training data

  • WebAssembly runtime for in-process isolation
  • Locality-aware operator placement near data sources (sketched after this card)
  • Reduced data movement up to 1,200x on representative workloads
Tags: WASM · Data systems
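A toy sketch of the locality-aware placement idea (the cost model and sizes are illustrative, not the platform's actual policy): run an operator at the data source whenever shipping its output is cheaper than shipping its input.

```python
# Toy locality-aware placement: run an operator at the data source
# when its output is much smaller than its input, so only the small
# result crosses the network. Sizes and the operator are illustrative.
def place_operator(input_bytes: int, output_bytes: int) -> str:
    move_data = input_bytes      # bytes moved if executed centrally
    move_result = output_bytes   # bytes moved if executed near the source
    return "near-source" if move_result < move_data else "central"

# A filter/aggregation that reduces 1.2 GB of raw logs to 1 MB:
print(place_operator(input_bytes=1_200_000_000, output_bytes=1_000_000))
# -> "near-source" (data movement reduced ~1,200x in this toy case)
```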

LLVM

Compiler infrastructure contributions

  • Improved Semi-NCA & optimization pipeline (LLVM 9.0), speedups of up to 1,980x
  • Unified dominator-tree update APIs (LLVM 7.0)
  • Google Summer of Code 2018 (LLVM)
Tags: C++ · Compilers

Ekko, explained

Ekko is a production system for low-latency model updates in large-scale deep learning recommender systems. It's motivated by a common scaling failure: bigger models can improve offline metrics yet degrade online outcomes when update pipelines can't keep up.

Problem and research question

When we scaled DLRMs, offline accuracy improved, but online engagement regressed. The root cause was stale models: pre-scaling infrastructure couldn't propagate updates quickly enough.

The question became: how can we keep update latency low as models scale to extreme size?

Design highlights

  • Update dissemination + scheduling: compress updates and prioritize synchronization using model/gradient signals (WAN bandwidth -92%); see the sketch after this list.
  • SLO-aware placement: optimization-based shard management to co-locate models without overloading inference engines (machine cost -49%).
  • Safe rollout: model-state management to roll back harmful updates in seconds.
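
As a cartoon of the model-aware scheduling idea (the scoring function, sizes, and budget below are hypothetical, not Ekko's actual policy), here is a sketch that ranks pending parameter updates by an update-significance signal and spends a limited WAN budget on the highest-priority ones first:

```python
import heapq

# Cartoon of model-aware update scheduling: each pending update gets a
# priority from an update-significance signal (here, magnitude x
# staleness), and a limited WAN budget is spent highest-priority first.
# The scoring function and all numbers are illustrative, not Ekko's.
def schedule_updates(updates, wan_budget_bytes):
    """updates: list of (key, magnitude, staleness_s, size_bytes)."""
    # Higher magnitude x staleness => higher priority (min-heap, so negate).
    ranked = [(-mag * stale, key, size) for key, mag, stale, size in updates]
    heapq.heapify(ranked)
    sent = []
    while ranked and wan_budget_bytes > 0:
        _, key, size = heapq.heappop(ranked)
        if size <= wan_budget_bytes:
            sent.append(key)
            wan_budget_bytes -= size
    return sent

pending = [("emb/17", 0.9, 4.0, 400),
           ("emb/3", 0.1, 1.0, 400),
           ("dense/w", 0.5, 9.0, 800)]
print(schedule_updates(pending, wan_budget_bytes=1200))
# -> ['dense/w', 'emb/17']
```

Combined with compression, this kind of prioritization is what lets the most impactful updates arrive quickly even when the WAN cannot carry everything at once.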

Production outcomes (selected)

In production iterations, Ekko-style mechanisms supported very large-scale deployment while maintaining second-level freshness (reported 2.4s model-update latency) and serving over a billion users daily.

2.4s update latency · 10,000x model-size scaling · WAN -92% · cost -49%
Industry impact

After the rollout of Ekko-based recommendation infrastructure, a public WeChat engineering blog reported growth metrics (e.g., "+40% DAU" and "+87% total VV" over six months), achieved alongside ongoing product iteration and operations.

What I consider the core research contribution is showing that model-aware mechanisms can make second-level freshness feasible even at multi-terabyte scale.


Selected Publications

For the full list, see my Google Scholar profile.

IEEE Access 2019

Dynamic Barycenter Averaging Kernel in RBF Networks for Time Series Classification

Kejian Shi, Hongyang Qin, Chijun Sima, Sen Li, Lifeng Shen, Qianli Ma


Service & recognition

Selected service, talks, and recognition.

Reviewer

Conference & workshop peer review

  • CVPR 2025
  • ICLR 2025 Workshop (FM-Wild)
  • NeurIPS 2025 Workshop (Efficient Reasoning)

Talks

Invited presentations (Ekko)

  • Tencent WeChat AI (Shenzhen, Jun 2022)
  • DataFun (Virtual, Aug 2022)
  • TechBeat (Virtual, Sep 2022)

Awards

Selected recognition

  • Tencent Technology Breakthrough Award (Gold), 2022H2 — Project Lead (Ekko)
  • ACM-ICPC Asia Xi'an Regional — Bronze (2017)
  • CCPC Guangdong Division — Second Prize (out of 177 teams)