Technical notes

Systems notes and outcomes

Architecture details for Ekko, plus timeline context, service, talks, awards, and supporting references.

Ekko in detail

Scaling deep learning recommendation models (DLRMs) improved offline quality but hurt online performance when model updates became stale. Ekko addresses this with model-aware update dissemination, optimization-based placement, and safe rollback mechanisms.

Update dissemination and scheduling

Compressed update dissemination plus accuracy-aware scheduling using model and gradient signals.

WAN bandwidth reduced by 92%.
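
As a rough illustration of accuracy-aware scheduling, the sketch below orders pending updates by an estimated benefit per byte and sends as many as fit in a bandwidth budget. It is a minimal sketch under stated assumptions, not Ekko's actual scheduler: the `Update` fields, the scoring rule in `priority`, and the byte budget are all hypothetical stand-ins for the "model and gradient signals" mentioned above.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    key: str          # hypothetical parameter or embedding identifier
    size_bytes: int   # size of the compressed payload
    grad_norm: float  # assumed stand-in for the gradient signal
    read_rate: float  # assumed stand-in for the model signal (how hot the key is)


def priority(u: Update) -> float:
    # Assumed scoring rule: expected accuracy benefit per byte sent.
    return (u.grad_norm * u.read_rate) / u.size_bytes


def schedule(updates, budget_bytes):
    """Split updates into (sent, deferred), highest priority first, within a byte budget."""
    # heapq is a min-heap, so negate the priority; the index breaks ties.
    heap = [(-priority(u), i, u) for i, u in enumerate(updates)]
    heapq.heapify(heap)
    sent, deferred, used = [], [], 0
    while heap:
        _, _, u = heapq.heappop(heap)
        if used + u.size_bytes <= budget_bytes:
            sent.append(u)
            used += u.size_bytes
        else:
            deferred.append(u)
    return sent, deferred


pending = [
    Update("emb:a", 100, grad_norm=1.0, read_rate=1.0),
    Update("emb:b", 100, grad_norm=5.0, read_rate=1.0),
    Update("emb:c", 100, grad_norm=0.1, read_rate=1.0),
]
sent, deferred = schedule(pending, budget_bytes=200)
```

With a 200-byte budget, the two updates with the largest assumed benefit go out first and the low-signal update waits for the next round, which is the general idea behind spending scarce WAN bandwidth on the updates that matter most.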

SLO-aware model placement

Optimization-based shard management to co-locate models while protecting inference engine SLOs.

Machine cost reduced by 49%.
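
To make the co-location idea concrete, the sketch below packs model shards onto as few machines as possible while keeping each machine under a memory limit and a load ceiling. It swaps in a simple first-fit-decreasing heuristic for illustration; the source describes an optimization-based approach whose formulation is not given here. The `Machine` fields and the use of a per-machine `max_qps` as a proxy for the inference SLO are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    mem_gb: float
    max_qps: float  # assumed load ceiling under which the latency SLO still holds
    shards: list = field(default_factory=list)
    used_mem: float = 0.0
    used_qps: float = 0.0

    def fits(self, mem, qps):
        return (self.used_mem + mem <= self.mem_gb
                and self.used_qps + qps <= self.max_qps)


def place(shards, mem_gb, max_qps):
    """Place (name, mem_gb, qps) shards using first-fit decreasing by memory."""
    machines = []
    for name, mem, qps in sorted(shards, key=lambda s: s[1], reverse=True):
        if mem > mem_gb or qps > max_qps:
            raise ValueError(f"shard {name} cannot fit on any machine")
        # Reuse the first machine with enough memory and SLO headroom.
        target = next((m for m in machines if m.fits(mem, qps)), None)
        if target is None:
            target = Machine(mem_gb, max_qps)
            machines.append(target)
        target.shards.append(name)
        target.used_mem += mem
        target.used_qps += qps
    return machines


shards = [("a", 60, 100), ("b", 50, 100), ("c", 40, 100), ("d", 30, 900)]
machines = place(shards, mem_gb=100, max_qps=1000)
```

Here four shards fit on two machines instead of four. A real placement system would also weigh migration cost and traffic variation, which this sketch ignores.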

Safe rollout and rollback

Model-state manager to detect harmful updates and roll back in seconds.

Production safety with recovery in seconds.
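
The rollback behavior can be sketched as a small state manager that remembers the last known-good version and reverts when a newly applied update regresses a quality metric. This is a hypothetical illustration rather than Ekko's model-state manager: the class shape, the single scalar metric, and the `max_drop` threshold are assumptions made for the example.

```python
class ModelStateManager:
    def __init__(self, max_drop=0.05):
        self.max_drop = max_drop  # assumed tolerated relative drop in the metric
        self.healthy = []         # (version, metric) pairs known to be good
        self.live = None          # version currently serving traffic

    def apply(self, version, metric):
        """Make `version` live, then roll back if its metric regresses too far."""
        self.live = version
        if self.healthy:
            _, baseline = self.healthy[-1]
            if metric < baseline * (1 - self.max_drop):
                return self.rollback()
        self.healthy.append((version, metric))
        return version

    def rollback(self):
        # Revert to the most recent version that passed the health check.
        self.live = self.healthy[-1][0]
        return self.live


manager = ModelStateManager(max_drop=0.05)
manager.apply("v1", 0.80)
manager.apply("v2", 0.79)           # small dip, within tolerance
served = manager.apply("v3", 0.60)  # harmful update, triggers rollback
```

After the third call the manager is serving `v2` again. Keeping the known-good state ready to serve is what makes a revert fast in a design like this, since no retraining or full reload is needed.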

Selected outcomes

  • 2.4s model-update latency
  • 10,000x model-size scaling (GB to tens of TB)
  • Deployed in WeChat recommendation stacks
  • Serves over one billion users daily
  • +40% DAU and +87% total video views (VV) over six months (reported in the official WeChat write-up; gains came alongside product iteration and operations)

Timeline and context

Experience summarized from the CV, including ongoing systems work in recommendation, data pipelines, and LLM serving.

Jul 2020 - Present

Tencent (WeChat), Senior Software Development Engineer

  • Led and delivered Ekko-related systems; part of this work was published at OSDI 2022 (co-first author).
  • Designed data and feature infrastructure with WebAssembly-based in-process isolation and locality-aware placement.
  • Developing LLM serving mechanisms around remote KV-cache storage and compression for better cost-performance.

2018 - Present

LLVM, Developer (commit access)

  • Improved Semi-NCA performance and optimization pipeline (LLVM 9.0).
  • Unified APIs on dominator trees (LLVM 7.0).
  • Reported speedups up to 1980x on real-world samples.

Sep 2016 - Jun 2020

South China University of Technology, B.Eng. Computer Science

  • Innovation Class, GPA 3.85/4.00, Rank 1/28.

Service, talks, and awards

Academic service

  • CVPR 2025
  • ICLR 2025 Workshop on FM-Wild
  • NeurIPS 2025 Workshop on Efficient Reasoning

Talks

  • Tencent WeChat AI Department (Shenzhen, Jun 2022)
  • DataFun (Virtual, Aug 2022)
  • TechBeat (Virtual, Sep 2022)

Awards

  • Tencent Technology Breakthrough Award (Gold Prize), 2022H2 - Project Lead, Ekko
  • ACM-ICPC Asia Xi'an Regional Contest - Bronze (2017)
  • CCPC Guangdong Division - Second Prize (out of 177 teams)