Research Interests

Efficient AI (MLSys): scalable and cost-effective training and serving; system–algorithm co-design. Current focus on recommender and LLM systems under strict latency and cost constraints, including model freshness, distributed model updates, and KV-cache management.

News

  • Reviewing for CVPR 2025, ICLR 2025 Workshop (FM-Wild), NeurIPS 2025 Workshop (Efficient Reasoning)
  • Ekko published at OSDI 2022; invited talks at Tencent WeChat AI, DataFun, TechBeat
  • Tencent Technology Breakthrough Award (Gold Prize) — Project Lead, Ekko

Selected Publications

Full list on Google Scholar.

  1. Dynamic Barycenter Averaging Kernel in RBF Networks for Time Series Classification

     Kejian Shi, Hongyang Qin, Sen Li, Lifeng Shen, Qianli Ma

     IEEE Access, 2019.

Research & Industry Experience

Tencent (WeChat) · Guangzhou, China

Senior Software Development Engineer — Efficient ML Systems

Ekko: low-latency model updates for multi-terabyte DLRMs (published in part at OSDI '22)

  • Problem. Scaling DLRMs improved offline accuracy but degraded online engagement; the root cause was stale models caused by increased model-update latency.
  • Key idea. Co-designed deployment mechanisms with model-aware policies (compressed update dissemination, accuracy-aware scheduling, SLO-aware placement, safe rollback).
  • Results. −92% WAN bandwidth, −49% machine cost, 2.4 s model-update latency; 10,000× model-size scaling (GB → tens of TB).
  • Outcomes. Core techniques published at OSDI '22 (co-first author). Deployed in WeChat recommendation stacks, serving 1 B+ users daily. The official WeChat blog reports +40% DAU and +87% total video views (VV) over six months after full adoption (alongside product iteration and operations).

Data and feature platform: safe, scalable pipelines

  • Problem. Modern feature pipelines are long and increasingly multimodal; composing operators across processes incurs high overhead and expensive data movement.
  • Approach. WebAssembly-based runtime for in-process isolation (safety + resource constraints) and locality-aware operator placement near data sources.
  • Outcome. Data movement reduced by up to 1,200× on representative workloads; widely used within WeChat for data preparation.

LLM serving systems

  • Building cost-effective serving mechanisms around remote KV-cache storage and compression.

LLVM

Developer (commit access) — Google Summer of Code 2018

  • Improved Semi-NCA (dominator-tree construction) performance and the optimization pipeline; shipped in LLVM 9.0 (reported speedups up to 1,980× on real-world samples).
  • Unified the dominator-tree APIs; shipped in LLVM 7.0.

Education

Talks

Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update.

Tencent WeChat AI Department, Shenzhen (Jun 2022); DataFun, Virtual (Aug 2022); TechBeat, Virtual (Sep 2022)

Academic Service & Selected Awards

Reviewer

  • CVPR 2025
  • ICLR 2025 Workshop on FM-Wild
  • NeurIPS 2025 Workshop on Efficient Reasoning

Awards

  • Tencent Technology Breakthrough Award (Gold Prize) — Project Lead, Ekko (internal highest technical honor) (2022)
  • Bronze Medal, ACM-ICPC Asia Xi'an Regional Contest (2017)
  • Second Prize, 15th China Collegiate Programming Contest (Guangdong Division, out of 177 teams)

Selected Write-ups & Links