* co-first author. Supervised by Luo Mai.
Research Interests
Efficient AI (MLSys): scalable and cost-effective training and serving; system–algorithm co-design. Current focus on recommender and LLM systems under strict latency and cost constraints, including model freshness, distributed model updates, and KV-cache management.
News
- Reviewing for CVPR 2025, ICLR 2025 Workshop (FM-Wild), NeurIPS 2025 Workshop (Efficient Reasoning)
- Ekko published at OSDI 2022; invited talks at Tencent WeChat AI, DataFun, TechBeat
- Tencent Technology Breakthrough Award (Gold Prize) — Project Lead, Ekko
Selected Publications
Full list on Google Scholar.
- Dynamic Barycenter Averaging Kernel in RBF Networks for Time Series Classification. IEEE Access, 2019.
Research & Industry Experience
Tencent (WeChat) · Guangzhou, China
Senior Software Development Engineer — Efficient ML Systems
Ekko: low-latency model update for multi-terabyte DLRMs (published in part as OSDI '22)
- Problem. Scaling DLRMs improved offline accuracy but degraded online engagement; root cause: stale models from increased model-update latency.
- Key idea. Co-designed deployment mechanisms with model-aware policies (compressed update dissemination, accuracy-aware scheduling, SLO-aware placement, safe rollback).
- Technical contributions. Reduced WAN bandwidth by 92% and machine cost by 49%; achieved 2.4 s model-update latency; scaled model size 10,000× (GB → tens of TB).
- Outcomes. Core techniques published at OSDI '22 (co-first author). Deployed in WeChat recommendation stacks, serving 1B+ users daily. Official WeChat blog reports +40% DAU and +87% total video views (VV) over six months after full adoption (alongside product iteration and operations).
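The accuracy-aware scheduling idea above can be sketched as a priority queue over pending parameter-shard updates. The magnitude × staleness scoring heuristic and all names here are illustrative assumptions, not Ekko's actual policy:

```python
import heapq


class UpdateScheduler:
    """Toy accuracy-aware update scheduler: pending parameter-shard
    updates are drained in order of an estimated accuracy impact
    (here, update magnitude scaled by staleness). Illustrative only."""

    def __init__(self):
        self._heap = []  # entries: (-priority, seq, shard_id)
        self._seq = 0    # tie-breaker so equal priorities stay FIFO

    def submit(self, shard_id, magnitude, staleness_s):
        # Stale, large updates are assumed to matter most for online accuracy.
        priority = magnitude * (1.0 + staleness_s)
        heapq.heappush(self._heap, (-priority, self._seq, shard_id))
        self._seq += 1

    def drain(self, budget):
        """Return up to `budget` shard ids, highest estimated impact first."""
        out = []
        while self._heap and len(out) < budget:
            out.append(heapq.heappop(self._heap)[2])
        return out


sched = UpdateScheduler()
sched.submit("emb/0", magnitude=0.9, staleness_s=2.0)
sched.submit("emb/1", magnitude=0.1, staleness_s=0.5)
sched.submit("dense/0", magnitude=0.5, staleness_s=10.0)
print(sched.drain(2))  # → ['dense/0', 'emb/0']
```

Under a bandwidth budget, draining in priority order ships the updates most likely to affect serving accuracy first and defers the rest.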
Data and feature platform: safe, scalable pipelines
- Problem. Modern feature pipelines are long and increasingly multimodal; cross-process operator composition creates high overhead and expensive data movement.
- Approach. WebAssembly-based runtime for in-process isolation (safety + resource constraints) and locality-aware operator placement near data sources.
- Outcome. Data movement reduced up to 1,200× on representative workloads; widely used within WeChat for data preparation.
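The locality-aware placement idea can be illustrated with a greedy rule: run each operator on the node already holding the largest share of its input bytes. This is a toy sketch of the principle, not the production policy:

```python
def place_operators(ops, default_node="worker-0"):
    """Greedy locality-aware placement: each operator is assigned to
    the node holding most of its input bytes, minimizing cross-node
    data movement; operators with no pinned inputs fall back to a
    default node. Illustrative names and structure.

    ops: list of (op_name, {node: input_bytes_on_node})
    returns: {op_name: node}
    """
    placement = {}
    for name, inputs_by_node in ops:
        if inputs_by_node:
            placement[name] = max(inputs_by_node, key=inputs_by_node.get)
        else:
            placement[name] = default_node
    return placement


ops = [
    ("decode_image", {"storage-1": 8_000_000, "storage-2": 1_000}),
    ("join_features", {"storage-2": 2_000_000}),
    ("normalize", {}),
]
print(place_operators(ops))
```

A real scheduler would also weigh node load and resource limits; the point here is only that placement decisions key off where the bytes already live.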
LLM serving systems
- Building cost-effective serving mechanisms around remote KV-cache storage and compression.
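The motivation for remote KV-cache storage and compression is the cache's raw footprint, which is straightforward to estimate. A minimal sketch (the Llama-2-7B-like configuration in the example is an assumption for illustration):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch, dtype_bytes=2):
    """Per-batch KV-cache footprint: 2 tensors (K and V) per layer,
    each of shape [num_kv_heads, seq_len, head_dim], stored with
    dtype_bytes per element (2 for fp16/bf16)."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch * dtype_bytes)


# e.g. a 32-layer model with 32 KV heads and head_dim 128
gib = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=1) / 2**30
print(f"{gib:.1f} GiB per 4K-token request")  # → 2.0 GiB
```

At multiple GiB per long request, GPU memory alone cannot hold many concurrent caches, which is what makes remote storage and compression attractive.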
LLVM Project — Developer (commit access) · Google Summer of Code 2018
- Improved Semi-NCA performance and optimization pipeline; shipped in LLVM 9.0 (reported speedups up to 1,980× on real-world samples).
- Unified APIs on dominator trees; shipped in LLVM 7.0.
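As background on dominator trees: LLVM uses Semi-NCA, but the structure being computed can be shown with the simpler iterative fixed-point method (Cooper–Harvey–Kennedy style). A minimal sketch, assuming all nodes are reachable from the entry:

```python
def immediate_dominators(succs, entry):
    """Compute immediate dominators of a CFG by iterating
    Dom(n) = {n} ∪ ⋂_{p ∈ preds(n)} Dom(p) to a fixed point.
    Quadratic and simple; Semi-NCA is the faster algorithm LLVM ships.

    succs: {node: [successor nodes]}; returns {node: idom}
    (the entry maps to itself)."""
    nodes = list(succs)
    preds = {n: [] for n in nodes}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)

    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True

    # idom(n) is the strict dominator dominated by all of n's others,
    # i.e. the one with the largest dominator set of its own.
    idom = {entry: entry}
    for n in nodes:
        if n != entry:
            idom[n] = max(dom[n] - {n}, key=lambda d: len(dom[d]))
    return idom


cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(immediate_dominators(cfg, "A"))  # D's idom is A, not B or C
```

The diamond example shows the key property: D is reached via both B and C, so neither dominates it and its immediate dominator is the branch point A.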
Education
South China University of Technology
B.Eng. in Computer Science and Technology (Innovation Class)
GPA 3.85 / 4.00 · Rank 1 / 28
Talks
Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update.
Tencent WeChat AI Department, Shenzhen (Jun 2022); DataFun, Virtual (Aug 2022); TechBeat, Virtual (Sep 2022)
Academic Service & Selected Awards
Reviewer
- CVPR 2025
- ICLR 2025 Workshop on FM-Wild
- NeurIPS 2025 Workshop on Efficient Reasoning
Awards
- Tencent Technology Breakthrough Award (Gold Prize) — Project Lead, Ekko (Tencent's highest internal technical honor) (2022)
- Bronze Medal, ACM-ICPC Asia Xi'an Regional Contest (2017)
- Second Prize, 15th China Collegiate Programming Contest (Guangdong Division, out of 177 teams)