Model freshness at scale
When scaling improves offline accuracy but harms online metrics, I look for "systems bottlenecks that become ML failure modes" (e.g., stale models from slow update pipelines).
I build efficient ML systems for large-scale production
Keeping models fresh under tight latency/cost constraints, and turning research ideas into deployable systems.
I'm interested in system–algorithm co-design that makes large-scale learning technically and economically feasible in production.
Safe and scalable data engineering for training: isolation without heavyweight process boundaries, and locality-aware execution near the data to reduce movement. Modern feature pipelines are long and increasingly multimodal, which pushes pipeline design toward safe, expressive composition (a toy sketch of locality-aware placement follows this list).
Cost-effective serving mechanisms (including LLM serving), where latency SLOs, memory bandwidth, and infrastructure costs interact with model behavior.
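To make the locality point above concrete, here is a minimal, hypothetical sketch, entirely my own illustration rather than code from any system I describe, of a placement rule that runs a feature transform on a worker that already holds the relevant data partition and only falls back to shipping data when no colocated worker exists.

```python
# Hypothetical sketch: locality-aware task placement for feature transforms.
# Names (Worker, place_task) are illustrative, not from a real system.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    partitions: set = field(default_factory=set)  # data partitions held locally
    load: int = 0                                 # number of tasks assigned

def place_task(partition: str, workers: list) -> Worker:
    """Prefer a worker that already holds the partition; otherwise pick the
    least-loaded worker and accept the cost of moving the data."""
    local = [w for w in workers if partition in w.partitions]
    chosen = min(local or workers, key=lambda w: w.load)
    chosen.load += 1
    return chosen

workers = [Worker("a", {"p0", "p1"}), Worker("b", {"p2"}), Worker("c", set())]
for part in ["p0", "p2", "p2", "p9"]:
    w = place_task(part, workers)
    moved = part not in w.partitions
    print(f"partition {part} -> worker {w.name} (data moved: {moved})")
```

Real pipelines layer isolation, batching, and failure handling on top of this; the sketch only shows why placement is the first lever for reducing data movement.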
Systems I've built to make large models practical in production: freshness, cost, reliability, and performance.
Low-latency model updates at extreme scale
Safe, scalable pipelines for training data
Compiler infrastructure contributions
Ekko is a production system for low-latency model updates in large-scale deep learning recommender systems. It's motivated by a common scaling failure: bigger models can improve offline metrics yet degrade online outcomes when update pipelines can't keep up.
When we scaled DLRMs, offline accuracy improved, but online engagement regressed. The root cause was stale models: pre-scaling infrastructure couldn't propagate updates quickly enough.
The question became: how can we keep update latency low as models scale to extreme size?
Across production iterations, Ekko-style mechanisms maintained second-level freshness (a reported 2.4 s model-update latency) while serving over a billion users daily.
After the rollout of Ekko-based recommendation infrastructure, a public WeChat engineering blog reported growth over the following six months (e.g., +40% DAU and +87% total VV), gains achieved alongside ongoing product iteration and operations work.
What I consider the core research contribution is showing that model-aware mechanisms can make second-level freshness feasible even at multi-terabyte scale.
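To give a flavor of what "model-aware" can mean, the sketch below is a deliberately simplified toy of my own, not Ekko's actual scheduler: parameter updates are disseminated in an order that combines staleness with a significance estimate, so the updates that matter most to serving propagate first under a limited bandwidth budget.

```python
# Toy sketch of model-aware update dissemination (illustrative only; not Ekko's design).
import heapq, time

class UpdateScheduler:
    """Send parameter updates in priority order under a per-round bandwidth budget.
    Priority combines staleness (age of the update) with a significance score,
    e.g. the magnitude of the parameter change relative to its current value."""

    def __init__(self):
        self._heap = []   # max-heap via negated priority
        self._seq = 0     # tie-breaker to keep heap ordering stable

    def submit(self, key, delta, significance, created_at=None):
        created_at = created_at if created_at is not None else time.time()
        self._seq += 1
        # Placeholder priority: older and more significant updates go first.
        age = time.time() - created_at
        priority = significance * (1.0 + age)
        heapq.heappush(self._heap, (-priority, self._seq, key, delta))

    def drain(self, budget_bytes, bytes_per_update=8):
        """Return the highest-priority updates that fit in this round's budget."""
        sent = []
        while self._heap and budget_bytes >= bytes_per_update:
            _, _, key, delta = heapq.heappop(self._heap)
            sent.append((key, delta))
            budget_bytes -= bytes_per_update
        return sent

sched = UpdateScheduler()
sched.submit("item_embedding/42", delta=0.9, significance=0.8)
sched.submit("item_embedding/7",  delta=0.1, significance=0.05)
sched.submit("user_embedding/3",  delta=0.4, significance=0.6)
print(sched.drain(budget_bytes=16))  # the two most significant updates go out first
```

In practice the hard parts are choosing the significance signal and enforcing freshness SLOs end to end; the toy only shows where model awareness enters the scheduling decision.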
For a full list, see my Google Scholar.
Chijun Sima*, Yao Fu*, Man-Kit Sit, Liyi Guo, Xuri Gong, Feng Lin, Junyu Wu, Yongsheng Li, Haidong Rong, Pierre-Louis Aublin, Luo Mai
Supervised by Luo Mai
Kejian Shi, Hongyang Qin, Chijun Sima, Sen Li, Lifeng Shen, Qianli Ma
Selected service, talks, and recognition.
Conference & workshop peer review
Invited presentations (Ekko)
Selected recognition