

Poster

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Buyun Zhang · Liang Luo · Yuxin Chen · Jade Nie · Xi Liu · Shen Li · Yanli Zhao · Yuchen Hao · Yantao Yao · Ellie Wen · Jongsoo Park · Maxim Naumov · Wenlin Chen


Abstract:

Scaling laws play an instrumental role in the sustainable improvement of model quality. Unfortunately, recommendation models to date do not exhibit scaling laws analogous to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly complex real-world datasets. In this paper, we propose an effective network architecture based purely on stacked factorization machines, together with a synergistic upscaling strategy, collectively dubbed Wukong, to establish a scaling law in the domain of recommendation. Wukong is highly scalable: it adeptly captures high-order interactions through the addition of extra layers and effectively broadens the range of interactions by augmenting the capacity of each layer. We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models in quality. Further, we assessed Wukong's scalability on an internal, large-scale dataset. The results show that Wukong retains its superiority in quality over state-of-the-art models while upholding the scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP, or equivalently 160 PF-days of total training compute, where prior art falls short.
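To make the core idea concrete, below is a minimal PyTorch sketch of stacking factorization-machine-style interaction layers: each layer computes pairwise dot-product interactions among feature embeddings and projects them back into embedding space, so that stacking layers compounds the interaction order while widening a layer broadens the range of interactions. The layer internals, class names, and hyperparameters here are illustrative assumptions, not the paper's exact Wukong layer design.

```python
# Illustrative sketch of a stacked factorization-machine (FM) interaction
# network. Layer internals (pairwise dot products, linear projection,
# residual connection) are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class FMInteractionLayer(nn.Module):
    """One FM-style layer: pairwise dot products between embeddings,
    projected back to n_emb embeddings of dimension d."""

    def __init__(self, n_emb: int, d: int):
        super().__init__()
        # Projects the flattened (n_emb x n_emb) interaction matrix
        # back into n_emb embeddings of dimension d.
        self.proj = nn.Linear(n_emb * n_emb, n_emb * d)
        self.norm = nn.LayerNorm(d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_emb, d) stacked feature embeddings.
        b, n, d = x.shape
        inter = torch.bmm(x, x.transpose(1, 2))    # (b, n, n) pairwise dots
        out = self.proj(inter.flatten(1)).view(b, n, d)
        return self.norm(out + x)                  # residual keeps lower-order terms


class StackedFM(nn.Module):
    """Stacking k layers lets layer k capture higher-order interactions;
    widening (larger n_emb or d) broadens the range of interactions."""

    def __init__(self, n_emb: int, d: int, depth: int):
        super().__init__()
        self.layers = nn.ModuleList(
            FMInteractionLayer(n_emb, d) for _ in range(depth)
        )
        self.head = nn.Linear(n_emb * d, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return self.head(x.flatten(1))             # CTR-style logit


# Usage: 16 feature embeddings of dim 32, 3 stacked layers.
model = StackedFM(n_emb=16, d=32, depth=3)
logit = model(torch.randn(8, 16, 32))              # shape (8, 1)
```

Under this framing, the two upscaling knobs are depth (more stacked layers, hence higher-order interactions) and width (more or larger embeddings per layer, hence a broader set of interactions), which is the synergy the abstract attributes to Wukong's upscaling strategy.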
