How expressive of an offload does RDMA support? We find that the answer is a whole lot! RedN accepted at NSDI’22.
OmniReduce accepted at SIGCOMM’21. Congratulations Jiawei and Elton! Also, we will run a tutorial on Network-Accelerated Distributed Deep Learning at SIGCOMM. Stay tuned.
Congratulations Arnaud and Ahmed! Two papers accepted at INFOCOM’21.
Congratulations Waleed! Assise is accepted at OSDI’20.
Bilal gets papers in at VLDB’20 and SoCC’20! In the VLDB paper, we systematically investigate the problem of cloud configuration using black-box optimization methods and uncover how different methods behave with more than 20 workloads. At SoCC, we propose a new method, Vanir, that reins in configuring data analytics clusters formed by multiple distributed systems whose joint optimization is required.
We survey popular gradient compression techniques for distributed deep learning and perform a comprehensive comparative evaluation. Read our technical report.
Is there a discrepancy between the theory and practice of gradient compression for distributed deep learning? We argue so in our AAAI’20 paper.
Our paper on improving how to reason about and explain the behavior of reinforcement learning agents in networking applications accepted at NetAI’19.