FlexPipe: Maximizing Training Efficiency for Transformer-based Models with Variable-Length Inputs

Authors: 

Hairui Zhao, Jilin University and University of California, Riverside; Qi Tian, Jilin University; Hongliang Li, Jilin University and Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, China; Zizhong Chen, University of California, Riverside

Abstract: 

The Transformer achieves promising results among various deep learning architectures. Training transformer-based models (transformers) typically involves multiple forms of parallelism, such as data parallelism and pipeline parallelism (PP). Variable-length datasets have been adopted to facilitate multi-task training of transformers, but the resulting variation in input length degrades training efficiency. Although many efforts have significantly improved variable-length training, they primarily focus on optimizations within a single iteration. However, substantial fluctuations in computation and memory requirements across iterations can also cause overall inefficiency, because distributed frameworks rely on static partitioning. Thus, this paper proposes FlexPipe, which approaches the problem from the distributed-system perspective to enable high-throughput variable-length training of transformers. To our knowledge, FlexPipe is the first flexible pipeline framework that dynamically adjusts PP through a live flexibility mechanism without training loss. We formulate a novel problem of maximizing training throughput by adjusting parallel configurations, together with an efficient heuristic algorithm to solve it. Extensive experiments show that FlexPipe achieves, on average, 1.25× the training throughput of state-of-the-art methods.
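The abstract does not describe FlexPipe's actual cost model or heuristic, so the following is only a hypothetical Python sketch of the general kind of per-iteration decision such a flexible pipeline framework might make: picking a pipeline-parallel depth for the upcoming batch based on its sequence lengths, trading pipeline bubble overhead against per-device memory. All names, constants, and formulas here (Config, iter_time, peak_mem, choose_config, NUM_GPUS, etc.) are invented for illustration and are not from the paper.

```python
# Hypothetical illustration only; not the FlexPipe algorithm from the paper.
from dataclasses import dataclass
from typing import List

NUM_GPUS = 8          # assumed cluster size
MICRO_BATCHES = 8     # assumed gradient-accumulation micro-batches
MODEL_MEM = 6000.0    # model states per pipeline replica (arbitrary units)
ACT_COEF = 1e-3       # activation memory per (max_len)^2 (arbitrary units)
PER_TOKEN = 1e-6      # compute cost per token^2 (attention is ~O(L^2))
MEM_BUDGET = 6000.0   # per-device memory budget (arbitrary units)


@dataclass(frozen=True)
class Config:
    pp_degree: int    # number of pipeline stages (data parallelism fills the rest)


def iter_time(seq_lens: List[int], cfg: Config) -> float:
    """Toy 1F1B-style estimate: total work spread over all GPUs, scaled by the
    pipeline fill/drain bubble factor (micro_batches + pp - 1) / micro_batches."""
    work = PER_TOKEN * sum(l * l for l in seq_lens)
    return work * (MICRO_BATCHES + cfg.pp_degree - 1) / (NUM_GPUS * MICRO_BATCHES)


def peak_mem(seq_lens: List[int], cfg: Config) -> float:
    """Toy per-device memory: model states sharded across stages, plus an
    attention-activation term that grows with the longest sequence."""
    return MODEL_MEM / cfg.pp_degree + ACT_COEF * max(seq_lens) ** 2


def choose_config(seq_lens: List[int], candidates: List[Config]) -> Config:
    """Greedy heuristic: among configurations that fit the memory budget for
    this iteration's batch, pick the one with the lowest estimated time."""
    feasible = [c for c in candidates if peak_mem(seq_lens, c) <= MEM_BUDGET]
    return min(feasible or candidates, key=lambda c: iter_time(seq_lens, c))


if __name__ == "__main__":
    candidates = [Config(2), Config(4), Config(8)]
    short_batch = [128] * 64    # iteration dominated by short sequences
    long_batch = [2048] * 64    # iteration dominated by long sequences
    print(choose_config(short_batch, candidates))  # shallow pipeline: least bubble
    print(choose_config(long_batch, candidates))   # deeper pipeline: fits in memory
```

Under these made-up constants, the short-sequence batch selects the shallowest pipeline (smallest bubble overhead), while the long-sequence batch is pushed to a deeper pipeline to stay within the memory budget; this is only meant to convey why iteration-level fluctuations motivate adjusting the parallel configuration rather than fixing it statically.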
