Lushan Song, Fudan University and ByteDance; Qizhi Zhang and Yu Lin, ByteDance; Haoyu Niu, Fudan University; Daode Zhang, ByteDance; Zheng Qu and Weili Han, Fudan University; Jue Hong, Quanwei Cai, and Ye Wu, ByteDance
Secure data alignment, which securely aligns the data between parties, is the first and crucial step in vertical privacy-preserving machine learning (VPPML). Practical applications, e.g. advertising, require VPPML for personalized services. Meanwhile, the data held by parties in these applications are usually unbalanced. Existing secure unbalanced data alignment approaches typically rely on Cuckoo Hashing, which introduces redundant data outside the intersection, leading to significantly increasing communication size during secure training in VPPML. Though secure shuffle operations can trim these redundant data, these operations would incur huge communication overhead. As a result, these secure approaches should be optimized for efficiency in VPPML scenarios.
In this paper, we propose Suda, an efficient and secure unbalanced data alignment framework for VPPML. By leveraging polynomial-based operations rather than Cuckoo Hashing, Suda efficiently, directly, and exclusively outputs data shares in the intersection without expensive secure shuffle operations. Consequently, Suda efficiently and seamlessly aligns with secure training in VPPML. Specifically, we first design a novel and efficient batch private information retrieval (PIR) protocol based on the oblivious polynomial reduction and evaluation protocols. Second, we design a batch PIR-to-share protocol extended from the batch PIR protocol with the oblivious polynomial interpolation protocol. Note that the batch PIR-to-share protocol securely obtains feature shares rather than the plaintext features which are the outputs of the batch PIR protocol. Comprehensive experiment results demonstrate that: (1) Suda outperforms the state-of-the-art secure data alignment framework by 31.14 x ∼ 210.78 x in communication size and up to 8.21 x in running time; and (2) Suda outperforms the state-of-the-art batch PIR framework by up to 11.53 x in server time.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.