Wenjie Qu, Wengrui Zheng, Tianyang Tao, Dong Yin, Yanze Jiang, and Zhihua Tian, National University of Singapore; Wei Zou and Jinyuan Jia, Pennsylvania State University; Jiaheng Zhang, National University of Singapore
Large Language Models (LLMs) have demonstrated a remarkable ability to generate text that resembles human language. However, criminals can misuse them to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking, which embeds a message (e.g., a bit string) into text generated by an LLM, is a key technique for addressing these concerns. By embedding the user ID (represented as a bit string) into generated text, we can trace that text back to the user, a task known as content source tracing. The major limitation of existing watermarking techniques is their sub-optimal performance for content source tracing in real-world scenarios: they cannot accurately or efficiently extract a long message from a generated text. We aim to address this limitation.
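To make the idea of content source tracing concrete, the following is a minimal sketch of how a user ID could be represented as a fixed-width bit string and mapped back to a user after extraction. The function names, the integer user IDs, and the `registry` dictionary are assumptions made for illustration; the abstract does not specify how IDs are encoded.

```python
from typing import Dict, Optional


def user_id_to_bits(user_id: int, num_bits: int) -> str:
    """Encode a non-negative integer user ID as a fixed-width bit string."""
    if not 0 <= user_id < 2 ** num_bits:
        raise ValueError("user_id does not fit in num_bits")
    return format(user_id, f"0{num_bits}b")


def bits_to_user_id(bits: str) -> int:
    """Decode a bit string back into the integer user ID."""
    return int(bits, 2)


def trace_source(extracted_bits: str, registry: Dict[int, str]) -> Optional[str]:
    """Look up the user whose ID matches the extracted message.

    `registry` is a hypothetical mapping from user ID to user name.
    Returns None when no registered user matches.
    """
    return registry.get(bits_to_user_id(extracted_bits))
```

In this sketch, tracing succeeds only if every bit of the message is extracted correctly, which is why extraction accuracy on long messages matters for this use case.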
In this work, we introduce a new watermarking method for LLM-generated text based on pseudo-random segment assignment. We also propose multiple techniques to further enhance the robustness of our watermarking algorithm. Extensive experiments show that our method achieves a much better tradeoff between extraction accuracy and time complexity than existing baselines. For instance, when embedding a message of length 20 into a 200-token generated text, our method achieves a match rate of 97.6%, while the state-of-the-art method of Yoo et al. achieves only 49.2%. Additionally, we prove that, under the same setting, our watermark can tolerate edits within an edit distance of 17 on average for each paragraph.
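The abstract does not spell out how pseudo-random segment assignment works, so the following is only one plausible reading, not the authors' algorithm: the message is split into fixed-size segments, and each token position is mapped to one segment by a keyed hash of the preceding token. The names `split_message`, `assign_segment`, and `segment_assignments`, the use of SHA-256, and the choice to hash the previous token ID are all assumptions made for illustration.

```python
import hashlib
from typing import List


def split_message(bits: str, segment_bits: int) -> List[str]:
    """Split a bit string into equal-size segments."""
    if segment_bits <= 0 or len(bits) % segment_bits != 0:
        raise ValueError("message length must be a multiple of segment_bits")
    return [bits[i:i + segment_bits] for i in range(0, len(bits), segment_bits)]


def assign_segment(prev_token_id: int, num_segments: int, key: bytes) -> int:
    """Pseudo-randomly pick a segment index from the previous token ID.

    The same (key, prev_token_id) pair always yields the same segment,
    so an extractor holding the key can recompute the assignment.
    """
    digest = hashlib.sha256(key + prev_token_id.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def segment_assignments(token_ids: List[int], num_segments: int, key: bytes) -> List[int]:
    """Assign every token after the first to a segment, based on its predecessor."""
    return [assign_segment(prev, num_segments, key) for prev in token_ids[:-1]]
```

Under this reading, each generated token would carry evidence about only the segment it is assigned to, so an extractor could recover each segment separately rather than searching over every possible value of the full message.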