

Poster

Proactive Detection of Voice Cloning with Localized Watermarking

Robin San Roman · Pierre Fernandez · Hady Elsahar · Alexandre Defossez · Teddy Furon · Tuan Tran


Abstract:

In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present Audioseal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. Audioseal employs a generator/detector architecture trained jointly with a localization loss, which enables localized watermark detection down to the sample level, and a novel perceptual loss inspired by auditory masking, which improves imperceptibility. Audioseal achieves state-of-the-art performance in terms of robustness to real-life audio manipulations and imperceptibility, based on automatic and human evaluation metrics. Additionally, Audioseal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster, which makes it well suited to large-scale and real-time applications.
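
To illustrate what sample-level localization could look like downstream of such a detector, here is a minimal sketch. It assumes the detector returns one watermark probability per audio sample. The function name `localize_watermark`, the 0.5 threshold, and the example scores are hypothetical choices made for illustration; they are not taken from the paper or from any released API.

```python
from typing import List, Tuple


def localize_watermark(
    probs: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Group per-sample probabilities into half-open [start, end) index
    ranges where the probability exceeds `threshold`.

    `probs` stands in for a hypothetical per-sample detector output.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current flagged run began, if any
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i  # a flagged run begins
        elif p <= threshold and start is not None:
            segments.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        # The signal ended while still inside a flagged run.
        segments.append((start, len(probs)))
    return segments


# Made-up scores for a 9-sample signal (illustrative only).
scores = [0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.1, 0.7, 0.9]
print(localize_watermark(scores))  # [(2, 5), (7, 9)]
```

Dividing the returned sample indices by the sampling rate would give the flagged regions in seconds.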
