- Published: Audio Deepfake Detection by using Machine and Deep Learning, WINCOM 2023
- Author: Faruk Kaledibi
Unveiling the World of Audio Deepfake Detection
Recently, at the WINCOM 2023 conference hosted by Istanbul Technical University, I had the privilege of presenting a paper alongside Faruk Kaledibi. Our focus was the critical issue of fake voices, particularly in the realms of cybersecurity, forensics, and social media.
What's the Buzz About?
In a nutshell, our research dives into the challenges posed by synthetic audio, especially with advanced technologies like Google's AudioLM in the picture. The ability to craft eerily convincing fake voices opens the door to fraud, identity theft, and information pollution.
Our Approach
Cutting through the noise, we proposed a solution that harnesses the power of machine and deep learning. Our secret sauce? Mel-frequency cepstral coefficients (MFCCs) for feature extraction. These coefficients serve as unique fingerprints that help us tell the difference between real and fake voices.
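For the curious, here is a minimal from-scratch sketch of how MFCCs are computed: frame the signal, window it, take the power spectrum, apply a triangular mel filterbank, then a DCT over the log energies. The parameter values below (frame length, hop, filter count) are common illustrative defaults, not necessarily the ones used in our paper.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, n_mfcc=13, frame_len=400, hop=160, n_fft=512, n_mels=26):
    """Sketch of MFCC extraction; parameters are illustrative defaults."""
    # Split the signal into overlapping, Hamming-windowed frames
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum of each frame (rfft zero-pads each frame to n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 Hz to the Nyquist frequency
    mel_max = 2595 * np.log10(1 + (sr / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, mel_max, n_mels + 2) / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    # Log mel energies, then a DCT to decorrelate them into cepstral coefficients
    log_mel = np.log(power @ fbank.T + 1e-10)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]

# One second of a synthetic 440 Hz tone stands in for a real recording
t = np.arange(16000) / 16000
coeffs = mfcc(np.sin(2 * np.pi * 440 * t))
print(coeffs.shape)  # one 13-coefficient "fingerprint" per frame
```

In practice a library front end (e.g. an off-the-shelf audio toolkit) would replace this, but the sketch shows why MFCCs act as compact fingerprints: each frame is reduced to a handful of coefficients describing its spectral envelope.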
Why It Matters
Imagine a bank manager falling prey to a cloned voice, authorizing a $35 million transfer, all because the voice seemed genuine. Our study underscores the urgency for foolproof detection methods to tackle the misuse of AI-generated voices.
The Nuts and Bolts
We kept things simple. Our language-agnostic model relies on the MFCC feature extraction approach. Three main modules – Audio Object Transformation, MFCC Attribute Extraction, and Audio Deepfake Analysis and Detection – form the backbone of our solution.
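To make that layout concrete, here is a hypothetical sketch of the three modules chained together. The pooled log-energy feature and the toy threshold scorer are stand-ins for the real MFCC front end and trained classifier, not our actual implementation.

```python
import numpy as np

def audio_object_transformation(raw):
    """Module 1: cast raw samples to float and peak-normalise the waveform."""
    x = np.asarray(raw, dtype=np.float64)
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def mfcc_attribute_extraction(wave, frame_len=400, hop=160):
    """Module 2: per-frame features pooled into one clip-level vector.
    (Log frame energy stands in here for a full MFCC front end.)"""
    n = 1 + max(0, (len(wave) - frame_len) // hop)
    frames = np.stack([wave[i * hop:i * hop + frame_len] for i in range(n)])
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    return np.array([energy.mean(), energy.std()])

def audio_deepfake_analysis(features, model):
    """Module 3: any trained binary classifier; here, a callable scoring fn."""
    return int(model(features) > 0.5)

# Wire the modules together on a synthetic clip
clip = np.random.default_rng(0).normal(size=16000)
feats = mfcc_attribute_extraction(audio_object_transformation(clip))
label = audio_deepfake_analysis(feats, model=lambda f: 1 / (1 + np.exp(-f[0])))
print(label)  # 0 = genuine, 1 = fake under this toy scorer
```

The point of the structure is separation of concerns: any feature extractor or classifier can be swapped into modules 2 and 3 without touching the rest of the pipeline.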
Standing Out in the Crowd
In a field where most studies focus on English, our work stands out. We bring a fresh perspective, emphasizing the importance of language-agnostic solutions for a global impact.
What We Found
Testing our model on a dataset of 28 speech samples, we saw promising results. Accuracy ranged from 75% to 88%, with the Support Vector Classifier consistently delivering top-notch performance.
Wrapping Up and Looking Ahead
In conclusion, our paper sheds light on the need for robust audio deepfake detection systems. As we move forward, we plan to fine-tune our algorithms and explore applications in cybersecurity. Future studies will involve stress-testing our model under various conditions.
A shoutout to the EUREKA cluster ITEA project VESTA and TUBITAK for their support in making this research possible.
In a world where synthetic speech technologies bring both convenience and risks, our work contributes to building a safer digital landscape. Stay tuned for more updates on our journey to outsmart the deepfake game!