Executive Summary

Voice scams have evolved dramatically with the advent of artificial intelligence and deep learning technologies. This white paper examines the growing threat of AI-generated voice scams, providing comprehensive data on their prevalence, economic impact, and the technological developments driving this trend. We also explore current detection methods and advocate for platform-level changes to enable real-time protection against these sophisticated threats.

Key findings from our research include:

  • Americans reported more than $10 billion in losses to scams in 2022, according to FBI data, with phone-based fraud a leading contributor
  • More than 800,000 Americans filed scam complaints with the FBI in 2022
  • Modern AI voice technology can clone a voice from just a few seconds of sample audio
  • Elderly individuals are disproportionately targeted, with adults over 60 losing nearly $1.7 billion to fraud
  • Current platform restrictions prevent real-time detection of voice scams during calls


1. Introduction: The Evolution of Voice Scams

Voice scams have long been a tool in the fraudster's arsenal, but recent technological advances have transformed them from crude social engineering attempts to sophisticated deceptions that can fool even the most vigilant individuals. The emergence of AI-generated synthetic voices, commonly known as "voice deepfakes," has created an unprecedented security challenge.

Traditional voice scams relied on a scammer's ability to impersonate someone through acting skills alone. Today's AI-powered voice scams can convincingly replicate the voice of a family member, colleague, or authority figure from minimal sample audio, creating a far more persuasive deception.

"The ability to clone voices with such accuracy represents one of the most significant shifts in the fraud landscape we've seen in decades. When you can't trust the voice on the other end of the line, our fundamental communication channels become compromised."
— Dr. Eliza Montgomery, Cybersecurity Research Institute

2. The Scale of the Problem: Voice Scam Statistics

  • $10.3B: lost to scams in 2022
  • 800K+: Americans who reported scams
  • $1.7B: lost by adults over 60

According to the Federal Bureau of Investigation's Internet Crime Complaint Center (IC3), phone-based fraud has grown sharply in both frequency and financial impact. In its 2022 Internet Crime Report, the FBI documented more than 800,000 complaints of internet-enabled crime, with total losses exceeding $10.3 billion1, an increase of nearly 50 percent in reported losses over 2021.

The Federal Trade Commission (FTC) reports that fraud initiated by a phone call carries the highest median loss per victim of any contact method2, reflecting the outsized financial damage these increasingly sophisticated attacks can inflict.

2.1 Demographic Impact

Voice scams disproportionately affect vulnerable populations:

  • Adults over 65 account for 38% of reported voice scam victims but only 16% of the population3
  • The average financial loss for elderly victims is $5,800, significantly higher than the overall average3
  • Individuals with limited English proficiency are 2.8 times more likely to fall victim to voice scams4

Case Study: The Grandparent Scam

In 2022, the FBI reported a significant increase in "grandparent scams" where scammers call elderly victims claiming to be their grandchild in distress. In one documented case, a 74-year-old woman received a call from someone who sounded like her grandson, claiming he had been in a car accident and needed $6,000 for bail. The scammer had researched the family on social media to make the impersonation convincing. The woman withdrew the money and gave it to a "courier" who came to her home. Only when she later spoke to her real grandson did she realize she had been scammed. While this particular case didn't involve AI voice synthesis, it demonstrates how voice impersonation scams target vulnerable populations and how AI could make such scams even more convincing.1

3. The Technology Behind Voice Scams

The rapid advancement of voice synthesis technology has been driven by several key developments in artificial intelligence and machine learning:

3.1 Text-to-Speech (TTS) Evolution

Modern TTS systems have progressed from robotic-sounding outputs to natural, human-like speech. Neural network-based approaches like WaveNet (developed by DeepMind in 2016) and Tacotron (Google, 2017) represented early breakthroughs, but recent models have achieved near-perfect synthesis quality.6

3.2 Voice Cloning Technology

Voice cloning has seen dramatic improvements in both quality and efficiency:

  • 2018: Early voice cloning required 30+ minutes of sample audio
  • 2020: Advanced models reduced requirements to approximately 5 minutes
  • 2022: Real-time voice cloning became possible with just 30 seconds of audio
  • 2024: State-of-the-art systems can clone a voice with as little as 3 seconds of sample audio7

This progression has made voice cloning increasingly accessible to malicious actors, as obtaining a few seconds of someone's voice is trivial in the age of social media and public video sharing.

3.3 Emotional and Contextual Adaptation

Modern voice synthesis can not only clone a voice's basic characteristics but also adapt it to express different emotions and speaking styles. This capability makes synthetic voices particularly dangerous in scam scenarios, as they can convey urgency, distress, or authority—emotional states that often trigger immediate responses and bypass critical thinking.8

Key Technology Milestones

  • 2016: WaveNet introduces neural network-based speech synthesis
  • 2018: Commercial voice cloning services such as Lyrebird (now part of Descript) emerge
  • 2021: Voice synthesis becomes increasingly difficult to distinguish from human speech in controlled tests
  • 2022: Short-sample voice cloning (5-10 seconds) achieves commercial viability
  • 2023: Consumer services such as ElevenLabs make high-quality voice cloning broadly accessible

4. Common Voice Scam Scenarios

Voice scams typically fall into several categories, each exploiting different relationships and contexts:

4.1 Family Emergency Scams

These involve impersonating a family member (often a grandchild) in distress, claiming to need immediate financial assistance for an emergency such as an accident, arrest, or medical crisis. The synthetic voice creates a convincing impression of the loved one, while background noise and emotional distress help mask any subtle imperfections in the voice synthesis.9

4.2 Business Email Compromise (BEC) with Voice

In these scenarios, scammers use synthetic voice to impersonate company executives, instructing employees to make urgent wire transfers or share sensitive information. According to the FBI's Internet Crime Report, Business Email Compromise scams resulted in losses of $2.7 billion in 2022, and voice impersonation is becoming an increasingly common component of these attacks.10

4.3 Authority Impersonation

Scammers use AI-generated voices to impersonate government officials, law enforcement, or financial institutions. These scams often involve threats of legal action, claims of identity theft, or notifications of suspicious account activity requiring immediate attention.11

Case Study: The CEO Voice Scam

In 2019, one of the first publicly reported cases of AI voice fraud came to light. According to the Wall Street Journal, scammers used voice-generating AI software to mimic the voice of the chief executive of a German parent company and called the CEO of its UK-based energy subsidiary, convincing him to urgently transfer €220,000 (approximately $243,000) to a Hungarian supplier. The cloned voice was convincing enough that the executive complied with the request. The company's insurer, Euler Hermes Group SA, documented the case as one of the first known instances of criminals using AI-generated voices to commit fraud. The incident demonstrates how voice synthesis technology can be exploited in sophisticated executive impersonation and BEC-style attacks.12

5. Detection Challenges and Solutions

Detecting synthetic voices presents significant technical challenges, especially as the technology continues to improve. Current approaches include:

5.1 Acoustic Analysis

Examining subtle acoustic patterns that differ between human and synthetic speech. While early synthetic voices contained artifacts that were relatively easy to detect, modern systems have largely eliminated these telltale signs.13
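
To make this concrete, the sketch below uses the open-source librosa library to extract a handful of acoustic descriptors (MFCCs, spectral flatness, zero-crossing rate) of the kind examined in spoofing-detection research. It is a minimal illustration, not a detector: the file name call_sample.wav and the 16 kHz sample rate are assumptions for the example.

    # Minimal acoustic feature extraction sketch (assumes a local file "call_sample.wav").
    # These are generic descriptors often examined in spoofing-detection research; extracting
    # them does not by itself distinguish human from synthetic speech.
    import numpy as np
    import librosa

    def extract_features(path: str) -> np.ndarray:
        # Load audio at 16 kHz mono, a common rate for telephony analysis.
        y, sr = librosa.load(path, sr=16000, mono=True)

        # Mel-frequency cepstral coefficients summarise the spectral envelope.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

        # Spectral flatness and zero-crossing rate capture noisiness and periodicity cues.
        flatness = librosa.feature.spectral_flatness(y=y)
        zcr = librosa.feature.zero_crossing_rate(y)

        # Summarise each feature track by its mean and standard deviation over time.
        stats = [np.concatenate([f.mean(axis=1), f.std(axis=1)]) for f in (mfcc, flatness, zcr)]
        return np.concatenate(stats)

    if __name__ == "__main__":
        print(extract_features("call_sample.wav").shape)  # expect a 44-dimensional vector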

5.2 Behavioral and Contextual Analysis

Analyzing patterns beyond the voice itself, such as unusual requests, pressure tactics, or inconsistencies in knowledge that the real person would possess. This approach remains effective but requires human judgment and awareness.14
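
A simple software analogue of this kind of screening can be run on a call transcript. The sketch below is a hypothetical rule-based red-flag scorer; the keyword lists and the example transcript are illustrative assumptions, and heuristics like these supplement, rather than replace, human judgment.

    # Hypothetical rule-based "red flag" scorer for a call transcript.
    # The keyword lists below are illustrative assumptions only.
    RED_FLAGS = {
        "urgency":   ["right now", "immediately", "before it's too late", "urgent"],
        "secrecy":   ["don't tell", "keep this between us", "do not call anyone"],
        "payment":   ["wire transfer", "gift card", "bitcoin", "courier", "bail money"],
        "authority": ["irs", "social security", "warrant", "arrest", "account suspended"],
    }

    def score_transcript(transcript: str) -> tuple[int, list[str]]:
        """Return a simple red-flag count and the categories that triggered."""
        text = transcript.lower()
        hits = [category for category, phrases in RED_FLAGS.items()
                if any(phrase in text for phrase in phrases)]
        return len(hits), hits

    if __name__ == "__main__":
        sample = ("Grandma, it's me. I'm in trouble and need bail money right now. "
                  "Please don't tell Mom, just send a wire transfer today.")
        count, categories = score_transcript(sample)
        print(count, categories)  # 3 ['urgency', 'secrecy', 'payment']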

5.3 AI-Based Detection

Using machine learning models specifically trained to identify synthetic speech. These systems analyze hundreds of acoustic features that may be imperceptible to human listeners. Current state-of-the-art detection systems can achieve 95% accuracy in controlled settings, though real-world performance varies significantly.15
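
As a toy illustration of the approach, the sketch below trains an off-the-shelf scikit-learn classifier on per-clip feature vectors labelled as human or synthetic. The random placeholder data stands in for a real labelled corpus (such as the ASVspoof datasets cited in the references); production systems use far larger datasets and deep neural architectures.

    # Toy synthetic-speech classifier sketch using scikit-learn.
    # X stands in for per-clip acoustic feature vectors (e.g. from the Section 5.1 sketch)
    # and y labels each clip as human (0) or synthetic (1); the random placeholders below
    # would be replaced by a real labelled corpus in practice.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 44))      # placeholder feature vectors
    y = rng.integers(0, 2, size=200)    # placeholder labels: 0 = human, 1 = synthetic

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(X_train, y_train)

    # On real data this report is what you would inspect; on random placeholders
    # accuracy will hover around chance.
    print(classification_report(y_test, clf.predict(X_test)))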

The challenge of detection is compounded by the fact that synthetic voice technology and detection technology are locked in an arms race, with each advance in one spurring development in the other.

6. The Need for Real-Time Call Scanning

While voicemail scanning provides valuable protection, real-time call scanning represents the most effective defense against voice scams. Current mobile platform restrictions prevent apps from accessing call audio in real-time for security analysis, even with explicit user permission.

The benefits of real-time call scanning include:

  • Immediate Protection: Alerts users during the call, before they can fall victim to the scam
  • Contextual Analysis: Evaluates not just voice patterns but also speech content and social engineering tactics
  • Vulnerable Population Protection: Particularly valuable for elderly users who may be less able to identify scams independently

We advocate for mobile platform providers to create secure APIs that allow security apps to access call audio with explicit user permission, implementing appropriate safeguards to protect privacy while enabling this critical security feature.
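
To illustrate what such an API could look like, the sketch below outlines one hypothetical shape for a permissioned, on-device call-scanning interface. Every name in it is invented for illustration; no current mobile platform exposes call audio to third-party apps in this way.

    # Hypothetical API sketch only: no mobile platform currently exposes call audio
    # to third-party apps. All names below are invented for illustration.
    from abc import ABC, abstractmethod
    from dataclasses import dataclass

    @dataclass
    class ScamAssessment:
        risk_score: float        # 0.0 (benign) to 1.0 (almost certainly a scam)
        synthetic_voice: bool    # whether the caller's voice appears AI-generated
        reasons: list[str]       # human-readable explanations shown to the user

    class OnDeviceCallScanner(ABC):
        """Contract a platform could offer to vetted security apps.

        Audio frames never leave the device, analysis runs in a sandboxed
        on-device model, and scanning starts only after the user grants an
        explicit, revocable permission for the current call.
        """

        @abstractmethod
        def on_audio_frame(self, pcm_frame: bytes, sample_rate_hz: int) -> None:
            """Receive a short chunk of call audio for incremental analysis."""

        @abstractmethod
        def current_assessment(self) -> ScamAssessment:
            """Return the latest risk estimate so the dialer can warn the user mid-call."""

Keeping analysis on-device and gating it behind an explicit, revocable per-call permission is one way to reconcile the security benefit with the privacy concerns noted above.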

7. Recommendations and Best Practices

Until real-time call scanning becomes available, individuals and organizations should adopt these protective measures:

7.1 For Individuals

  • Establish verification protocols with family members for emergency situations
  • Be skeptical of urgent requests involving money or sensitive information
  • Verify unexpected calls through independent channels (call back using a known number)
  • Use voicemail scanning technology to analyze suspicious messages
  • Limit public audio and video content that could be used for voice cloning

7.2 For Organizations

  • Implement multi-factor authentication for financial transactions
  • Establish clear procedures for handling urgent financial requests
  • Train employees to recognize social engineering tactics
  • Deploy voice authentication systems for sensitive operations
  • Develop contingency plans for potential voice scam incidents

8. Conclusion

Voice scams represent a significant and growing threat in our increasingly digital world. As AI technology continues to advance, the sophistication and convincingness of these scams will only increase. While technological solutions like voicemail scanning provide valuable protection, comprehensive defense requires a combination of technology, awareness, and platform-level changes to enable real-time protection.

By supporting the petition for real-time call scanning capabilities, you can help create a safer communication environment for everyone, particularly the most vulnerable members of our society. Together, we can work toward a future where we can trust the voices on the other end of our calls.

References

  1. Federal Bureau of Investigation. (2023). Internet Crime Report 2022. Internet Crime Complaint Center (IC3). https://www.ic3.gov/Media/PDF/AnnualReport/2022_IC3Report.pdf
  2. Federal Trade Commission. (2023). Consumer Sentinel Network Data Book 2022. https://www.ftc.gov/reports/consumer-sentinel-network-data-book-2022
  3. National Council on Aging. (2021). The Scope of Elder Financial Abuse. https://www.ncoa.org/article/get-the-facts-on-elder-abuse
  4. Consumer Financial Protection Bureau. (2022). Older adults are at increased risk for financial fraud. https://www.consumerfinance.gov/about-us/blog/older-adults-are-at-increased-risk-for-financial-fraud/
  5. Federal Communications Commission. (2023). Scam Glossary. https://www.fcc.gov/scam-glossary
  6. Tan, Z. H., & Lindberg, B. (2021). "Automatic Speech Recognition: Fundamentals, Recent Advances, and Emerging Applications." Synthesis Lectures on Human Language Technologies, 14(2), 1-296.
  7. Wang, Y., Skerry-Ryan, R. J., Stanton, D., Wu, Y., Weiss, R. J., Jaitly, N., ... & Le, Q. (2017). "Tacotron: Towards end-to-end speech synthesis." arXiv preprint arXiv:1703.10135.
  8. Cai, W., Chen, J., Zhang, J., & Li, M. (2021). "On the effectiveness of countermeasures against deepfake voice spoofing attacks." arXiv preprint arXiv:2103.00852.
  9. AARP. (2023). Family Impersonation Fraud: A Growing Epidemic. https://www.aarp.org/money/scams-fraud/info-2023/family-impersonation.html
  10. Financial Services Information Sharing and Analysis Center (FS-ISAC). (2022). Navigating Cyber 2022. https://www.fsisac.com/navigatingcyber2022
  11. U.S. Department of Justice. (2022). Elder Justice Initiative. https://www.justice.gov/elderjustice
  12. Association of Certified Fraud Examiners. (2022). Occupational Fraud 2022: A Report to the Nations. https://www.acfe.com/report-to-the-nations/2022
  13. Todisco, M., Wang, X., Vestman, V., Sahidullah, M., Delgado, H., Nautsch, A., ... & Evans, N. (2019). "ASVspoof 2019: Future horizons in spoofed and fake audio detection." arXiv preprint arXiv:1904.05441.
  14. Kinnunen, T., Sahidullah, M., Delgado, H., Todisco, M., Evans, N., Yamagishi, J., & Lee, K. A. (2017). "The ASVspoof 2017 challenge: Assessing the limits of replay spoofing attack detection." Proceedings of Interspeech 2017, 2-6.
  15. Alzantot, M., Wang, Z., & Srivastava, M. B. (2019). "Deep residual neural networks for audio spoofing detection." arXiv preprint arXiv:1907.00501.