AI Headphones Isolate Speaker in Crowds

By Mark Howell · 29 May 2024 · 4 min read

Today in Edworking News, we want to talk about engineering.

AI Headphones Let Wearer Listen to a Single Person in a Crowd, by Looking at Them Just Once

UW News
Noise-canceling headphones have become very good at creating an auditory blank slate. Letting specific sounds from a wearer's surroundings through that silence, however, remains a challenge for researchers. For instance, the latest edition of Apple's AirPods Pro automatically adjusts sound levels when it senses that the wearer is in a conversation, but users have little control over whom they hear or when this happens.
A team from the University of Washington has developed an artificial intelligence system that allows a user wearing headphones to look at a person speaking for just three to five seconds to "enroll" them. This system, known as "Target Speech Hearing" (TSH), cancels all other sounds in the environment and plays only the enrolled speaker's voice in real-time. This occurs even as the listener moves around in noisy places and no longer faces the speaker.

University of Washington's AI-powered headphones allow users to focus on one speaker even in noisy environments.
The research team presented its findings on May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. The proof-of-concept device is not commercially available, but its code is available for other developers to build on.

How the System Works

To use the system, the wearer needs off-the-shelf headphones fitted with microphones. The user enrolls a speaker by tapping a button while turning their head toward the person talking. The sound waves from that speaker's voice should then reach the microphones on both sides of the headset simultaneously, within a 16-degree margin of error. These signals are sent to an on-board embedded computer, where the team's machine learning software learns the desired speaker's vocal patterns.
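The article does not explain how the headset confirms that the speaker is straight ahead. As a rough illustration only, this sketch swaps in a classical technique for that check, time-difference-of-arrival estimation by cross-correlation: a voice directly in front reaches both microphones at the same moment, and any delay between the channels maps to an arrival angle that can be compared with the 16-degree margin. The microphone spacing (0.18 m), the 16 kHz sample rate and all function names are assumptions, not details of the UW system, which relies on its own machine learning model.

```python
import math

# Assumed hardware parameters; the article gives none of these.
SPEED_OF_SOUND = 343.0  # metres per second in air at about 20 degrees C
MIC_SPACING = 0.18      # metres between left and right microphones (assumed)
SAMPLE_RATE = 16_000    # samples per second (assumed)


def best_lag(left, right, max_lag):
    """Return the lag, in samples, that maximizes the cross-correlation.

    A positive lag means the right channel is delayed relative to the left.
    """
    n = len(left)
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Sum left[i] * right[i + lag] over the overlapping samples.
        score = sum(left[i] * right[i + lag]
                    for i in range(n) if 0 <= i + lag < len(right))
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(left, right):
    """Estimate the arrival angle in degrees (0 = straight ahead).

    Positive angles mean the sound reached the left microphone first.
    """
    # The largest physically possible delay is spacing / speed of sound.
    max_lag = math.ceil(MIC_SPACING / SPEED_OF_SOUND * SAMPLE_RATE)
    delay = best_lag(left, right, max_lag) / SAMPLE_RATE
    # Far-field model: delay = spacing * sin(angle) / speed of sound.
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / MIC_SPACING))
    return math.degrees(math.asin(ratio))


def is_facing_speaker(left, right, tolerance_deg=16.0):
    """True if the dominant source lies within the 16-degree margin."""
    return abs(arrival_angle_deg(left, right)) <= tolerance_deg
```

With these assumed values, one sample of delay corresponds to roughly 7 degrees, so the 16-degree window admits delays of at most two samples. A real device would typically interpolate around the correlation peak to get finer angular resolution.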
The ability of the system to focus on the enrolled voice improves as the speaker continues to talk, providing the system with more training data.
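The article says only that continued speech gives the system more training data; it does not say how the model uses that data. One hypothetical way to picture the effect is a running average of per-chunk voice embeddings, which becomes more stable as chunks accumulate. The `SpeakerProfile` class and the embedding vectors here are illustrative assumptions; producing real embeddings would require a speaker-embedding model that is not shown.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SpeakerProfile:
    """Hypothetical enrolled-speaker profile, refined as more speech arrives."""
    mean: list[float] = field(default_factory=list)
    count: int = 0

    def update(self, embedding: list[float]) -> None:
        # Incremental (running) mean: past embeddings need not be stored.
        if self.count == 0:
            self.mean = list(embedding)
        else:
            k = self.count + 1
            self.mean = [m + (e - m) / k for m, e in zip(self.mean, embedding)]
        self.count += 1

    def similarity(self, embedding: list[float]) -> float:
        # Cosine similarity between a new chunk and the enrolled profile;
        # returns 0.0 when either vector is empty or all zeros.
        dot = sum(m * e for m, e in zip(self.mean, embedding))
        norm = math.hypot(*self.mean) * math.hypot(*embedding)
        return dot / norm if norm else 0.0
```

Because the mean is updated incrementally, each update costs the same amount of memory and work no matter how long the speaker has been talking.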

Testing and Performance

The team tested the system on 21 subjects, who on average rated the clarity of the enrolled speaker's voice nearly twice as high as that of the unfiltered audio. TSH builds on the team's earlier "semantic hearing" research, which let users select classes of sounds, such as birds or voices, to hear while canceling other sounds in the environment.
Currently, the system can enroll only one speaker at a time, and only when no other loud voice is coming from the same direction as that speaker. Users who are not satisfied with the sound quality can re-enroll the speaker to improve clarity.
The team is now working to expand the system to earbuds and hearing aids.

Research Team and Funding

The senior author of the project, Shyam Gollakota, is a professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Co-authors include doctoral students Bandhav Veluri, Malek Itani, and Tuochao Chen, and Takuya Yoshioka, director of research at AssemblyAI. This study was funded by the Moore Inventor Fellow award, the Thomas J. Cable Endowed Professorship, and a UW CoMotion Innovation Gap Fund.
Edworking is the best and smartest decision for SMEs and startups that want to be more productive. Edworking is a free, AI-powered productivity superapp that brings everything you need for work into one place, connecting Task Management, Docs, Chat, Videocall, and File Management. Save money today by not paying for Slack, Trello, Dropbox, Zoom, and Notion.

Remember these 3 key ideas for your startup:

  1. Enhanced Focus in Crowded Environments: The TSH system lets users focus on a single speaker even in noisy places, improving communication in busy settings such as conferences, open offices, and crowded networking events.

  2. Market Potential for Hearable Tech: This AI system points to an opportunity for startups to develop and market advanced hearable technology, such as target speech hearing, for broader consumer use, including integration into earbuds and hearing aids.

  3. AI in Everyday Gadgets: This research shows the potential of building AI into everyday devices, opening the way for startups to create applications that adapt the user experience to personal preferences and to reach new markets.

For more information on the latest trends and breakthroughs in technology and innovation, stay tuned to Edworking News.
Stay productive and innovative with Edworking!
For more details, see the original source.

About the Author: Mark Howell

Mark Howell is a talented content writer for Edworking's blog, consistently producing high-quality articles on a daily basis. As a Sales Representative, he brings a unique perspective to his writing, providing valuable insights and actionable advice for readers in the education industry. With a keen eye for detail and a passion for sharing knowledge, Mark is an indispensable member of the Edworking team. His expertise in task management ensures that he is always on top of his assignments and meets strict deadlines. Furthermore, Mark's skills in project management enable him to collaborate effectively with colleagues, contributing to the team's overall success and growth. As a reliable and diligent professional, Mark Howell continues to elevate Edworking's blog and brand with his well-researched and engaging content.
