YAMNet: Neural Network for Sound Event Classification

Engineering and design

YAMNet, a pretrained acoustic detection model, has been widely used for sound event classification. However, its effectiveness in detecting specific musical instrument sounds has been limited.
The client provided samples of new musical instrument sounds to train YAMNet, but these samples were shorter than the input length the network accepts.
This project aims to improve YAMNet’s performance by creating new, longer samples for each sound and by modifying YAMNet’s category tree. The results show that this approach identifies and classifies the desired sounds, which supports more accurate sound event recognition in other applications.

YAMNet is an acoustic detection model that uses the MobileNet_v1 architecture and is trained on the AudioSet dataset, which contains more than 2 million labeled clips taken from YouTube videos. Despite its effectiveness on general sound events, YAMNet has difficulty detecting specific sounds such as the instrument samples described above. This work proposes a way to improve its performance on those sounds.

Problem: The work has to cover the following tasks:

Reading in an audio signal.

Calling classifySound to return detected sounds.

Identifying only the specific sounds contained in the provided sample folders.
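
The third task, reporting only the sounds that come from the sample folders, can be sketched as a filter over the classifier output. This is a minimal Python sketch under stated assumptions, not the project's actual code: it assumes each subfolder of the sample directory is named after one target sound, and that the classifier's detections are available as (label, score) pairs. The names allowed_labels, filter_detections and min_score are hypothetical and used only for illustration.

```python
from pathlib import Path


def allowed_labels(samples_dir):
    """Return the set of class names, assuming one subfolder per class."""
    return {p.name.lower() for p in Path(samples_dir).iterdir() if p.is_dir()}


def filter_detections(detections, allowed, min_score=0.5):
    """Keep only detections whose label is one of the allowed classes.

    detections: iterable of (label, score) pairs (assumed output format).
    min_score: hypothetical confidence threshold, not taken from the project.
    """
    return [
        (label, score)
        for label, score in detections
        if label.lower() in allowed and score >= min_score
    ]
```

With this filter, a detection such as ("Speech", 0.8) is dropped when no folder named "speech" exists, even if the classifier is confident about it.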

Solution: The solution involves creating new samples with longer durations (up to 1 second) for each sound and then using them to train YAMNet. With these longer samples, YAMNet can detect and classify the sounds from the specified folders.
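
One simple way to lengthen a short sample is to append silence until it reaches the required duration. The sketch below illustrates that idea in plain Python; it is an assumption about how the lengthening could be done, not a record of the method used in this project. The function name pad_with_silence is hypothetical, the 1-second default is taken from the figure quoted above, and 16000 Hz is YAMNet's expected input sample rate.

```python
def pad_with_silence(samples, sample_rate, min_seconds=1.0):
    """Append trailing zeros so the clip lasts at least min_seconds.

    samples: sequence of mono audio samples as floats.
    sample_rate: samples per second (YAMNet expects 16000).
    """
    target = int(round(min_seconds * sample_rate))
    missing = max(0, target - len(samples))
    # Clips that are already long enough are returned unchanged (as a copy).
    return list(samples) + [0.0] * missing


# Usage: a 0.5 s clip at 16 kHz is extended to 1 s of audio.
short_clip = [0.1] * 8000
padded = pad_with_silence(short_clip, 16000)
```

Padding at the end leaves the onset of the sound untouched.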

Results: The proposed method identifies and classifies the specific sounds. Some samples had to be re-recorded with additional silent time after the sound to achieve optimal results.

This work demonstrates that creating new samples with longer durations and modifying the sound categories in YAMNet can significantly improve the model’s performance in detecting the specific sounds from the provided folders. This enhanced version of YAMNet may find applications in domains such as music production, sound event recognition, and multimedia, where accurate sound detection and classification are essential.

Author: Carlos Saldana