
Curiosity13/Urban-Sound-Classification

 
 


Urban-Sound-Classification

Dataset

The UrbanSound8K dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy. All excerpts are taken from field recordings uploaded to www.freesound.org.
The 8732 audio files are in WAV format. The sampling rate, bit depth, and number of channels are the same as those of the original file uploaded to Freesound (and hence may vary from file to file).
The UrbanSound8K dataset used for model training can be downloaded from the following link: https://urbansounddataset.weebly.com/
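As a minimal sketch of how the download is typically organized: the archive ships a metadata CSV whose columns include slice_file_name, fold, classID, and class, with the audio stored under audio/fold1 through audio/fold10. The helper names `read_metadata` and `clip_path`, the root directory `UrbanSound8K`, and the two sample rows are illustrative assumptions, not taken from this repository.

```python
import csv
import io
from pathlib import PurePosixPath

# The ten class labels, indexed by classID (0-9).
CLASS_NAMES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]


def read_metadata(csv_file):
    """Read metadata rows from an open CSV file object into a list of dicts."""
    return list(csv.DictReader(csv_file))


def clip_path(row, root="UrbanSound8K"):
    """Build the relative path of a clip from one metadata row (hypothetical helper)."""
    return PurePosixPath(root) / "audio" / f"fold{row['fold']}" / row["slice_file_name"]


# Two made-up rows in the metadata layout, used only to exercise the helpers.
sample_csv = io.StringIO(
    "slice_file_name,fsID,start,end,salience,fold,classID,class\n"
    "example-3-0-0.wav,1,0.0,4.0,1,5,3,dog_bark\n"
    "example-8-0-0.wav,2,0.0,4.0,1,10,8,siren\n"
)
rows = read_metadata(sample_csv)
paths = [str(clip_path(r)) for r in rows]
```

With the real dataset, `read_metadata` would be given the opened metadata CSV instead of the in-memory sample.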

Librosa

Librosa was used for data preprocessing and feature extraction. Features used:

MEL Features

MFCC

In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").

Figure: MFCC of a dog bark

Melspectrogram

A mel-scaled spectrogram: a short-time power spectrogram whose frequency axis is mapped onto the mel scale.

Figure: Melspectrogram of a dog bark

Chroma Features

In music, the term chroma feature or chromagram closely relates to the twelve different pitch classes. Chroma-based features, which are also referred to as "pitch class profiles", are a powerful tool for analyzing music whose pitches can be meaningfully categorized (often into twelve categories) and whose tuning approximates to the equal-tempered scale. One main property of chroma features is that they capture harmonic and melodic characteristics of music, while being robust to changes in timbre and instrumentation.
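To make the idea of twelve pitch classes concrete, the sketch below folds a few frequencies into a 12-bin chroma vector, ignoring the octave. It assumes equal temperament with A4 = 440 Hz; the function names and the list of partials are hypothetical, and this is a simplified illustration rather than how Librosa computes chromagrams.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_index(freq_hz, a4_hz=440.0):
    """Map a frequency to the index (0-11) of its equal-tempered pitch class."""
    # MIDI note 69 is A4; each semitone is a frequency ratio of 2**(1/12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12


def chroma_vector(partials):
    """Sum the energy of (frequency, energy) pairs per pitch class."""
    chroma = [0.0] * 12
    for freq_hz, energy in partials:
        chroma[pitch_class_index(freq_hz)] += energy
    return chroma


# 440 Hz and 880 Hz are an octave apart, so both land in the "A" bin;
# 261.63 Hz (middle C) lands in the "C" bin.
chroma = chroma_vector([(440.0, 1.0), (880.0, 0.5), (261.63, 2.0)])
```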

Chroma_stft

A chromagram computed from a waveform or power spectrogram using the short-time Fourier transform (STFT).

Figure: Chromagram of a dog bark

Chroma_cqt

A chromagram computed from the constant-Q transform (CQT).

Figure: Constant-Q chromagram of a dog bark

Chroma_cens

The chroma variant “Chroma Energy Normalized” (CENS).

Figure: Chroma CENS of a dog bark

Neural Network Implementation

A simple neural network with dropout layers was used.
The following results were obtained by training on folds 1-9 (folders fold1 through fold9) and testing on fold 10.
Train accuracy: 93.14%
Test accuracy: 66.06%
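Dropout can be sketched in a few lines. The README does not state the framework, the layer sizes, or the dropout rate, so the rate of 0.5 and the function below are assumptions meant only to show the technique (the "inverted" form, which rescales the surviving units during training).

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        # At inference time the activations pass through unchanged.
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 1000
dropped = dropout(hidden, rate=0.5, rng=rng)
kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
```

Randomly silencing units during training discourages the network from relying on any single feature, which is the usual motivation for dropout when train accuracy runs well ahead of test accuracy.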

Convolutional Neural Network Implementation

A convolutional neural network with dropout layers was used.
The following results were obtained by training on folds 1-9 (folders fold1 through fold9) and testing on fold 10.
Train accuracy: 95.90%
Test accuracy: 73.11%
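The evaluation protocol used for both models, training on folds 1-9 and testing on fold 10, can be sketched as below. It assumes each clip is described by a metadata row with a `fold` field, as in the dataset's CSV; the helper names and the made-up rows are hypothetical.

```python
def split_by_fold(rows, test_fold=10):
    """Split metadata rows into train and test lists by their fold number."""
    train = [r for r in rows if int(r["fold"]) != test_fold]
    test = [r for r in rows if int(r["fold"]) == test_fold]
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up rows, one clip per fold, just to show the split.
rows = [{"slice_file_name": f"clip{k}.wav", "fold": str(k)} for k in range(1, 11)]
train_rows, test_rows = split_by_fold(rows)
```

Splitting on the predefined fold numbers, rather than shuffling all clips together, keeps excerpts cut from the same original recording on the same side of the split.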

About

Sound Classification using Neural Networks
