Acoustic Scene Classification Using Spatial Pyramid Pooling with Convolutional Neural Networks

Ahmet Melih Basbug, Mustafa Sert

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Automatic understanding of audio events and acoustic scenes is an active research topic in the signal processing and machine learning communities. Recognizing acoustic scenes in real-life scenarios is challenging because of the diversity of environmental sounds and the uncontrolled environments in which they occur. Efficient methods and feature representations are needed to cope with these challenges. In this study, we address acoustic scene classification of raw audio signals and propose a cascaded CNN architecture that uses the spatial pyramid pooling (SPP, also referred to as spatial pyramid matching) method to aggregate local features from the convolutional layers of the CNN. We use three well-known audio features, namely MFCC, Mel energy, and spectrogram, to represent audio content, and we evaluate the effectiveness of the proposed CNN-SPP architecture on the DCASE 2018 acoustic scene performance dataset. Our results show that the proposed CNN-SPP architecture with the spectrogram feature improves classification accuracy.
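The abstract describes SPP as a way to aggregate local features from the convolutional layers. As a rough illustration of that idea only, and not the authors' implementation, the sketch below pools a feature map over a pyramid of grids and concatenates the results into a fixed-length vector. The function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), and the choice of max pooling are assumptions made for this example; the paper's actual configuration is not given in the abstract.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (channels, height, width) array over a pyramid of grids.

    For each level n the map is divided into an n x n grid and every cell is
    max-pooled per channel. The output length is
    channels * sum(n * n for n in levels), whatever the height and width are.
    The levels and the use of max pooling are illustrative assumptions.
    """
    channels, height, width = feature_map.shape
    if height < 1 or width < 1:
        raise ValueError("feature map must have non-zero height and width")

    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceiling for the end, so cells are never
            # empty and together they cover the whole map (they may overlap).
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                cell = feature_map[:, r0:r1, c0:c1]
                pooled.append(cell.max(axis=(1, 2)))
    return np.concatenate(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical feature maps with different time-frequency sizes.
    short_clip = rng.normal(size=(8, 5, 7))
    long_clip = rng.normal(size=(8, 9, 13))
    print(spatial_pyramid_pool(short_clip).shape)
    print(spatial_pyramid_pool(long_clip).shape)
```

Both calls produce vectors of the same length (8 channels times 1 + 4 + 16 cells), which is the property that lets a fixed-size classifier layer follow convolutional layers applied to inputs of varying size.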

Original language: English
Title of host publication: Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 128-131
Number of pages: 4
ISBN (Electronic): 9781538667835
DOIs: https://doi.org/10.1109/ICOSC.2019.8665547
Publication status: Published - Mar 11 2019
Event: 13th IEEE International Conference on Semantic Computing, ICSC 2019 - Newport Beach, United States
Duration: Jan 30 2019 to Feb 1 2019

Publication series

Name: Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019

Conference

Conference: 13th IEEE International Conference on Semantic Computing, ICSC 2019
Country: United States
City: Newport Beach
Period: 1/30/19 to 2/1/19

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software

Cite this

Basbug, A. M., & Sert, M. (2019). Acoustic Scene Classification Using Spatial Pyramid Pooling with Convolutional Neural Networks. In Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019 (pp. 128-131). [8665547] (Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICOSC.2019.8665547