Brief Review: Deep Neural Networks for the Recognition and Classification of Heart Murmurs Using Neuromorphic Auditory Sensors

Modified LeNet-5 & Modified AlexNet are Proposed

Sik-Ho Tsang
4 min read · Nov 22, 2023
Block diagram of the complete system implemented on an FPGA using a PDM microphone

Deep Neural Networks for the Recognition and Classification of Heart Murmurs Using Neuromorphic Auditory Sensors
Modified LeNet & Modified AlexNet, by University of Seville
2017 TBCAS, Over 140 Citations (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2013 … 2021
[CardioXNet] 2022 [CirCor] [CNN-LSTM] [DsaNet] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]
==== My Other Paper Readings Are Also Over Here ====

  • A novel convolutional neural network (CNN) based tool is proposed for classifying healthy people versus pathological patients, using a neuromorphic auditory sensor implemented on an FPGA.
  • The samples are segmented and preprocessed with the neuromorphic auditory sensor, which decomposes the audio information into frequency bands; sonogram images of equal size are then generated for training.

Outline

  1. System Overview
  2. Modified LeNet and Modified AlexNet
  3. Results

1. System Overview

Block diagram of the system architecture.

The heart sound recordings used in this work are obtained from the PhysioNet/CinC Challenge database. The audio recordings were resampled to 2000 Hz and divided into three mutually exclusive sets: 75% of them for training the network, 15% for validation, and 10% for testing. A rough sketch of this split is shown below.
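As a rough illustration of this preprocessing step, the resampling to 2000 Hz and the 75/15/10 split could look like the following sketch. The use of SciPy, the WAV loader, and the random seed are assumptions for illustration, not details from the paper.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

TARGET_FS = 2000  # Hz, as stated in the paper

def load_and_resample(path):
    """Load a heart sound recording and resample it to 2000 Hz."""
    fs, audio = wavfile.read(path)               # original rate depends on the recording
    audio = audio.astype(np.float32)
    return resample_poly(audio, TARGET_FS, fs)   # polyphase resampling to the target rate

def split_recordings(paths, seed=0):
    """Split whole recordings into mutually exclusive train/val/test sets (75/15/10)."""
    rng = np.random.default_rng(seed)
    paths = list(paths)
    rng.shuffle(paths)                           # split recordings, not segments, so sets stay disjoint
    n = len(paths)
    n_train = int(0.75 * n)
    n_val = int(0.15 * n)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]               # remaining ~10%
    return train, val, test
```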

  • Segmentation windows of 1, 1.25 and 1.5 seconds (without overlapping) are used; a minimal segmentation sketch is shown after this list.
  • Audio samples are sent to the audio input of an AER-Node platform [28].
Mono-aural Neuromorphic Auditory Sensor for FPGA with an I2S audio ADC and AER interface.
  • A 64-channel mono NAS (Neuromorphic Auditory Sensor) [22] is implemented on the AER-Node board's Spartan-6 FPGA; it decomposes the audio signal into frequency bands and packetizes the information using the AER (Address-Event Representation) protocol [23].
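The non-overlapping segmentation from the first bullet can be sketched as follows; how a trailing fragment shorter than one window is handled is not stated in this post, so discarding it is an assumption.

```python
import numpy as np

def segment(audio, fs=2000, window_s=1.0):
    """Cut a recording into consecutive, non-overlapping windows of window_s seconds."""
    win = int(window_s * fs)                     # samples per segment (2000, 2500 or 3000)
    n_full = len(audio) // win                   # number of complete windows
    # Assumption: any trailing fragment shorter than one window is discarded.
    return [audio[i * win:(i + 1) * win] for i in range(n_full)]

# Example: the three segmentations used in the paper
# segments_1s   = segment(audio, window_s=1.0)
# segments_125s = segment(audio, window_s=1.25)
# segments_15s  = segment(audio, window_s=1.5)
```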

A USBAERmini2 board [25] receives this information and sends it to the computer through a USB port. Then, a script running on MATLAB collects the received AER packets and stores them in AEDAT files [24].
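For readers who want to inspect the recorded spikes, a minimal AEDAT reader could look like the sketch below. It assumes the jAER AEDAT 2.0 layout ('#'-prefixed ASCII header followed by big-endian 32-bit address/timestamp pairs), which may differ from the exact files produced by the MATLAB script used in the paper.

```python
import numpy as np

def read_aedat2(path):
    """Read (address, timestamp) event pairs from an AEDAT file.

    Assumption: AEDAT 2.0 layout with '#'-prefixed ASCII header lines,
    then one big-endian 32-bit address followed by one 32-bit timestamp per event.
    """
    with open(path, 'rb') as f:
        pos = 0
        while True:                              # skip the ASCII header
            line = f.readline()
            if not line.startswith(b'#'):
                break
            pos = f.tell()
        f.seek(pos)
        raw = np.fromfile(f, dtype='>u4')        # big-endian unsigned 32-bit words
    addresses = raw[0::2]                        # even words: AER addresses (NAS channel IDs)
    timestamps = raw[1::2]                       # odd words: timestamps (typically microseconds)
    return addresses, timestamps
```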

Outputs of the different preprocessing steps: the first image (a) is the original audio signal after the segmentation process; the second one (b) is the AER information obtained from the NAS’ output; and the last one (c) is the grayscale sonogram image obtained with NAVIS

A grayscale sonogram image is generated for each AEDAT file using Neuromorphic Auditory VISualizer Tool (NAVIS) [29], which is a desktop software application.
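NAVIS is a desktop tool, but the sonogram it produces is essentially a 2-D histogram of spike counts over time bins and NAS channels. A rough sketch under that assumption follows; the binning, normalization, and image sizes here are illustrative guesses, not NAVIS internals.

```python
import numpy as np

def sonogram(addresses, timestamps, n_channels=64, n_time_bins=50):
    """Build a grayscale sonogram: spike counts per (time bin, NAS channel).

    n_time_bins=50 matches the 50x64 images of the 1 s dataset; 63 and 75
    bins would give the 1.25 s and 1.5 s image sizes, respectively.
    """
    t0, t1 = timestamps.min(), timestamps.max()
    t_edges = np.linspace(t0, t1, n_time_bins + 1)
    c_edges = np.arange(n_channels + 1)
    counts, _, _ = np.histogram2d(timestamps, addresses, bins=[t_edges, c_edges])
    # Assumption: normalize counts to [0, 255] to obtain a grayscale image.
    img = 255.0 * counts / max(counts.max(), 1)
    return img.astype(np.uint8)                  # shape: (n_time_bins, n_channels)
```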

2. Modified LeNet and Modified AlexNet

LeNet-5 and AlexNet
  • The above figures show the conventional LeNet-5 and 1-Channel-Path AlexNet.
  • The input layer was adapted to be able to work with the proper image size that matches its corresponding dataset (50×64 for the 1 s sample length dataset, 63×64 for the 1.25 s dataset and 75×64 for the 1.5 s dataset).
Modified LeNet-5 and Modified AlexNet

LeNet-5 and AlexNet were modified by reducing the kernel sizes and the strides. A hedged sketch of such a reduced LeNet-5-style network is shown below.
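As an illustration of a reduced-kernel LeNet-5-style network for the 50×64 sonograms, one could write something like the following PyTorch sketch. The specific kernel sizes, strides, and channel counts are assumptions, since the exact values are only given in the paper's figures.

```python
import torch
import torch.nn as nn

class SmallLeNet(nn.Module):
    """LeNet-5-style CNN for 50x64 grayscale sonograms (illustrative only)."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=3, stride=1, padding=1),   # small kernels, stride 1 (assumed)
            nn.ReLU(),
            nn.MaxPool2d(2),                                       # 50x64 -> 25x32
            nn.Conv2d(6, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                                       # 25x32 -> 12x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 12 * 16, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, n_classes),                              # healthy vs. pathological
        )

    def forward(self, x):                                          # x: (N, 1, 50, 64)
        return self.classifier(self.features(x))

# model = SmallLeNet()
# logits = model(torch.randn(8, 1, 50, 64))
```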

3. Results

3.1. Modified LeNet-5

Modified LeNet-5

In Modified LeNet-5, the 1 s dataset achieved the best result (93.68%), while the 1.25 s and the 1.5 s datasets achieved accuracies of 93.57% and 91.14%, respectively.

3.2. Default AlexNet

Default AlexNet

In Default AlexNet, the 1.25 s dataset achieved the best result (90.70%), while the 1 s and the 1.5 s datasets achieved accuracies of 89.61% and 89.91%, respectively.

3.3. Modified AlexNet

Modified AlexNet

In Modified AlexNet, the 1.5 s dataset achieved the best result (97.05%), while the 1 s and the 1.25 s datasets achieved accuracies of 94.88% and 95.95%, respectively.

