Don't just hear it
Listen to it!
Upload an audio file to explore real-time CNN feature maps, interactive spectrograms, and classification results from our ResNet-based audio classifier.
Ready to hear what the model hears?
Upload an audio clip and explore spectrograms, feature maps, and predictions.
Loading 3D Architecture...
How the Model Works
A concise tour of the audio CNN pipeline powering the visualizer.
Mel-Spectrogram Input
We transform raw audio into a 2D time–frequency map that preserves perceptual pitch relationships.
Convolutional Layer 1
Stacks of 2D convolutions extract local time–frequency patterns like onsets, harmonics, and textures.
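One way such a first stage could look in PyTorch (channel counts and kernel sizes are illustrative, loosely following the standard ResNet stem):

```python
import torch
import torch.nn as nn

# First convolutional stage: treat the spectrogram as a 1-channel image
# and extract local time-frequency patterns into 64 feature maps.
conv1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

x = torch.randn(1, 1, 128, 216)  # (batch, channels, mel bins, time frames)
feature_maps = conv1(x)
print(feature_maps.shape)  # torch.Size([1, 64, 32, 54])
```

Each of the 64 output channels is one of the feature maps the visualizer renders.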
4 Convolutional Stages with 16 Residual Blocks
Skip connections stabilize training and help the model learn deeper, richer audio features.
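A sketch of a basic residual block, the building unit of ResNet-style models (this is the standard pattern, not the app's exact code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions plus a skip connection that adds the
    input back to the output, easing gradient flow in deep stacks."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # the skip connection

block = ResidualBlock(64)
y = block(torch.randn(1, 64, 32, 54))
print(y.shape)  # same shape as the input
```

Because the block only has to learn a residual correction to its input, stacking many of them stays trainable where a plain deep stack would degrade.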
Global Average Pooling
Aggregation over time/frequency yields compact descriptors resilient to temporal shifts.
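In PyTorch this is a one-liner; the 512-channel input shown here is an assumption about the final feature depth:

```python
import torch
import torch.nn as nn

# Global average pooling collapses each feature map to a single number,
# so the descriptor no longer depends on *where* in time or frequency
# a pattern occurred, only on how strongly it was present.
gap = nn.AdaptiveAvgPool2d(1)

features = torch.randn(1, 512, 4, 7)  # (batch, channels, freq, time)
pooled = gap(features)
print(pooled.shape)  # torch.Size([1, 512, 1, 1])
```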
Flatten + Dropout
Flatten the features and apply dropout to prevent overfitting.
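A sketch of this step (the dropout probability is illustrative):

```python
import torch
import torch.nn as nn

# Flatten the pooled (B, 512, 1, 1) tensor to a (B, 512) vector, then
# randomly zero activations during training as a regularizer.
head = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.5),  # assumed rate, not the model's actual setting
)

pooled = torch.randn(1, 512, 1, 1)
vec = head(pooled)
print(vec.shape)  # torch.Size([1, 512])
```

At inference time dropout is a no-op; it only perturbs activations during training.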
Final Linear Layer
A linear layer maps the feature vector to 50 logits, which a softmax converts to probabilities over the 50 ESC-50 classes.
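The classification head can be sketched like this (the 512-dimensional input is an assumption; 50 outputs match ESC-50):

```python
import torch
import torch.nn as nn

# Map the pooled feature vector to one logit per ESC-50 class,
# then normalize with softmax to get a probability distribution.
classifier = nn.Linear(512, 50)

logits = classifier(torch.randn(1, 512))
probs = torch.softmax(logits, dim=1)
print(probs.shape)         # torch.Size([1, 50])
print(probs.sum().item())  # ~1.0: probabilities sum to one
```

The class with the highest probability is what the visualizer reports as the prediction.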