Music Genre Classifier

Status: Completed

Tags: python ML OpenCV numpy flask librosa



AIM

Automate music genre classification using machine learning, making the selection of songs quicker and less cumbersome.


COMPONENTS AND TECHNOLOGIES USED

  • python

  • flask

  • librosa

  • ml

  • numpy

  • heroku


OVERVIEW

Introduction

Classifying music manually means listening to many songs and assigning a genre to each one, which is both time-consuming and difficult. Automating the classification aims to make the selection of songs quicker and less cumbersome.


 

Description

A music genre classifier is a software program that predicts the genre of a piece of music in audio format. Such classifiers are used for tasks such as automatically tagging music for services such as Spotify and Billboard and choosing appropriate background music for events.

The ambiguity of genre classification makes machine intelligence well-suited to this task. Given enough audio data, of which large amounts can be easily harvested from online music, machine learning can observe and make predictions using these ill-defined patterns.

This project aims to build a proof-of-concept music genre classifier that predicts the genre of a piece of Western music, together with a confidence level, from a set of popular candidate genres (classical, jazz, rap, rock, etc.).

 

Tech Stack and Libraries

  1. Backend language: Python - Python is consistently ranked among the most widely used programming languages. Popularity is a reasonably good reference point when choosing a technology for app development and, in particular, for the app's back end.
  2. Python web framework: Flask - Flask is used for developing web applications in Python and is built on Werkzeug and Jinja2. Its advantages include a built-in development server and a fast debugger.
  3. HTML, CSS, JavaScript
  4. Audio analysis library: Librosa - Librosa is a Python library for music and audio analysis. It is mainly used when working with audio data, for example in music generation (using LSTMs) and automatic speech recognition.
  5. Pandas, NumPy, Matplotlib - These data analysis and visualization libraries form the base of the exploratory data analysis, including preparing the training, testing, and validation sets.
  6. Data analysis library: scikit-learn
  7. Stacking classifier library (XGBoost and SVC) - This module was used to stack the two individual classifiers, XGBoost and SVC, into a single ensemble.
  8. Spotify API - for real-time recommendations on the predictions page
  9. record.js API - used for recording audio and passing it as a file for predictions in the app
  10. Heroku for deployment

 

Dataset Used:

 

The dataset used was GTZAN.

The dataset is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). The files were collected in 2000-2001 from various sources, including personal CDs, radio, and microphone recordings, to represent various recording conditions.

 

Content

  • genres original - A collection of 10 genres with 100 audio files each, all 30 seconds long (the famous GTZAN dataset, the MNIST of sounds).
  • images original - A visual representation of each audio file. One way to classify data is through neural networks. Because neural networks (such as the CNN tried in this project) usually take some sort of image representation as input, the audio files were converted to Mel spectrograms to make this possible.
  • 2 CSV files - Contain features of the audio files. One file holds, for each 30-second song, the mean and variance of multiple features extracted from the audio. The other file has the same structure, but the songs were first split into 3-second clips, which gives 10 times as many rows to feed into the classification models. More training data generally helps.
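The idea behind the second CSV file (split each 30-second clip into 3-second segments, then summarise frame-wise features by their mean and variance) can be sketched with NumPy alone. This is only an illustration, not the code that produced the dataset: the sample rate, frame sizes, the two features chosen (RMS energy and zero-crossing rate), and the column names are all assumptions, and a sine tone stands in for a real recording. In practice librosa's `librosa.feature.rms` and `librosa.feature.zero_crossing_rate` compute comparable quantities.

```python
import numpy as np

SAMPLE_RATE = 22050   # assumed sample rate in Hz (not stated in the write-up)

def split_into_segments(signal, seconds=3, sample_rate=SAMPLE_RATE):
    """Cut a 1-D signal into equal, non-overlapping segments (remainder dropped)."""
    size = seconds * sample_rate
    count = len(signal) // size
    return signal[: count * size].reshape(count, size)

def frame_signal(signal, frame_length=2048, hop_length=512):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)[:, None]
    return signal[starts + np.arange(frame_length)[None, :]]

def feature_row(signal):
    """Mean and variance of two frame-wise features: RMS energy and zero-crossing rate."""
    frames = frame_signal(signal)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    negative = np.signbit(frames)
    # Fraction of adjacent sample pairs whose sign differs, per frame.
    zcr = np.mean(negative[:, 1:] != negative[:, :-1], axis=1)
    return {
        "rms_mean": float(rms.mean()),
        "rms_var": float(rms.var()),
        "zero_crossing_rate_mean": float(zcr.mean()),
        "zero_crossing_rate_var": float(zcr.var()),
    }

# Stand-in for a 30-second clip: a 440 Hz tone rather than a real recording.
t = np.arange(30 * SAMPLE_RATE) / SAMPLE_RATE
clip = 0.5 * np.sin(2 * np.pi * 440.0 * t)

segments = split_into_segments(clip)
rows = [feature_row(segment) for segment in segments]
print(segments.shape)   # (10, 66150)
print(len(rows))        # 10
```

One 30-second clip therefore yields ten feature rows instead of one, which is where the tenfold increase in training rows comes from.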

 

The Steps Involved:

1. Preprocessing:

  1. Plotting the .wav file for review
  2. Normalizing the feature values from the CSV
  3. Encoding the genre names
  4. Train-test split

2. First, we tried a CNN model on the Mel spectrogram data provided with the parent dataset.

a) It did not give good accuracy because it overfitted: the dataset was too small to train a CNN.

b) Data augmentation was tried but did not work because the dimensions of the spectrograms created by the librosa library were incompatible.

3. We then trained the final model, described in detail below, on the CSV data provided with the dataset and achieved an accuracy of ~90%.

4. For prediction, we wrote a custom function (named getdataf) that builds, for a new audio file, a feature table with the same structure as the one provided with the dataset.

5. The predictions were then mapped back to genre names using the genre map dictionary created during encoding.
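The preprocessing and decoding steps above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's code: the features are synthetic random numbers rather than the GTZAN CSV, only four genre names are used, a plain SVC stands in for the final model, and `genre_map` is a hypothetical name for the genre map dictionary.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature CSV: 200 rows, 6 numeric features,
# with class-dependent means so that the genres are separable.
genres = np.repeat(["classical", "hiphop", "jazz", "rock"], 50)
class_index = np.repeat(np.arange(4), 50)
X = rng.normal(size=(200, 6)) + 4.0 * class_index[:, None]

# Encode genre names as integers and keep the reverse mapping.
encoder = LabelEncoder()
y = encoder.fit_transform(genres)
genre_map = {index: str(name) for index, name in enumerate(encoder.classes_)}

# Hold out 20% of the rows, keeping the genre proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both parts.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a stand-in classifier and decode its output to (genre, confidence).
model = SVC(probability=True, random_state=0).fit(X_train_scaled, y_train)
probabilities = model.predict_proba(X_test_scaled)
best = probabilities.argmax(axis=1)
decoded = [(genre_map[int(i)], float(p[i])) for i, p in zip(best, probabilities)]
print(decoded[0])
```

Fitting the scaler on the training split alone keeps information from the held-out rows out of training. The same fitted scaler and `genre_map` would then be reused on the feature row built for a newly uploaded file.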

 

ML MODEL BREAKDOWN

The model was trained on the GTZAN dataset. It is an ensemble of two independent classifiers:

A) SVC (Support Vector Classifier)

  1. Support Vector Machine (SVM) is one of the most popular supervised learning algorithms. It can be used for both classification and regression problems, but it is primarily used for classification.
  2. The goal of the SVM algorithm is to find the best decision boundary that separates n-dimensional space into classes, so that new data points can be placed in the correct category. This best decision boundary is called a hyperplane.

B) XGBoost

  1. XGBoost is an implementation of gradient-boosted decision trees. XGBoost models feature prominently in many Kaggle competitions.
  2. In this algorithm, decision trees are built sequentially. Each new tree is fitted to the errors (the gradient of the loss) left by the trees built so far, so later trees concentrate on the examples the current ensemble predicts poorly. The outputs of all trees are then added together to give a stronger and more precise model. It can be used for regression, classification, ranking, and user-defined prediction problems.
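To make the sequential idea behind gradient boosting concrete, here is a deliberately small sketch for a regression problem with squared-error loss, using one-split trees (stumps) written in NumPy. It is not XGBoost itself, which additionally uses second-order gradient information and regularisation; the function names, learning rate, and toy data are all assumptions chosen for illustration.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold split on a 1-D feature that minimises squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:   # skip the maximum so the right side is never empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return lambda q: np.where(q <= threshold, left_value, right_value)

def gradient_boost(x, y, n_rounds=20, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the current residuals."""
    prediction = np.full_like(y, y.mean(), dtype=float)
    losses = [float(np.mean((y - prediction) ** 2))]
    for _ in range(n_rounds):
        residual = y - prediction             # negative gradient of the squared error
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * stump(x)
        losses.append(float(np.mean((y - prediction) ** 2)))
    return prediction, losses

# Toy data: learn sin(x) on [0, 6].
x = np.linspace(0.0, 6.0, 80)
y = np.sin(x)
prediction, losses = gradient_boost(x, y)
print(losses[0], losses[-1])
```

The training loss shrinks from round to round because every stump is fitted to what the previous rounds still get wrong.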

Finally, the ensemble of the SVC and XGBoost classifiers was built with the stacking classifier library in Python and fitted to the training set. The trained model reached an accuracy of around 85%; using a hyperparameter-tuned SVC raised both the accuracy and the F1 score on the validation set to around 90%.
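A stacked ensemble of this kind can be sketched with scikit-learn's `StackingClassifier`. Several things here are assumptions rather than the project's actual setup: scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the example needs only one library, the logistic-regression meta-learner is a guess, the data is synthetic, and the write-up does not say which stacking implementation was used.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for the GTZAN feature table.
X, y = make_blobs(n_samples=300, centers=3, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

stack = StackingClassifier(
    estimators=[
        # The SVC is scale-sensitive, so it gets its own scaler.
        ("svc", make_pipeline(StandardScaler(), SVC(random_state=0))),
        # Stand-in for XGBoost: gradient-boosted trees from scikit-learn.
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    # The meta-learner combines the base models' cross-validated outputs.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(round(accuracy, 3))
```

Stacking trains the meta-learner on out-of-fold predictions of the base models, which lets it learn how much to trust each one without reusing the rows those models were fitted on.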

WebApp details

In this project, we built the app using the Python web framework Flask.

→ The main page of the app provides a UI for uploading an audio file (.wav), after which the user gets the predicted genre of that file.


 

→ The main page also links to a page where users can record their own audio for genre prediction by our model.

After recording, users can listen to the recorded audio in the player provided. Clicking the Confirm button saves the audio as a static file. After waiting a few seconds, the user can click Predict to get the genre prediction.

The model gave accurate predictions even on the short clips that users recorded (any part of a song). The predicted genre depends on what that clip contains rather than on the genre of the whole song.

 

→ Apart from the genre prediction, the web app recommends other popular songs of the predicted genre that the user might want to listen to.

The recommendations are fetched from the Spotify API in real time. As a further development, the app could let users search for and play music of their preferred genre directly in the web app.
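The upload-and-predict flow of the main page could look roughly like the Flask sketch below. The route name `/predict`, the form field `file`, and the `predict_genre` placeholder are hypothetical; the real app would build the feature row and call the trained ensemble at that point.

```python
import io

from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_genre(wav_bytes):
    """Placeholder for the trained model: returns a fixed answer."""
    return {"genre": "unknown", "confidence": 0.0}

@app.route("/predict", methods=["POST"])
def predict():
    upload = request.files.get("file")
    # Reject requests without a file or with a non-.wav file name.
    if upload is None or not (upload.filename or "").lower().endswith(".wav"):
        return jsonify(error="Please upload a .wav file."), 400
    return jsonify(predict_genre(upload.read()))

# Exercise the endpoint with Flask's built-in test client (no server needed).
client = app.test_client()
response = client.post(
    "/predict",
    data={"file": (io.BytesIO(b"RIFF"), "clip.wav")},
    content_type="multipart/form-data",
)
print(response.status_code, response.get_json())
```

Checking the file extension is only a first line of defence; a production endpoint would also validate the audio content itself.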

 

The Predictions and Recommendations page:

 

Source Code

Github: https://github.com/roboclub-mnnit/Music_Genre_classification-2022-23-Project

Video: https://www.youtube.com/watch?v=1MGkDt1iHKA

Research paper referred:

https://www.researchgate.net/publication/357912712_Machine_Learning-Based_Music_Genre_Classification_with_Pre-_Processed_Feature_Analysis

 

Resources  

Official Python Documentation:

https://www.python.org/doc/

Librosa Module (audio analysis library):

https://librosa.org/doc/latest/index.html

GTZAN Dataset on Kaggle:

https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification

Audio Analysis Using Python Tutorial:

1. The Sound of AI official YouTube channel

https://www.youtube.com/@ValerioVelardoTheSoundofAI/playlists

2. https://www.youtube.com/playlist?list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf

Spotify API Documentation:

 

https://developer.spotify.com/documentation/web-api

How to use Spotify API

 

Real-life Applications

The project has a wide range of real-life applications.

  1. Automatic genre tagging of large music catalogues, as needed by services such as Spotify. Because genre boundaries are ambiguous, manual labelling is slow, and machine learning is a good fit for the task.
  2. The project can be extended into audio analysis software that extracts the pitch, timbre, and context of an audio file.

 

CONTRIBUTORS

Name                 Branch   Reg. no.
Alok Kumar Singh     CSE      20214240
Shreyansh Sinha      CHEM     20218002
Siddhant Bhardwaj    MECH     20213067

MENTORS

  1. Anurag Gupta
  2. Prakhar Agarwal
  3. Purushotam Kumar Agrawal