EECS 351 Language Classification
Classification
Each language has its own tonality, grammatical structure, and phonetics. Writing an algorithm by hand to distinguish languages would require a detailed and specific understanding of each one, which is a complex and difficult task. Machine learning offers a better approach here, since ML can identify trends and patterns that are not readily apparent to humans. In our analysis, we tried various ML classifiers in MATLAB to group and classify the audio files by language.
For our project, we first tested binary classification with several MATLAB classifiers to compare their performance, then selected one classifier to build upon and improve. The three classifiers we tested were K-Nearest Neighbors, Support Vector Machine, and Binary Classification Tree. For this stage of the project, we used the 14 MFCC features of each time window. When implementing our K-Nearest Neighbors classifier, we also used MATLAB’s relieff function to rank the features and keep those most beneficial for training. Trimming our feature matrix with relieff improved model accuracy while also reducing model size and computation time. In the later stage of our project, we used relieff in the same way, but passed in a combined matrix of the 14 MFCCs over time and 3 additional pitch features.
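The comparison described above can be sketched roughly as follows. This is a minimal illustration, not our project code: the feature matrix is synthetic random data standing in for the 14 MFCC features, and the variable names, the neighbor counts, the 5-fold cross-validation, and the choice to keep the top 6 features are all assumptions made for the example. It requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch only: synthetic stand-in data, not real MFCC features from audio.
rng(0);                          % reproducible random numbers
nPerClass = 200;                 % hypothetical number of time windows per language
nFeat     = 14;                  % 14 MFCC features per time window

% Two hypothetical languages; class B is shifted in the first four features,
% so those features carry the class information and the rest are noise.
XA = randn(nPerClass, nFeat);
XB = randn(nPerClass, nFeat);
XB(:, 1:4) = XB(:, 1:4) + 1.5;
X = [XA; XB];
Y = categorical([repmat("langA", nPerClass, 1); repmat("langB", nPerClass, 1)]);

% Train the three classifier types compared in this section.
knnMdl  = fitcknn(X, Y, 'NumNeighbors', 5);
svmMdl  = fitcsvm(X, Y, 'KernelFunction', 'rbf', ...
                  'KernelScale', 'auto', 'Standardize', true);
treeMdl = fitctree(X, Y);

% Estimate accuracy with 5-fold cross-validation (kfoldLoss returns error rate).
accKnn  = 1 - kfoldLoss(crossval(knnMdl,  'KFold', 5));
accSvm  = 1 - kfoldLoss(crossval(svmMdl,  'KFold', 5));
accTree = 1 - kfoldLoss(crossval(treeMdl, 'KFold', 5));

% Rank features with relieff using 10 nearest neighbors. Categorical labels
% make relieff run its classification variant rather than regression.
[rankedIdx, weights] = relieff(X, Y, 10);
keep = rankedIdx(1:6);           % assumed cutoff: keep the 6 top-ranked features

% Retrain K-Nearest Neighbors on the trimmed feature matrix.
knnTrimmed    = fitcknn(X(:, keep), Y, 'NumNeighbors', 5);
accKnnTrimmed = 1 - kfoldLoss(crossval(knnTrimmed, 'KFold', 5));

fprintf('KNN: %.3f  SVM: %.3f  Tree: %.3f  KNN (trimmed): %.3f\n', ...
        accKnn, accSvm, accTree, accKnnTrimmed);
```

The same relieff call applies unchanged to a wider matrix, such as the 14 MFCC columns concatenated with 3 pitch columns. Whether trimming helps on real data depends on the features and the cutoff, so the number of features kept is worth tuning rather than fixing in advance.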