EECS 351 Language Classification
Binary Classification Results
For our initial model, we implemented a binary classifier for English and Mandarin Chinese. We took the raw training data from the Common Voice dataset, computed Fourier coefficients, and fed them into our system for training. The first results we obtained are displayed below in a confusion matrix. Using 200 clips of each language for training, we obtained an accuracy of 87% for Mandarin Chinese but only 32% for English.
Binary KNN Classification, K = 3
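As a rough illustration of how a K = 3 nearest-neighbor vote and the per-language accuracies in a confusion matrix can be computed, here is a minimal Python sketch. It is not our project code: the function names, the Euclidean distance metric, and the tiny two-dimensional feature vectors are all assumptions made for the example, and the toy data stands in for the real spectral features.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(true_y, pred_y, labels):
    """Build a matrix whose rows are true labels and columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_y, pred_y):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class that was labeled correctly (one value per row)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical toy features; the real inputs would be per-clip spectral features.
labels = ["english", "mandarin"]
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
train_y = ["english"] * 3 + ["mandarin"] * 3
test_X = [[0.1, 0.1], [0.9, 1.0], [0.3, 0.2], [1.1, 0.8]]
test_y = ["english", "mandarin", "english", "mandarin"]

pred_y = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
matrix = confusion_matrix(test_y, pred_y, labels)
accuracy = per_class_accuracy(matrix)
```

Reading accuracy per row rather than as a single overall number is what exposes an imbalance like ours, where one language is classified well and the other poorly.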
Given the low accuracy obtained for English, we manually selected higher-quality audio samples to train our model. We also systematically trimmed the audio clips after finding strange artifacts in the MFCCs at the beginning of many clips. Finally, we applied the relieff function (discussed in Classification), which ranks features by how useful they are for classification, and kept only the top predictors for model training. After applying these changes to our model, we obtained the following results.
Binary KNN Classification, K = 3, # Predictors = 2500
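To give a sense of how ranking features and keeping the top predictors works, the sketch below implements the basic Relief idea in Python. It is a simplification and not the relieff function we used: it considers a single nearest hit and nearest miss per sample instead of several neighbors, and the function names and toy data are hypothetical. A feature gains weight when it differs between a sample and its nearest neighbor of the other class, and loses weight when it differs from its nearest neighbor of the same class.

```python
import math


def relief_weights(X, y):
    """Basic Relief feature weights for a two-class problem.

    Assumes every class has at least two samples, so each sample
    has both a nearest hit (same class) and a nearest miss (other class).
    """
    n_samples, n_features = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for f in range(n_features):
        column = [row[f] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    weights = [0.0] * n_features
    for i, (x, label) in enumerate(zip(X, y)):
        hits = [X[j] for j in range(n_samples) if j != i and y[j] == label]
        misses = [X[j] for j in range(n_samples) if y[j] != label]
        nearest_hit = min(hits, key=lambda row: math.dist(row, x))
        nearest_miss = min(misses, key=lambda row: math.dist(row, x))
        for f in range(n_features):
            diff_miss = abs(x[f] - nearest_miss[f]) / spans[f]
            diff_hit = abs(x[f] - nearest_hit[f]) / spans[f]
            weights[f] += (diff_miss - diff_hit) / n_samples
    return weights


def top_predictors(weights, n):
    """Indices of the n highest-weighted features, best first."""
    ranked = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return ranked[:n]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1], [1.0, 0.4], [0.9, 0.8], [1.1, 0.2]]
y = ["english"] * 3 + ["mandarin"] * 3

weights = relief_weights(X, y)
selected = top_predictors(weights, 1)
reduced_X = [[row[f] for f in selected] for row in X]
```

In our setting the same selection step would keep the 2500 highest-ranked predictors before training the KNN classifier, rather than the single feature kept in this toy example.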