Multi-Language Classification Results

Once we were confident in our binary classifier, we introduced a third language, Hindi, into the system. After integrating the third class into our code, we obtained the results below.

Multi-class KNN, K = 3, # Predictors = 2500, 47.3% Accuracy

[Image: confusion matrix for the classifier above]

As the confusion matrix above shows, adding Hindi and following the same procedure gave poor results. Classification accuracy for Mandarin and English both dropped to between 50% and 60%, but the more troubling result was Hindi, whose 24% accuracy was below the roughly 33% that random guessing among three languages would give on average. From this point, our goal was to improve the classifier until all three languages were classified with comparable accuracy. After adding extracted pitch analysis features (discussed in Data) and adjusting the training parameters, we obtained the following results.
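
The per-language figures quoted above are read off the rows of a confusion matrix. The following is a minimal sketch of that calculation in plain Python, not the project's own code. The counts are invented for illustration and are not the project's data, the names `per_class_accuracy` and `overall_accuracy` are hypothetical, and the 1/3 chance baseline assumes the three languages are equally represented.

```python
# Illustrative counts only (rows = true language, columns = predicted language).
# These numbers are made up for the sketch; they are NOT the project's results.
LABELS = ["English", "Mandarin", "Hindi"]
confusion = [
    [60, 22, 18],  # true English
    [28, 54, 18],  # true Mandarin
    [41, 37, 22],  # true Hindi
]


def per_class_accuracy(matrix):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Chance level for balanced classes (an assumption of this sketch).
chance = 1 / len(LABELS)
recalls = per_class_accuracy(confusion)
below_chance = [label for label, r in zip(LABELS, recalls) if r < chance]

for label, r in zip(LABELS, recalls):
    print(f"{label}: {r:.1%}")
print(f"overall: {overall_accuracy(confusion):.1%}, chance: {chance:.1%}")
print("below chance:", below_chance)
```

A class whose row accuracy falls under `chance` is being classified worse than a uniform random guess, which is the situation described for Hindi.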

KNN:

Multi-class KNN, K = 10, # Predictors = 3000, 43.33% Accuracy

[Image: results for the classifier above]

Multi-class KNN, K = 10, No Relieff, 43.67% Accuracy

[Image: results for the classifier above]

Compared with our initial multi-class results, Hindi classification accuracy has improved notably. With relieff, all three languages are classified with comparable accuracy, rather than Hindi lagging far behind the other two (see Discussion). Overall accuracy, however, fell from 47.3% to roughly 43%.
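
To give a feel for what Relief-style feature ranking does, here is a minimal sketch in plain Python. It is a simplified Relief variant (one nearest hit and one nearest miss per sample), not the `relieff` routine used in the project, and ReliefF proper typically averages over several neighbors per class. The toy data and the names `relief_scores` and `top_features` are hypothetical.

```python
import math


def relief_scores(points, labels):
    """Score each feature; higher means it separates the classes better."""
    n_features = len(points[0])
    # Per-feature value ranges, used to scale differences into [0, 1].
    spans = []
    for j in range(n_features):
        column = [p[j] for p in points]
        spans.append((max(column) - min(column)) or 1.0)

    scores = [0.0] * n_features
    for i, (point, label) in enumerate(zip(points, labels)):
        # Nearest sample with the same label (hit) and with another label (miss).
        hit = miss = None
        hit_dist = miss_dist = math.inf
        for k, (other, other_label) in enumerate(zip(points, labels)):
            if k == i:
                continue
            d = math.dist(point, other)
            if other_label == label and d < hit_dist:
                hit, hit_dist = other, d
            elif other_label != label and d < miss_dist:
                miss, miss_dist = other, d
        if hit is None or miss is None:
            continue
        # Reward features that differ from the miss and agree with the hit.
        for j in range(n_features):
            scores[j] += (abs(point[j] - miss[j]) - abs(point[j] - hit[j])) / spans[j]
    return [s / len(points) for s in scores]


def top_features(scores, count):
    """Indices of the `count` highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:count]


# Toy data (invented): feature 0 tracks the class, feature 1 is noise.
points = [
    (0.0, 0.5), (0.1, 0.1), (0.2, 0.9),   # English
    (1.0, 0.4), (1.1, 0.8), (0.9, 0.0),   # Mandarin
    (2.0, 0.6), (2.1, 0.2), (1.9, 1.0),   # Hindi
]
labels = ["English"] * 3 + ["Mandarin"] * 3 + ["Hindi"] * 3

scores = relief_scores(points, labels)
print(top_features(scores, 1))
```

Keeping only the top-ranked columns before training is the "trimmed" feature matrix idea: features that mostly add noise to the distance calculation are dropped before the neighbors are computed.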

Other Classifiers:

Multi-class Error-Correcting Output Codes (SVM), 35% Accuracy

[Image: results for the classifier above]

Multi-class Binary Classification Tree, 36% Accuracy

[Image: results for the classifier above]

When we train other classifiers (error-correcting output codes and a binary classification tree) on the new feature matrix, both perform worse than K-Nearest Neighbors classification (35% and 36% accuracy versus about 43%), possibly because they were trained on the full feature matrix rather than a trimmed version.
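
For readers unfamiliar with error-correcting output codes, the sketch below shows only the decoding step: how the votes of several binary learners are combined into one of three classes. It assumes a one-vs-one coding design and simple disagreement counting. The source does not state which design or decoding rule was used, so both are assumptions, and the binary learner outputs are hard-coded rather than produced by trained SVMs. The names `CODING` and `ecoc_decode` are hypothetical.

```python
# One-vs-one coding matrix (an assumed design): rows are classes, columns are
# binary learners. +1 / -1 mark the positive / negative class for a learner,
# and 0 means the learner was not trained on that class.
CLASSES = ["English", "Mandarin", "Hindi"]
CODING = [
    [+1, +1, 0],   # English:  positive for learners 0 and 1
    [-1, 0, +1],   # Mandarin: negative for learner 0, positive for learner 2
    [0, -1, -1],   # Hindi:    negative for learners 1 and 2
]


def ecoc_decode(learner_outputs):
    """Pick the class whose code word disagrees least with the learner outputs."""

    def disagreements(code_word):
        # Entries coded 0 are ignored: that learner never saw this class.
        return sum(
            1
            for code, out in zip(code_word, learner_outputs)
            if code != 0 and code != out
        )

    losses = [disagreements(row) for row in CODING]
    return CLASSES[losses.index(min(losses))]


# Hard-coded example outputs from the three binary learners (+1 or -1 each).
print(ecoc_decode([+1, +1, -1]))
print(ecoc_decode([-1, -1, +1]))
print(ecoc_decode([-1, -1, -1]))
```

Because every binary learner is trained on the same feature matrix, noisy features can degrade each of them at once, which is consistent with the explanation offered above but does not confirm it.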

Discussion
