Tuesday, January 20, 2015

ML in Neuroscience - Exceeding Chance Level By Chance

Combrisson and Jerbi note that this problem is well known to statisticians and computer scientists. However, they say, it is often overlooked in neuroscience, especially among researchers using neuroimaging methods such as fMRI, EEG and MEG.

So how serious is this problem? To find out, the authors generated samples of random 'brain activity' data, arbitrarily split each sample into two 'classes', and used three popular machine learning tools to try to decode the classification: Linear Discriminant Analysis (LDA), a Naive Bayes (NB) classifier, and a Support Vector Machine (SVM). The MATLAB scripts for these simulations are available here.

By design, there was no real signal in these data. It was all just noise – so the classifiers were working at chance performance.
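The noise experiment can be re-sketched in pure Python (the paper's own code is MATLAB; here a simple nearest-class-mean classifier with leave-one-out cross-validation stands in for LDA/NB/SVM, and every name is illustrative):

```python
import random

random.seed(0)

def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out CV with a nearest-class-mean classifier (a toy
    stand-in for LDA/NB/SVM)."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        centroids = {}
        for lab in set(y):
            pts = [x for x, l in train if l == lab]
            # column-wise mean of the training points in this class
            centroids[lab] = [sum(col) / len(pts) for col in zip(*pts)]
        dists = {lab: sum((a - b) ** 2 for a, b in zip(X[i], c))
                 for lab, c in centroids.items()}
        pred = min(dists, key=dists.get)   # nearest centroid wins
        correct += (pred == y[i])
    return correct / len(X)

# Pure-noise "brain activity": Gaussian features, arbitrary class labels
n_samples, n_features = 20, 5
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [0] * (n_samples // 2) + [1] * (n_samples // 2)

acc = loo_nearest_centroid_accuracy(X, y)
print(f"observed 'decoding' accuracy on pure noise: {acc:.2f}")
```

Since there is no signal linking X to y, any run scoring well above 0.50 does so purely by chance.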

However, Combrisson and Jerbi show that when the sample size is small, the observed chance performance regularly exceeds the theoretical level of 50%. Essentially, the variability (standard deviation) of the observed correct classification rate shrinks as the sample size grows (for a binomial proportion, it scales as 1/√n). So with smaller samples, it becomes more likely that a classifier trained on pure noise will, just by chance, score well above 50%. This was true of LDA, NB and SVM alike, and regardless of the type of cross-validation performed.
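The shrinking-variability point can be shown without any classifier at all: if every prediction is a coin flip, the spread of observed accuracies across random datasets depends only on the sample size (a minimal sketch, with dataset counts chosen arbitrarily):

```python
import random
import statistics

random.seed(0)

def chance_accuracies(n_samples, n_datasets=2000):
    """Observed 'decoding' accuracy when every prediction is a coin flip,
    repeated over many independent random datasets."""
    return [sum(random.random() < 0.5 for _ in range(n_samples)) / n_samples
            for _ in range(n_datasets)]

for n in (20, 100, 500):
    accs = chance_accuracies(n)
    # sd tracks 0.5/sqrt(n); the max shows how high chance can reach
    print(f"n={n:3d}: sd={statistics.pstdev(accs):.3f}, max={max(accs):.2f}")
```

With n = 20, individual noise-only runs routinely land far above 50%, which is exactly the trap the paper warns about.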

The only solution, Combrisson and Jerbi say, is to forget the theoretical chance level and instead test machine learning results for statistical significance against sample-size-specific thresholds.
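One standard way to build such a threshold (a sketch, not necessarily the exact procedure from the paper) is an exact one-sided binomial test against chance p = 0.5: find the smallest accuracy whose tail probability falls below alpha.

```python
from math import comb

def significance_threshold(n, alpha=0.05):
    """Smallest correct-classification rate that beats chance (p = 0.5)
    at level alpha under a one-sided exact binomial test."""
    for k in range(n + 1):
        # P(X >= k) for X ~ Binomial(n, 0.5)
        p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
        if p_value < alpha:
            return k / n
    return 1.0

for n in (20, 100, 500):
    print(f"n={n:3d}: need accuracy >= {significance_threshold(n):.3f}")
```

The threshold falls toward 50% as n grows; for n = 20 it sits at 75%, so a small-sample decoder scoring "above chance" at 60% or 70% proves nothing.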


- More Here
