diff --git a/MAB_for_BCI_1/README.md b/MAB_for_BCI_1/README.md
index 61022b8fe12118fde95fb41f449d34e9b2d11123..18f57f0a066ef86c79636372b8cedb0909ed23ad 100644
--- a/MAB_for_BCI_1/README.md
+++ b/MAB_for_BCI_1/README.md
@@ -1,27 +1,46 @@
 # MAB for BCI - example 1
 
 ## Aim of example
-The purpose of this example is to showcase how multi-armed bandits can be used for BCI systems. The reader is welcome to use this code as a startingpoint for other projects.
+The purpose of this example is to showcase how multi-armed bandits (MABs) can be used for Brain-Computer Interface (BCI) systems and to provide code as a starting point for such applications. The reader is welcome to use this code for other projects.
 
 ## Overview
-The example uses a data set with four motor imagery classes.
-The aim is to find the two most easily distinguishable classes while reducing the needed training data to find these two classes.
-The two classes could be used as 'yes' and 'no' in a BCI setting.
-In other words, this example showcases how multi-armed bandits can be used during calibration to find suitable classes for a BCI system, similar to the ''One button BCI'' system in [1].
+The example uses a data set with four motor imagery classes (right hand, left hand, tongue, and feet) from BCI competition IV, dataset 2a [1], downloaded through MOABB [2].
+The aim is to find the two most easily distinguishable classes (e.g. right hand vs. tongue) while reducing the amount of training data needed to find them.
 
-See more details in the [`MAB_for_BCI_1.ipynb`](MAB_for_BCI_1.ipynb) file.
+In a uniform setting, calibration data would be collected from all classes, and the best combination of two classes would be found afterwards. Here, the goal is to identify the best combination of classes while data collection is ongoing, so that the remaining data is collected only from these two best classes.
+To simulate this, each MAB action corresponds to one combination of classes (e.g. right hand vs. tongue) and has a classifier distinguishing between these two classes. For each action the MAB agent chooses, one new training instance is given to the corresponding classifier. The agent receives a positive reward for choosing an action that increases the classification accuracy between the corresponding classes, or if the accuracy stays above 90 % (the classification accuracy is already good and the agent has chosen an action identified as good).
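+
+As a rough illustration of this setup, the sketch below builds one bandit arm per pair of classes and applies the reward rule described above. It is not the notebook's actual code: the classifier (`LinearDiscriminantAnalysis` on 8-dimensional feature vectors), the synthetic stand-in data in `draw_trial`, the reward values (1 for a good step, 0 otherwise), and the simple UCB1 loop are all assumptions made for this sketch; see the notebook referenced below for the real implementation.
+
+```python
+# Illustrative sketch only -- classifier, data, and reward values are assumptions.
+from itertools import combinations
+
+import numpy as np
+from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
+
+rng = np.random.default_rng(0)
+CLASSES = ["right_hand", "left_hand", "tongue", "feet"]
+ACTIONS = list(combinations(CLASSES, 2))  # one bandit arm per pair of classes (6 arms)
+
+
+def draw_trial(cls):
+    # Placeholder for one new calibration trial; real features would come from the EEG data.
+    return rng.normal(loc=CLASSES.index(cls), scale=1.0, size=8)
+
+
+class ClassPairArm:
+    """One arm: a growing training set and a classifier for one pair of classes."""
+
+    def __init__(self, pair):
+        self.pair = pair
+        self.X, self.y = [], []
+        self.accuracy = 0.0
+        # Fixed test set for this pair, standing in for held-out data.
+        self.X_test = np.vstack([[draw_trial(c) for _ in range(20)] for c in pair])
+        self.y_test = np.repeat(pair, 20)
+
+    def step(self):
+        """Collect one training instance for this pair, refit, and return the reward."""
+        cls = self.pair[len(self.y) % 2]  # alternate which class the new trial comes from
+        self.X.append(draw_trial(cls))
+        self.y.append(cls)
+        if len(self.y) < 4:  # wait for a few trials from both classes before fitting
+            return 0.0
+        clf = LinearDiscriminantAnalysis().fit(np.asarray(self.X), np.asarray(self.y))
+        new_acc = clf.score(self.X_test, self.y_test)
+        # Positive reward if the accuracy improved, or if it is already above 90 %.
+        reward = 1.0 if (new_acc > self.accuracy or new_acc >= 0.9) else 0.0
+        self.accuracy = new_acc
+        return reward
+
+
+# Minimal UCB1 agent deciding which class pair to collect the next trial for.
+arms = [ClassPairArm(pair) for pair in ACTIONS]
+counts, totals = np.zeros(len(arms)), np.zeros(len(arms))
+for t in range(1, 101):
+    if t <= len(arms):  # play every arm once before using the UCB scores
+        a = t - 1
+    else:
+        a = int(np.argmax(totals / counts + np.sqrt(2 * np.log(t) / counts)))
+    counts[a] += 1
+    totals[a] += arms[a].step()
+
+best = max(arms, key=lambda arm: arm.accuracy)
+print("Most easily distinguishable pair:", best.pair, "accuracy:", round(best.accuracy, 2))
+```
+
+The notebook runs the same kind of loop with Thompson sampling, UCB, and a random policy, which are the three agents compared in the figures further down.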
+
+The two classes could, for example, be used as 'yes' and 'no' in a BCI setting.
+In other words, this example showcases how multi-armed bandits can be used during calibration to find suitable classes for a BCI system, similar to the 'One button BCI' system in [3].
+
+Details on how to create the multi-armed bandit class and how to run the training are found in the [`MAB_for_BCI_1.ipynb`](MAB_for_BCI_1.ipynb) file, a Python notebook that contains both code and text explanations.
 
 ## Output from example
-The following two images are generaded from the example. The first, comparing different algorithms regarding received reward and classification accuracy in each step. The second, what actions were chosen by the policy.
+The following two figures are generated from the example. The first compares different algorithms with respect to received reward and classification accuracy in each step. The second shows which actions were chosen by each policy.
 
 
+The first subfigure above shows the average reward per time step.
+The Thompson sampling and UCB algorithms find an action that gives a good reward in each step, while the random policy does not.
+Finding a good action means that the corresponding two classes are easily distinguishable.
+
+The second subfigure shows the average accuracy on test data for the action the agent chose.
+We see that the Thompson sampling and UCB algorithms, on average, choose actions with higher accuracy than the random policy, indicating that using multi-armed bandits to find distinguishable classes is a better approach than randomly picking training data. This means a reduction in calibration time for the BCI system, since only relevant data is collected.
+
+
 
+From the second figure we see that the random policy chooses all actions about equally often, while both Thompson sampling and UCB prefer the right hand vs. tongue action.
+Thus, right hand vs. tongue is probably the most easily distinguishable combination for this particular user.
+
 ## References
-[1] Fruitet, J., Carpentier, A., Munos, R., and Clerc, M. (2012). Bandit Algorithms boost Brain Computer359
-Interfaces for motor-task selection of a brain-controlled button. In _Advances in Neural Information360
-Processing Systems_, eds. P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger361
-(Lake Tahoe, Nevada, United States: Neural Information Processing Systems (NIPS) Foundation),362
-vol. 25, 458–466
\ No newline at end of file
+[1] Tangermann, M., Müller, K. R., Aertsen, A., Birbaumer, N., Braun, C., Brunner, C., ... & Blankertz, B. (2012). Review of the BCI competition IV. Frontiers in Neuroscience, 6, 55.
+
+[2] Jayaram, V., & Barachant, A. (2018). MOABB: trustworthy algorithm benchmarking for BCIs. Journal of Neural Engineering, 15(6), 066011.
+
+[3] Fruitet, J., Carpentier, A., Munos, R., and Clerc, M. (2012). Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button. In _Advances in Neural Information Processing Systems_, eds. P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Lake Tahoe, Nevada, United States: Neural Information Processing Systems (NIPS) Foundation), vol. 25, 458–466.