12/10: We looked at the two implementations considered for the exercise and settled on using
"rbm_MNIST_test.py". It was easy to get running and already created the visualization of the filter and Gibbs sampled images, i.e., the feature column vectors, corresponding to each hidden node. We also tried changing the number of hidden nodes to 10 just to verify that it would not be a good idea as was obvious when looking at the found features.
16/10: After reviewing the videos on the training, we had a hard time understanding the update scheme used in the code. The update can also use more Gibbs sampling steps by increasing nbr_gibbs, but that did not seem to help.
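For reference, here is a minimal NumPy sketch of contrastive divergence (CD-k) as we understand it, with binary units and mean-field reconstructions. The code is ours, not the script's; only the idea of a nbr_gibbs parameter is taken from the code, and the learning rate is illustrative.

    # Hypothetical CD-k update sketch, not the actual code from rbm_MNIST_test.py.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd_update(X, W, b_v, b_h, lr=0.01, nbr_gibbs=1):
        # One CD-k update on a batch X of shape (n_samples, n_visible).
        h0_prob = sigmoid(X @ W + b_h)                  # positive phase: p(h | v0)
        h_prob = h0_prob
        v_prob = X
        for _ in range(nbr_gibbs):                      # k Gibbs sampling steps
            h_sample = (np.random.rand(*h_prob.shape) < h_prob).astype(float)
            v_prob = sigmoid(h_sample @ W.T + b_v)      # mean-field reconstruction p(v | h)
            h_prob = sigmoid(v_prob @ W + b_h)          # p(h | vk)
        n = X.shape[0]
        W += lr * (X.T @ h0_prob - v_prob.T @ h_prob) / n
        b_v += lr * (X - v_prob).mean(axis=0)
        b_h += lr * (h0_prob - h_prob).mean(axis=0)
        return W, b_v, b_h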
Initializing the feature matrix with small Gaussian-distributed values did not seem to help either.
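As a sketch of what we mean by that (the 0.01 scale and 100 hidden nodes are illustrative, not values taken from the script):

    # Hypothetical initialization sketch; shapes and the 0.01 scale are illustrative.
    import numpy as np

    n_visible, n_hidden = 784, 100
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # small Gaussian weights
    b_v = np.zeros(n_visible)                               # visible biases
    b_h = np.zeros(n_hidden)                                # hidden biases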
Next we implemented more passes (epochs) over the training data by adding an outer for-loop. Experimentally, about 10 epochs were sufficient for reaching a local minimum.
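A sketch of that outer loop, reusing the hypothetical cd_update and initialization above; the batch size is illustrative, and X_train is assumed to be the MNIST images scaled to [0, 1].

    # Hypothetical outer epoch loop; cd_update, W, b_v, b_h come from the sketches above.
    batch_size = 128
    for epoch in range(10):                         # ~10 epochs were enough in our runs
        for start in range(0, X_train.shape[0], batch_size):
            batch = X_train[start:start + batch_size]
            W, b_v, b_h = cd_update(batch, W, b_v, b_h, lr=0.01, nbr_gibbs=1)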
We also tried initializing the weight matrix with random values.
By increasing the number of Gibbs sampling steps we got a lower reconstruction error, ||X - \tilde{X}||: from 0.0704 down to 0.0551 (10 steps) and 0.0562 (40 steps). However, the computational time increased by approximately 3x (10 steps) and 8x (40 steps).
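A sketch of how such a reconstruction error can be measured, reusing the hypothetical names from the sketches above; how the error is normalised in the actual script is an assumption on our part.

    # Hypothetical reconstruction-error sketch; sigmoid, W, b_v, b_h come from above,
    # and the normalisation is an assumption, not taken from the script.
    import numpy as np

    def reconstruct(X, W, b_v, b_h, nbr_gibbs=1):
        v = X
        for _ in range(nbr_gibbs):
            h = (np.random.rand(X.shape[0], W.shape[1]) < sigmoid(v @ W + b_h)).astype(float)
            v = sigmoid(h @ W.T + b_v)              # mean-field reconstruction
        return v

    X_tilde = reconstruct(X_test, W, b_v, b_h, nbr_gibbs=10)
    error = np.linalg.norm(X_test - X_tilde) / np.sqrt(X_test.size)  # RMS-style error
    print("reconstruction error:", error)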
Note that it is hard to use the reconstruction error as an indicator of performance; for example, no significant difference was seen in the learned feature vectors.
We also implemented a classifier, using a single fully connected layer with p(h|X) as input. This reached approximately 96% accuracy, while a two-layer fully connected model with the same hidden-layer size achieved approximately 97%.
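A minimal sketch of the single-layer idea, with scikit-learn's multinomial logistic regression standing in for a single fully connected softmax layer; the network we actually used may differ, and X_train/X_test, y_train/y_test are assumed to be the MNIST images (scaled to [0, 1]) and labels.

    # Hypothetical classifier sketch on p(h|X); W and b_h are the trained RBM
    # parameters from the sketches above, not values from the script.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    H_train = sigmoid(X_train @ W + b_h)    # hidden probabilities p(h|X) as features
    H_test = sigmoid(X_test @ W + b_h)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(H_train, y_train)
    print("accuracy:", clf.score(H_test, y_test))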