Objectives
This week I wanted to examine the grokking phenomenon more closely. The previous update showed that grokked models exhibit maximally separable spline code distances.
I aim to confirm whether grokking during training rearranges the latent space in the way most favorable to producing interpretable regions.
Neuron firing behavior when grokking
We compare a single digit class (7) across a non-grokked model (trained for 1000 epochs) and a grokked model. Figure 1 shows the PCA plots of the MNIST samples.
The contrast in the PC plots is stark: the non-grokked model exhibits a nearly Gaussian distribution, while the grokked model clearly shows 2-3 clusters where most samples concentrate. The subplots below show the same trend - the top 30 varying neurons of the non-grokked model all carry comparably high variance, while the grokked model's neuron variance appears to be left-tailed.
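A minimal sketch of how this analysis can be reproduced is below. It assumes the PCA and variance statistics are computed over hidden-layer activations for the class-7 samples (which the neuron-variance subplots suggest), and that `model`, `test_loader`, and the layer handle `model.hidden[-1]` exist in your setup - all three names are placeholders, not the original code.

```python
# Sketch: PCA and per-neuron variance of hidden activations for one digit class.
# `model`, `test_loader`, and `model.hidden[-1]` are assumed/hypothetical names.
import numpy as np
import torch
from sklearn.decomposition import PCA

def collect_activations(model, loader, layer, digit=7):
    """Gather activations at `layer` for all samples of a single digit class."""
    acts = []
    def hook(_, __, out):
        acts.append(out.detach().cpu())
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        for x, y in loader:
            mask = y == digit
            if mask.any():
                model(x[mask])
    handle.remove()
    return torch.cat(acts).numpy()

acts = collect_activations(model, test_loader, model.hidden[-1], digit=7)

# 2-D PCA of the activations (the scatter plots in Figure 1).
pcs = PCA(n_components=2).fit_transform(acts)

# Per-neuron variance, sorted descending; the subplots compare the top 30.
neuron_var = acts.var(axis=0)
top30 = np.sort(neuron_var)[::-1][:30]
```

Running this for both checkpoints and scattering `pcs` side by side would reproduce the Gaussian-vs-clustered contrast; plotting `top30` gives the variance subplots.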
Plotting the ReLU firing activity across the model explains this trend (a measurement sketch follows the list):
- The grokked model has a sparser representation, with fewer than 20% of neurons firing overall. Grokking appears to produce a near-linear representation, with classification effectively performed only in the final layers.
- The non-grokked model has ~50% firing activity, while still exhibiting similar activation patterns at select neurons.
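The sketch below shows one way to measure this firing activity: the fraction of ReLU units with a strictly positive output, averaged over samples and reported per layer. It assumes the model exposes its ReLUs as `nn.ReLU` modules (functional `F.relu` calls would not be caught), and `grokked_model` / `test_loader` are placeholder names.

```python
# Sketch: per-layer ReLU firing fraction (share of units with positive output).
# Assumes ReLUs are nn.ReLU modules; `grokked_model`/`test_loader` are placeholders.
import torch
import torch.nn as nn

def firing_fractions(model, loader):
    fractions = {}
    hooks = []
    def make_hook(name):
        def hook(_, __, out):
            # Fraction of (sample, unit) entries that fired in this batch.
            frac = (out > 0).float().mean().item()
            fractions.setdefault(name, []).append(frac)
        return hook
    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU):
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        for x, _ in loader:
            model(x)
    for h in hooks:
        h.remove()
    # Average the per-batch fractions for each ReLU layer.
    return {name: sum(v) / len(v) for name, v in fractions.items()}

# Per the observations above, a grokked model should report values well
# under 0.2 here, a non-grokked one around 0.5.
print(firing_fractions(grokked_model, test_loader))
```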