`OneSoftmax` should be implemented as described in Evan Miller's post "Attention Is Off By One": https://www.evanmiller.org/attention-is-off-by-one.html. This is related to the changes in commit https://github.com/JuliaGNI/GeometricMachineLearning.jl/pull/213/commits/aeacd0e29ff0e1969796d11e9744007e94354334.
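
For reference, the linked post defines the modified softmax (written $\mathrm{softmax}_1$ there) by adding 1 to the denominator of the usual softmax. The symbols $x$ and $n$ below are notation chosen here for an input vector of length $n$; they are not taken from the package:

```math
\mathrm{softmax}_1(x)_i = \frac{\exp(x_i)}{1 + \sum_{j=1}^{n} \exp(x_j)}, \qquad i = 1, \ldots, n.
```

Unlike the standard softmax, the outputs sum to less than one, so every entry can tend to zero when all $x_j$ are very negative. As a suggestion only (not something taken from the linked commit), a numerically stable variant could shift the exponents by $m = \max(0, \max_j x_j)$, which gives the equivalent form

```math
\mathrm{softmax}_1(x)_i = \frac{\exp(x_i - m)}{\exp(-m) + \sum_{j=1}^{n} \exp(x_j - m)}.
```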