This project implements LeCun et al.'s 1998 paper "Gradient-Based Learning Applied to Document Recognition" (LeNet-5). With minimal training epochs, the model reaches a macro F1 score of 0.96 on MNIST. Code and trained models are available.
Also listed on Papers with Code: https://paperswithcode.com/paper/gradient-based-learning-applied-to-document
- Custom implementations of:
  - A convolution layer class supporting sparse connections between channels and shared weights, making it more flexible than PyTorch's `nn.Conv2d`
  - Average pooling, the loss function, and the optimizer, built from PyTorch tensor operations
  - The maximum a posteriori (MAP) loss and the stochastic diagonal Levenberg-Marquardt (SDLM) optimizer
- Strictly follows the original paper's specifications for the architecture, hyperparameters, initialization, and even the stylized 7x12 bitmaps of the digits 0-9 used as fixed weights in the RBF output layer: https://ibb.co/d6ktzc0
- Trained on the 60,000-image MNIST training set and evaluated on the 10,000-image test set
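As a rough sketch of how a channel-wise sparse connection scheme (in the spirit of LeNet-5's C3 layer) can be expressed in PyTorch, here is a masked convolution; the `SparseConv2d` name and the small example mask are illustrative, not taken from this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConv2d(nn.Module):
    """Convolution where each output map connects only to a chosen
    subset of input maps. `connections` is a hypothetical boolean
    tensor of shape (out_channels, in_channels)."""
    def __init__(self, in_channels, out_channels, kernel_size, connections):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_channels))
        # Buffer (not a parameter) so the mask moves with .to(device)
        self.register_buffer("mask", connections.float()[:, :, None, None])

    def forward(self, x):
        # Zero out weights of absent connections before convolving
        return F.conv2d(x, self.weight * self.mask, self.bias)

# Toy example: 2 output maps over 3 input maps, each seeing only 2 inputs
mask = torch.tensor([[1, 1, 0], [0, 1, 1]], dtype=torch.bool)
layer = SparseConv2d(3, 2, 5, mask)
out = layer(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 2, 28, 28])
```

Masking the weight tensor keeps the layer compatible with autograd: gradients for masked-out weights are zeroed by the same multiplication, so absent connections never train.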
Macro F1 score: 0.964

Per-digit F1 scores (all above 0.93):

| Digit | F1 Score |
|-------|----------|
| 0 | 0.980 |
| 1 | 0.990 |
| 2 | 0.960 |
| 3 | 0.955 |
| 4 | 0.979 |
| 5 | 0.979 |
| 6 | 0.955 |
| 7 | 0.955 |
| 8 | 0.938 |
| 9 | 0.950 |
Implementation details:
- Input: 32x32 grayscale images
- Architecture:
  - Conv 5x5 (6 maps) -> AvgPool 2x2
  - Conv 5x5 (16 maps, sparse connections) -> AvgPool 2x2
  - FC (120) -> FC (84) -> RBF output
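The RBF output stage can be sketched as below: each class unit computes the squared Euclidean distance between the 84-dimensional penultimate activation and a fixed +1/-1 prototype bitmap, and the smallest distance wins. The `RBFOutput` name and the random prototypes are illustrative assumptions; the actual implementation uses the paper's fixed digit bitmaps:

```python
import torch
import torch.nn as nn

class RBFOutput(nn.Module):
    """Euclidean RBF output layer in the style of LeNet-5.
    `prototypes` is a (num_classes, 84) tensor of +1/-1 values
    derived from stylized digit bitmaps; it is fixed, not trained."""
    def __init__(self, prototypes):
        super().__init__()
        self.register_buffer("prototypes", prototypes)

    def forward(self, x):
        # (batch, 1, 84) - (classes, 84) -> (batch, classes, 84),
        # then sum squared differences over the feature dimension
        return ((x.unsqueeze(1) - self.prototypes) ** 2).sum(dim=-1)

# Random +1/-1 prototypes stand in for the real digit bitmaps here
protos = torch.where(torch.rand(10, 84) > 0.5, 1.0, -1.0)
rbf = RBFOutput(protos)
dist = rbf(torch.randn(4, 84))
pred = dist.argmin(dim=1)  # smallest distance = predicted digit
print(dist.shape, pred.shape)  # torch.Size([4, 10]) torch.Size([4])
```

Note the outputs are distances, not probabilities, which is why the MAP loss from the paper (rather than plain cross-entropy) is the natural training criterion.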