Very Deep Convolutional Networks for Large-scale Image Recognition. Simonyan, Zisserman. ICLR 2015

Discusses the approach that took 1st place (localization) and 2nd place (classification) in the ImageNet challenge (ILSVRC) 2014

Basic idea is to use very small (3×3) convolution filters and a deep network (16-19 weight layers)

Made the implementation public

Works well on other data sets as well

In the previous year, people moved toward smaller receptive windows, smaller strides, and using the training data more thoroughly (e.g., training and testing at multiple scales)

Input is fixed at 224×224; the only preprocessing is subtracting the mean RGB value (computed over the training set) from each pixel
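A minimal sketch of that preprocessing step (the specific mean values below are illustrative placeholders, not taken from the paper):

```python
import numpy as np

# Hypothetical per-channel mean RGB values; the paper subtracts the mean
# RGB value computed over the training set (exact values not given here).
MEAN_RGB = np.array([123.68, 116.78, 103.94], dtype=np.float32)

def preprocess(image):
    """Subtract the per-channel mean RGB value from a 224x224x3 image."""
    return image.astype(np.float32) - MEAN_RGB

img = np.full((224, 224, 3), 128.0, dtype=np.float32)
out = preprocess(img)
print(out.shape)  # (224, 224, 3)
```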

“Local response normalization” didn’t help performance and consumed more memory and computation time

Earlier state of the art used 11×11 convolutions w/stride 4 (or 7×7 stride 2)

Here they only did 3×3 with stride 1

A stack of three 3×3 conv layers also has 3 non-linear rectification (ReLU) layers instead of 1, which makes the decision function more discriminative

A stack of small convolutions also has far fewer parameters than a single large filter with the same receptive field, which can be seen as a form of regularization
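The receptive-field and parameter-count argument can be checked with a quick sketch (C is an assumed channel count; biases are ignored):

```python
def stacked_receptive_field(k, n_layers):
    """Receptive field of n stacked k x k convs with stride 1."""
    return 1 + n_layers * (k - 1)

def conv_params(k, c_in, c_out):
    """Number of weights in one k x k conv layer (biases ignored)."""
    return k * k * c_in * c_out

C = 256  # assumed channel count, for illustration
three_3x3 = 3 * conv_params(3, C, C)   # 3 * (3^2 * C^2) = 27 C^2
one_7x7 = conv_params(7, C, C)         # 7^2 * C^2 = 49 C^2

print(stacked_receptive_field(3, 3))   # 7 -> same as a single 7x7 filter
print(three_3x3 < one_7x7)             # True: ~45% fewer parameters
```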

The training objective is multinomial logistic regression, optimized by mini-batch (size 256) gradient descent (via backprop) with momentum

“The training was regularised by weight decay (the L2 penalty multiplier set to 5 · 10−4 ) and dropout regularisation for the first two fully-connected layers (dropout ratio set to 0.5).”

<How does this weight decay work exactly? Need to check it out>

Despite its size, it converges in fewer epochs than Krizhevsky et al., 2012’s network, partly because of some pretraining, and partly because the network is narrower but deeper (implicitly more regularized)

They pretrain a shallower net first, then use it to initialize the 1st 4 convolutional layers and the last 3 fully connected layers of the deeper nets

They found out later that pretraining wasn’t really needed if they used a particular random initialization procedure (that of Glorot & Bengio, 2010)
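A sketch of a Glorot/Xavier-style initialization, the kind of procedure that can replace pretraining; the exact variant and shapes below are my assumption, not the paper's code:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Glorot/Xavier uniform init: scale chosen from fan-in and fan-out."""
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# e.g. a 3x3 conv with 64 input and 128 output channels, flattened
W = glorot_uniform(3 * 3 * 64, 128)
print(W.shape)  # (576, 128)
```

The idea is to keep activation and gradient variances roughly constant across layers, so even a 16-19 layer net can train from scratch.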

Implementation based on Caffe, including very efficient multi-GPU parallelization

With 4 Titan GPUs, took 2-3 weeks to train

Adding further layers didn’t improve performance, although they say it might have if the data set were even larger

“scale jittering helps” <i guess this has to do with how images are cropped and scaled to fit in 224×224, and randomizing this process a bit helps>
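That guess matches my reading: rescale the image so its shortest side is a random S in some range, then take a random 224×224 crop. A sketch (the [256, 512] range follows the paper's multi-scale training; the nearest-neighbour resize is a stand-in for a real image library):

```python
import numpy as np

def jittered_crop(image, s_min=256, s_max=512, crop=224, rng=None):
    """Rescale so the shortest side is a random S, then random-crop."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    s = int(rng.integers(s_min, s_max + 1))   # jittered target scale
    scale = s / min(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via index mapping (illustrative, not fast)
    ys = (np.arange(new_h) * h / new_h).astype(int)
    xs = (np.arange(new_w) * w / new_w).astype(int)
    resized = image[ys][:, xs]
    # Random 224x224 crop from the rescaled image
    y0 = int(rng.integers(0, new_h - crop + 1))
    x0 = int(rng.integers(0, new_w - crop + 1))
    return resized[y0:y0 + crop, x0:x0 + crop]

img = np.zeros((300, 400, 3), dtype=np.uint8)
patch = jittered_crop(img)
print(patch.shape)  # (224, 224, 3)
```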

“Notably, we did not depart from the classical ConvNet architecture of LeCun et al. (1989), but improved it by substantially increasing the depth.”

Method was simpler than a number of other near state-of-the-art
