SVM

| Paper | Note |
| --- | --- |
| Deep Learning using Linear Support Vector Machines | claims that an L2-SVM output layer outperforms softmax (sketch below) |
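
A minimal NumPy sketch of the one-vs-rest squared hinge (L2-SVM) loss the paper uses in place of softmax cross-entropy; the function name and shapes here are illustrative, not the paper's code.

```python
import numpy as np

def l2svm_loss(scores, labels):
    """One-vs-rest squared hinge (L2-SVM) loss over raw class scores."""
    n, c = scores.shape
    t = -np.ones((n, c))                 # target is +1 for the true class,
    t[np.arange(n), labels] = 1.0        # -1 for every other class
    margins = np.maximum(0.0, 1.0 - t * scores)
    return np.mean(np.sum(margins ** 2, axis=1))

scores = np.array([[2.0, -1.0, 0.5],    # 2 samples, 3 classes
                   [0.1,  0.3, -0.2]])
print(l2svm_loss(scores, np.array([0, 1])))
```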

Softmax

| Paper | Note |
| --- | --- |
| Hierarchical Softmax | speeds up the softmax when the number of classes is large (sketch below) |
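
A toy two-level hierarchical softmax in NumPy: with the classes split into groups, p(class) = p(group) · p(class | group), so each prediction evaluates a few small softmaxes instead of one softmax over every class. The fixed 2 × 3 class tree here is an illustrative assumption; real systems build the tree from class frequencies or a learned clustering.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
h = rng.normal(size=4)                   # hidden representation
W_group = rng.normal(size=(2, 4))        # scores for the 2 groups
W_leaf = rng.normal(size=(2, 3, 4))      # scores for 3 classes per group

p_group = softmax(W_group @ h)
p_class = np.concatenate([p_group[g] * softmax(W_leaf[g] @ h)
                          for g in range(2)])
print(p_class.sum())                     # all 6 class probabilities sum to 1
```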

Autograd

| Paper | Note |
| --- | --- |
| Automatic differentiation in machine learning: a survey | backpropagation as reverse-mode automatic differentiation (sketch below) |
| Efficient BackProp | practical backprop tricks from Yann LeCun |
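
A toy reverse-mode autodiff node, showing the chain-rule bookkeeping that backpropagation automates; this is a sketch for intuition only, not any library's API, and it re-walks shared subgraphs rather than using the topological ordering a real implementation would.

```python
class Var:
    """A value plus the local derivatives needed for reverse mode."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents           # (parent, d(self)/d(parent)) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        self.grad += seed                # accumulate d(output)/d(self)
        for parent, local in self.parents:
            parent.backward(seed * local)

x, y = Var(2.0), Var(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)                    # 4.0 (= y + 1) and 2.0 (= x)
```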

Activation

| Paper | Note |
| --- | --- |
| Approximation by Superpositions of a Sigmoidal Function | neural networks as universal approximators |
| Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification | ReLU/PReLU and He weight initialization (sketch below) |
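
A quick sketch of He initialization feeding ReLU layers, assuming plain fully connected layers: drawing weights with std sqrt(2 / fan_in) compensates for ReLU zeroing half its inputs, so activations keep a stable scale through depth.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def he_normal(fan_in, fan_out):
    # He et al.: weight variance 2 / fan_in preserves the signal through ReLUs.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

x = rng.normal(size=(256, 512))
for _ in range(10):                      # 10 ReLU layers deep
    x = relu(x @ he_normal(512, 512))
print(x.std())                           # stays O(1) instead of shrinking
```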

Deep Neural Networks

| Paper | Note |
| --- | --- |
| Do Deep Nets Really Need to be Deep? | |
| FitNets: Hints for Thin Deep Nets | |
| The Loss Surfaces of Multilayer Networks | |
| Understanding the difficulty of training deep feedforward neural networks | Xavier/Glorot weight initialization (sketch below) |
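
The Glorot/Xavier rule from the last paper above, sketched under the same assumptions for a tanh network: a uniform range of ±sqrt(6 / (fan_in + fan_out)) balances the variance of activations flowing forward and of gradients flowing backward.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    # Glorot & Bengio's "normalized initialization".
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

x = rng.normal(size=(128, 300))
for _ in range(8):                       # 8 tanh layers deep
    x = np.tanh(x @ glorot_uniform(300, 300))
print(x.std())                           # signal neither explodes nor dies
```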

Normalization

| Paper | Note |
| --- | --- |
| Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | fully connected and spatial batch normalization |
| Layer Normalization | normalizes over the features instead of the batch (sketch below) |
| Group Normalization | a layer normalization variant suited to CNNs |
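
A NumPy sketch contrasting the two normalization axes from the table above; the learnable scale/shift (gamma, beta) and batch norm's running statistics are omitted for brevity.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature across the batch (axis 0).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def layer_norm(x, eps=1e-5):
    # Normalize each sample across its features (axis 1);
    # the result does not depend on the batch size.
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(2.0, 3.0, size=(32, 64))
print(batch_norm(x).mean(axis=0)[:3])    # ~0 for each feature
print(layer_norm(x).mean(axis=1)[:3])    # ~0 for each sample
```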

Regularization

| Paper | Note |
| --- | --- |
| Elastic net regularization | combines L1 and L2 regularization |
| Dropout: A Simple Way to Prevent Neural Networks from Overfitting | dropout (sketch below) |
| Improving neural networks by preventing co-adaptation of feature detectors | |
| Dropout Training as Adaptive Regularization | relates dropout to other regularization techniques |
| Regularization of Neural Networks using DropConnect | DropConnect drops individual weights rather than activations |
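
A sketch of dropout in its common "inverted" form, which rescales the kept units at train time so inference is a no-op; the original paper instead scales the weights at test time, but the expected activations match.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p_drop=0.5, train=True):
    if not train:
        return x                          # inference: identity
    keep = rng.random(x.shape) >= p_drop  # random binary mask per unit
    return x * keep / (1.0 - p_drop)      # rescale to preserve expectation

x = np.ones((2, 8))
print(dropout(x))                  # about half zeros, survivors scaled to 2.0
print(dropout(x, train=False))     # unchanged at test time
```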

Optimization

| Paper | Note |
| --- | --- |
| Advances in Optimizing Recurrent Networks | Nesterov momentum |
| Large Scale Distributed Deep Networks | compares L-BFGS and SGD variants |
| SFO | combines the advantages of SGD with those of L-BFGS |
| Adaptive Subgradient Methods for Online Learning and Stochastic Optimization | Adagrad, an adaptive learning rate method |
| RMSProp: Divide the gradient by a running average of its recent magnitude | RMSProp |
| Adam: A Method for Stochastic Optimization | Adam (sketch below) |
| Unit Tests for Stochastic Optimization | a standardized benchmark for stochastic optimization |
| Practical Recommendations for Gradient-Based Training of Deep Architectures | |
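
A compact NumPy version of the Adam update (Algorithm 1 in the paper), exercised on a toy quadratic; the hyperparameter defaults are the paper's.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g            # 1st moment: running mean of grads
    v = b2 * v + (1 - b2) * g * g        # 2nd moment: running mean of grad^2
    m_hat = m / (1 - b1 ** t)            # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w = np.array([1.0, -2.0])                # minimize f(w) = ||w||^2
m = v = np.zeros_like(w)
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(w)                                 # close to [0, 0]
```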

HyperParameter Search

| Paper | Note |
| --- | --- |
| Random Search for Hyper-Parameter Optimization | random search often beats grid search (sketch below) |
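
A minimal illustration of random search: sample each hyperparameter independently (log-uniform for scale parameters like the learning rate) and keep the best configuration. The `score` function here is a stand-in for a real train-and-validate run.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_config():
    return {"lr": 10 ** rng.uniform(-5, -1),   # log-uniform in [1e-5, 1e-1]
            "dropout": rng.uniform(0.0, 0.7),
            "hidden": int(rng.choice([128, 256, 512]))}

def score(cfg):
    # Placeholder objective; a real run would train and validate a model.
    return -abs(np.log10(cfg["lr"]) + 3) - abs(cfg["dropout"] - 0.5)

best = max((sample_config() for _ in range(60)), key=score)
print(best)
```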

Convolutional Neural Networks

| Paper | Note |
| --- | --- |
| Gradient-Based Learning Applied to Document Recognition | LeNet-5 |
| ImageNet Classification with Deep Convolutional Neural Networks | AlexNet |
| Visualizing and Understanding Convolutional Networks | ZF Net, an improvement on AlexNet |
| (Inception Network) Going Deeper with Convolutions | GoogLeNet |
| Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | Inception-v4 |
| Very Deep Convolutional Networks for Large-Scale Image Recognition | VGGNet |
| (ResNet) Deep Residual Learning for Image Recognition | residual networks |
| Identity Mappings in Deep Residual Networks | |
| Network in Network | |
| Multi-Scale Context Aggregation by Dilated Convolutions | dilated convolutions |
| Striving for Simplicity: The All Convolutional Net | |
| A guide to convolution arithmetic for deep learning | transposed convolutions and checkerboard artifacts (sketch below) |
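
The shape arithmetic from the guide as small helper functions, useful for sanity-checking padding, dilation, and transposed (up-)convolutions along one spatial axis.

```python
def conv_out(i, k, s=1, p=0, d=1):
    """Conv output length: input i, kernel k, stride s, padding p, dilation d.
    A dilated kernel covers d*(k-1) + 1 input positions."""
    return (i + 2 * p - d * (k - 1) - 1) // s + 1

def transposed_conv_out(i, k, s=1, p=0):
    """Transposed conv inverts the shape arithmetic of conv_out (no output padding)."""
    return (i - 1) * s - 2 * p + k

print(conv_out(224, 3, s=1, p=1))        # 224: 'same' 3x3 convolution
print(conv_out(224, 3, s=1, p=2, d=2))   # 224: 3x3 with dilation 2
print(transposed_conv_out(112, 2, s=2))  # 224: 2x upsampling
```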