Papers

Paper - Description
SVM
Deep Learning using Linear Support Vector Machines - argues that an L2-SVM (squared hinge loss) output layer outperforms softmax
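This minimal Python sketch computes both losses for a single example so the SVM-versus-softmax comparison is concrete. It assumes a multiclass squared hinge loss with margin 1. That is one common way to write an L2-SVM objective and may not match the paper's exact formulation. The function names and scores are illustrative.

```python
import math

def squared_hinge_loss(scores, y, margin=1.0):
    """Multiclass squared hinge (L2-SVM style) loss for one example.

    Every wrong class j whose score comes within `margin` of the correct
    class score is penalized by the squared size of the violation.
    """
    return sum(
        max(0.0, s_j - scores[y] + margin) ** 2
        for j, s_j in enumerate(scores)
        if j != y
    )

def softmax_cross_entropy(scores, y):
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(scores)  # shift scores so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[y]

scores = [3.2, 5.1, -1.7]  # hypothetical class scores; the correct class is 0
print(squared_hinge_loss(scores, 0))
print(softmax_cross_entropy(scores, 0))
```

The hinge loss becomes exactly zero once every margin is satisfied. Softmax cross-entropy, by contrast, keeps pushing the correct score up.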
Softmax
Hierarchical Softmax - scales softmax to a large number of categories
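Hierarchical softmax replaces the flat normalization over all categories with a sequence of binary decisions along a tree path. The minimal Python sketch below assumes a heap-indexed binary tree over the classes with one logistic classifier per internal node. The tree layout, parameters, and names are illustrative and are not taken from a specific paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def class_probability(c, h, node_weights, num_classes):
    """P(class c | h) as a product of binary left/right decisions.

    Internal nodes are 1..num_classes-1 (node i has children 2i and 2i+1);
    the leaf for class c is num_classes + c. Each probability needs only
    O(log num_classes) dot products instead of one per class.
    """
    node = num_classes + c
    p = 1.0
    while node > 1:
        parent = node // 2
        p_right = sigmoid(dot(node_weights[parent], h))
        p *= p_right if node % 2 == 1 else 1.0 - p_right  # odd index = right child
        node = parent
    return p

random.seed(0)
num_classes, dim = 10, 4
# Hypothetical parameters: one weight vector per internal node (index 0 unused).
node_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_classes)]
h = [random.gauss(0, 1) for _ in range(dim)]  # hidden representation of one input
probs = [class_probability(c, h, node_weights, num_classes) for c in range(num_classes)]
print(sum(probs))  # each internal node splits its mass, so this is 1 up to rounding
```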
Autograd
Automatic differentiation in machine learning - backpropagation as a special case of automatic differentiation
Efficient BackProp - practical tricks for training with backpropagation, by Yann LeCun et al.
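Backpropagation is reverse-mode automatic differentiation applied to a computation graph. This minimal Python sketch supports only scalar values, addition, and multiplication. The `Value` class is a hypothetical illustration and is not the API of any particular autograd library.

```python
class Value:
    """A scalar that records how it was computed, for reverse-mode autodiff."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topologically sort the graph so a node's gradient is complete
        # before it is pushed to its parents via the chain rule.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += local * v.grad  # accumulate over all paths

x, y = Value(3.0), Value(-2.0)
f = x * y + x  # f = xy + x
f.backward()
print(x.grad, y.grad)  # -1.0 3.0, i.e. df/dx = y + 1 and df/dy = x
```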
Activation
Approximation by Superpositions of a Sigmoidal Function - neural networks as universal approximators
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification - ReLU/PReLU and weight initialization for ReLU networks
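The rectifier paper introduces the parametric ReLU (PReLU) and a weight initialization scaled for ReLU layers. This minimal Python sketch uses a fixed PReLU slope, although the paper learns it, and draws weights from a zero-mean Gaussian with variance 2 / fan_in. The slope value and the function names are illustrative.

```python
import math
import random

def prelu(x, a=0.25):
    """PReLU: identity for positive inputs, slope `a` for negative ones.

    With a = 0 this is the ordinary ReLU. The paper learns `a`; here it is a
    fixed argument, and 0.25 is just an illustrative value.
    """
    return x if x > 0 else a * x

def he_normal(fan_in, fan_out, rng):
    """Weights drawn from N(0, 2 / fan_in), the scaling derived for ReLU layers."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

rng = random.Random(0)
W = he_normal(fan_in=512, fan_out=256, rng=rng)
flat = [w for row in W for w in row]
std = math.sqrt(sum(w * w for w in flat) / len(flat))
print(std, math.sqrt(2.0 / 512))  # empirical vs. target standard deviation
```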
Deep Neural Networks
Do Deep Nets Really Need to be Deep?
FitNets: Hints for Thin Deep Nets
The Loss Surfaces of Multilayer Networks
Understanding the difficulty of training deep feedforward neural networks - weight initialization
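The Glorot and Bengio paper leads to the normalized ("Xavier") initialization. This minimal Python sketch uses the uniform variant, and the function name is illustrative. Because a uniform distribution on (-L, L) has variance L**2 / 3, the limit below gives Var(W) = 2 / (fan_in + fan_out).

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng):
    """Weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).

    The resulting variance 2 / (fan_in + fan_out) is a compromise between
    keeping forward activations (fan_in) and backward gradients (fan_out)
    at a similar scale from layer to layer.
    """
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]

rng = random.Random(0)
W = glorot_uniform(300, 100, rng)
flat = [w for row in W for w in row]
var = sum(w * w for w in flat) / len(flat)
print(var, 2.0 / (300 + 100))  # empirical vs. target variance
```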
Normalization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift - fully connected and spatial batch normalization
Layer Normalization - normalizes over the features of each example instead of over the batch
Group Normalization - a Layer Normalization variant for CNNs that normalizes over groups of channels
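The three normalization schemes differ only in which values share a mean and variance. The minimal pure-Python sketch below works on a batch of flat feature vectors and makes three simplifications. It has no learnable scale and shift, and no running statistics for inference. Its group norm splits plain features into groups, unlike the channel groups over spatial positions used in CNNs. The function names are illustrative.

```python
import math

def _standardize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation (eps avoids division by 0)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def batch_norm(x):
    """Normalize each feature (column) across the examples in the batch."""
    columns = [_standardize(list(col)) for col in zip(*x)]
    return [list(row) for row in zip(*columns)]

def layer_norm(x):
    """Normalize each example (row) across its own features."""
    return [_standardize(row) for row in x]

def group_norm(x, num_groups):
    """Split each example's features into groups and normalize within each group."""
    out = []
    for row in x:
        size = len(row) // num_groups  # assumes len(row) is divisible by num_groups
        normed = []
        for g in range(num_groups):
            normed.extend(_standardize(row[g * size:(g + 1) * size]))
        out.append(normed)
    return out

x = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0],
     [0.0, 1.0, 0.0, 1.0]]  # batch of 3 examples with 4 features each
print(batch_norm(x))
print(layer_norm(x))
print(group_norm(x, num_groups=2))
```

With a single group, group norm reduces to layer norm. With one feature per group, it normalizes each feature of an example on its own.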
Regularization
Elastic net regularization - combines L1 and L2 regularization
Dropout: A Simple Way to Prevent Neural Networks from Overfitting - dropout
Improving neural networks by preventing co-adaptation of feature detectors
Dropout Training as Adaptive Regularization - relation of dropout to other regularization techniques
DropConnect
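The elastic net penalty and dropout can each be written in a few lines of Python. This minimal sketch uses the common "inverted" dropout, which rescales surviving units during training. The dropout paper instead scales the weights at test time, and the two approaches match in expectation. The penalty strengths and function names are illustrative.

```python
import random

def elastic_net_penalty(weights, l1=1e-4, l2=1e-3):
    """Elastic net: a weighted sum of the L1 and L2 penalties on the weights."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)

def dropout(activations, keep_prob, rng, train=True):
    """Inverted dropout: zero each unit with probability 1 - keep_prob.

    Surviving units are divided by keep_prob during training so the expected
    activation is unchanged, and the layer is then used as-is at test time.
    """
    if not train:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
print(elastic_net_penalty([0.5, -1.0, 2.0], l1=0.1, l2=0.01))
print(dropout([1.0, 2.0, 3.0, 4.0], keep_prob=0.5, rng=rng))
```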
Optimization
Advances in Optimizing Recurrent Networks - Nesterov momentum
Large Scale Distributed Deep Networks - compares L-BFGS and SGD variants
SFO - combines the advantages of SGD with those of L-BFGS
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization - Adagrad, an adaptive learning rate method
RMSProp: Divide the gradient by a running average of its recent magnitude - RMSProp, from Geoff Hinton's Coursera lecture slides
Adam: A Method for Stochastic Optimization - Adam
Unit Tests for Stochastic Optimization - a standardized benchmark for stochastic optimization
Practical Recommendations for Gradient-Based Training of Deep Architectures
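The update rules named above can be compared side by side on a toy problem. This minimal Python sketch minimizes the one-dimensional objective f(x) = x**2. Adam's beta1, beta2, and eps follow the defaults suggested in the Adam paper, and the learning rates are illustrative values chosen for this toy problem only. Nesterov momentum is written in the common look-ahead form, which may differ from the reformulation in the cited paper.

```python
import math

def grad(x):
    """Gradient of the toy objective f(x) = x**2 (a hypothetical test problem)."""
    return 2.0 * x

def nesterov(x, lr=0.1, mu=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(x + mu * v)  # gradient at the look-ahead point
        x = x + v
    return x

def adagrad(x, lr=1.0, eps=1e-8, steps=200):
    cache = 0.0
    for _ in range(steps):
        g = grad(x)
        cache += g * g  # sum of all past squared gradients
        x -= lr * g / (math.sqrt(cache) + eps)
    return x

def rmsprop(x, lr=0.1, decay=0.9, eps=1e-8, steps=200):
    cache = 0.0
    for _ in range(steps):
        g = grad(x)
        cache = decay * cache + (1 - decay) * g * g  # running average of g**2
        x -= lr * g / (math.sqrt(cache) + eps)
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias corrections for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

for name, method in [("nesterov", nesterov), ("adagrad", adagrad),
                     ("rmsprop", rmsprop), ("adam", adam)]:
    print(name, method(5.0))
```

Adagrad's per-parameter step shrinks as squared gradients accumulate. RMSProp and Adam replace that sum with a decaying average, so their step size does not vanish over long runs.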
Hyperparameter Search
Random Search for Hyper-Parameter Optimization
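Random search draws each trial's hyperparameters independently instead of walking a grid. This minimal Python sketch uses hypothetical hyperparameter names and ranges, and a toy `validation_loss` stands in for actually training a model. Sampling scale-type parameters log-uniformly is a common practice here, not something this list attributes to the paper.

```python
import math
import random

def sample_config(rng):
    """Draw one hypothetical configuration; scale parameters are log-uniform."""
    return {
        "learning_rate": 10 ** rng.uniform(-6, -1),
        "weight_decay": 10 ** rng.uniform(-5, -2),
        "dropout_keep": rng.uniform(0.3, 0.9),
    }

def validation_loss(cfg):
    """Toy stand-in for training a model and measuring its validation loss."""
    return (math.log10(cfg["learning_rate"]) + 3) ** 2 + (cfg["dropout_keep"] - 0.5) ** 2

def random_search(num_trials, seed=0):
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(num_trials)]
    return min(trials, key=validation_loss)

best = random_search(num_trials=50)
print(best)
```

Each trial explores a new value of every hyperparameter. A grid, by contrast, repeats the same few values of an unimportant parameter many times.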
Convolutional Neural Networks
Gradient-Based Learning Applied to Document Recognition - LeNet-5
ImageNet Classification with Deep Convolutional Neural Networks - AlexNet
Visualizing and Understanding Convolutional Networks - ZF Net, an improvement on AlexNet
Going Deeper with Convolutions - GoogLeNet (Inception network)
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning - Inception-v4 and Inception-ResNet
Very Deep Convolutional Networks for Large-Scale Image Recognition - VGGNet
Deep Residual Learning for Image Recognition - ResNet (residual networks)
Identity Mappings in Deep Residual Networks
Network in Network
Multi-Scale Context Aggregation by Dilated Convolutions Dilated convolutions
Striving for Simplicity: The All Convolutional Net
A guide to convolution arithmetic for deep learning - transposed convolution and checkerboard artifacts
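The convolution arithmetic guide works through output sizes for padded, strided, dilated, and transposed convolutions. This minimal Python sketch computes those sizes along one axis, assuming equal zero padding on both sides. The parameter names follow common usage and are not tied to a specific library.

```python
def conv_output_size(i, k, s=1, p=0, d=1):
    """Output length of a convolution along one axis.

    i: input size, k: kernel size, s: stride, p: zero padding per side,
    d: dilation. A dilated kernel spans k + (k - 1)(d - 1) input positions.
    """
    k_eff = k + (k - 1) * (d - 1)
    return (i + 2 * p - k_eff) // s + 1

def transposed_conv_output_size(i, k, s=1, p=0, a=0):
    """Output length of a transposed convolution along one axis.

    a (0 <= a < s) picks among the several output sizes that the
    corresponding forward convolution would map to the same size i.
    """
    return s * (i - 1) + a + k - 2 * p

print(conv_output_size(5, 3))                  # 3: no padding, unit stride
print(conv_output_size(5, 3, s=2, p=1))        # 3: padding and stride
print(conv_output_size(7, 3, d=2))             # 3: the dilated kernel spans 5 inputs
print(transposed_conv_output_size(3, 3, s=2))  # 7: undoes a stride-2, 3-wide conv on 7 inputs
```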