SVM
|
Deep Learning using Linear Support Vector Machines |
claims that the L2-SVM outperforms softmax |
Softmax
|
Hierarchical Softmax |
softmax for a large number of categories |
Autograd
|
Automatic differentiation in machine learning: a survey |
backpropagation |
Efficient BackProp |
by Yann LeCun et al. |
Activation
|
Approximation by Superpositions of a Sigmoidal Function |
universal approximators |
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification |
ReLU and PReLU, weight initialization for rectifier networks |
Deep Neural Networks
|
Do Deep Nets Really Need to be Deep? |
|
FitNets: Hints for Thin Deep Nets |
|
The Loss Surfaces of Multilayer Networks |
|
Understanding the difficulty of training deep feedforward neural networks |
weight initialization (Xavier/Glorot) |
Normalization
|
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift |
Fully Connected and Spatial Batch Normalization |
Layer Normalization |
normalizes over the features of each example instead of over the batch |
Group Normalization |
Layer Normalization variant for CNNs that normalizes over groups of channels |
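
The Layer Normalization note above contrasts normalizing over the batch with normalizing over the features. The sketch below illustrates only that difference in axis, in plain Python. The function names are made up for this example, and the learnable scale and shift parameters and the running statistics used at test time are deliberately left out.

```python
def normalize(values, eps=1e-5):
    """Shift and scale a list of floats to zero mean and roughly unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def layer_norm(batch, eps=1e-5):
    # Statistics are computed per example, across its features (row-wise).
    return [normalize(row, eps) for row in batch]


def batch_norm(batch, eps=1e-5):
    # Statistics are computed per feature, across the batch (column-wise).
    columns = [normalize(list(col), eps) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


# Two examples with three features each (illustrative values).
batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]]
ln = layer_norm(batch)  # every row has zero mean
bn = batch_norm(batch)  # every column has zero mean
```

Because `layer_norm` never looks at other examples, its output for a given row does not depend on the batch size, which is the usual motivation for preferring it when batches are small.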
Regularization
|
Elastic net regularization |
L1 regularization, L2 regularization |
Dropout: A Simple Way to Prevent Neural Networks from Overfitting |
dropout |
Improving neural networks by preventing co-adaptation of feature detectors |
|
Dropout Training as Adaptive Regularization |
dropout relation to the other regularization techniques |
DropConnect |
|
Optimization
|
Advances in optimizing Recurrent Networks |
Nesterov Momentum |
Large Scale Distributed Deep Networks |
comparing L-BFGS and SGD variants |
SFO (Sum of Functions Optimizer) |
combines the advantages of SGD with those of L-BFGS |
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization |
Adagrad - adaptive learning rate method |
RMSProp: Divide the gradient by a running average of its recent magnitude |
RMSProp |
Adam: A Method for Stochastic Optimization |
Adam |
Unit Tests for Stochastic Optimization |
a standardized benchmark for stochastic optimization |
Practical Recommendations for Gradient-Based Training of Deep Architectures |
|
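
The RMSProp entry above describes the method as dividing the gradient by a running average of its recent magnitude. A minimal sketch of one such update follows, in plain Python. The function name, the hyperparameter defaults, and the toy quadratic objective are assumptions chosen for illustration, not values taken from the listed sources.

```python
def rmsprop_step(params, grads, cache, lr=0.01, decay=0.9, eps=1e-8):
    """Return updated (params, cache) after one RMSProp-style step."""
    new_params, new_cache = [], []
    for p, g, c in zip(params, grads, cache):
        # Running average of the squared gradient for this parameter.
        c = decay * c + (1.0 - decay) * g * g
        new_cache.append(c)
        # Divide the gradient by the root of that average before stepping.
        new_params.append(p - lr * g / (c ** 0.5 + eps))
    return new_params, new_cache


# Toy objective (hypothetical): f(x, y) = x^2 + 100 * y^2.
params, cache = [1.0, 1.0], [0.0, 0.0]
first_params, _ = rmsprop_step(params, [2.0, 200.0], cache)

for _ in range(500):
    grads = [2.0 * params[0], 200.0 * params[1]]
    params, cache = rmsprop_step(params, grads, cache)
```

In `first_params` both coordinates move by almost the same amount even though their gradients differ by a factor of 100, since each gradient is rescaled by its own running magnitude.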
HyperParameter Search
|
Random Search for Hyper-Parameter Optimization |
|
Convolutional Neural Networks
|
Gradient-Based Learning Applied to Document Recognition |
LeNet-5 |
ImageNet Classification with Deep Convolutional Neural Networks |
AlexNet |
Visualizing and Understanding Convolutional Networks |
ZF Net - Improvement on AlexNet |
(Inception Network) Going Deeper with Convolutions |
GoogLeNet |
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning |
Inception-v4 |
Very Deep Convolutional Networks for Large-Scale Image Recognition |
VGGNet |
(ResNet) Deep Residual Learning for Image Recognition |
Residual Network |
Identity Mappings in Deep Residual Networks |
|
Network in Network |
|
Multi-Scale Context Aggregation by Dilated Convolutions |
Dilated convolutions |
Striving for Simplicity: The All Convolutional Net |
|
A guide to convolution arithmetic for deep learning |
transposed convolution and checkerboard artifacts |