SVM

| Paper | Note |
| --- | --- |
| Deep Learning using Linear Support Vector Machines | claims that an L2-SVM output layer outperforms softmax (sketch below) |
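
A minimal NumPy sketch of the one-vs-rest squared hinge (L2-SVM) loss the paper uses in place of softmax cross-entropy; the function name and shapes here are illustrative, not the paper's code.

```python
import numpy as np

def l2svm_loss(scores, labels):
    """One-vs-rest squared hinge (L2-SVM) loss over raw class scores."""
    n, c = scores.shape
    t = -np.ones((n, c))                 # target is +1 for the true class,
    t[np.arange(n), labels] = 1.0        # -1 for every other class
    margins = np.maximum(0.0, 1.0 - t * scores)
    return np.mean(np.sum(margins ** 2, axis=1))

scores = np.array([[2.0, -1.0, 0.5],    # 2 samples, 3 classes
                   [0.1,  0.3, -0.2]])
print(l2svm_loss(scores, np.array([0, 1])))
```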

Softmax

| Paper | Note |
| --- | --- |
| Hierarchical Softmax | speeds up the softmax when the number of classes is large (sketch below) |
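
A toy two-level hierarchical softmax in NumPy: with the classes split into groups, p(class) = p(group) · p(class | group), so each prediction evaluates a few small softmaxes instead of one softmax over every class. The fixed 2 × 3 class tree here is an illustrative assumption; real systems build the tree from class frequencies or a learned clustering.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
h = rng.normal(size=4)                   # hidden representation
W_group = rng.normal(size=(2, 4))        # scores for the 2 groups
W_leaf = rng.normal(size=(2, 3, 4))      # scores for 3 classes per group

p_group = softmax(W_group @ h)
p_class = np.concatenate([p_group[g] * softmax(W_leaf[g] @ h)
                          for g in range(2)])
print(p_class.sum())                     # all 6 class probabilities sum to 1
```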

Autograd

| Paper | Note |
| --- | --- |
| Automatic differentiation in machine learning: a survey | backpropagation as reverse-mode automatic differentiation (sketch below) |
| Efficient BackProp | practical backprop tricks from Yann LeCun |
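
A toy reverse-mode autodiff node, showing the chain-rule bookkeeping that backpropagation automates; this is a sketch for intuition only, not any library's API, and it re-walks shared subgraphs rather than using the topological ordering a real implementation would.

```python
class Var:
    """A value plus the local derivatives needed for reverse mode."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents           # (parent, d(self)/d(parent)) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        self.grad += seed                # accumulate d(output)/d(self)
        for parent, local in self.parents:
            parent.backward(seed * local)

x, y = Var(2.0), Var(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)                    # 4.0 (= y + 1) and 2.0 (= x)
```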

Activation

| Paper | Note |
| --- | --- |
| Approximation by Superpositions of a Sigmoidal Function | neural networks as universal approximators |
| Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification | ReLU/PReLU and He weight initialization (sketch below) |
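
A quick sketch of He initialization feeding ReLU layers, assuming plain fully connected layers: drawing weights with std sqrt(2 / fan_in) compensates for ReLU zeroing half its inputs, so activations keep a stable scale through depth.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def he_normal(fan_in, fan_out):
    # He et al.: weight variance 2 / fan_in preserves the signal through ReLUs.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

x = rng.normal(size=(256, 512))
for _ in range(10):                      # 10 ReLU layers deep
    x = relu(x @ he_normal(512, 512))
print(x.std())                           # stays O(1) instead of shrinking
```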

Deep Neural Networks

| Paper | Note |
| --- | --- |
| Do Deep Nets Really Need to be Deep? | |
| FitNets: Hints for Thin Deep Nets | |
| The Loss Surfaces of Multilayer Networks | |
| Understanding the difficulty of training deep feedforward neural networks | Xavier/Glorot weight initialization (sketch below) |
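
The Glorot/Xavier rule from the last paper above, sketched under the same assumptions for a tanh network: a uniform range of ±sqrt(6 / (fan_in + fan_out)) balances the variance of activations flowing forward and of gradients flowing backward.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    # Glorot & Bengio's "normalized initialization".
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

x = rng.normal(size=(128, 300))
for _ in range(8):                       # 8 tanh layers deep
    x = np.tanh(x @ glorot_uniform(300, 300))
print(x.std())                           # signal neither explodes nor dies
```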

Normalization

| Paper | Note |
| --- | --- |
| Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | fully connected and spatial batch normalization |
| Layer Normalization | normalizes over the features instead of the batch (sketch below) |
| Group Normalization | a layer normalization variant suited to CNNs |
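
A NumPy sketch contrasting the two normalization axes from the table above; the learnable scale/shift (gamma, beta) and batch norm's running statistics are omitted for brevity.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature across the batch (axis 0).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def layer_norm(x, eps=1e-5):
    # Normalize each sample across its features (axis 1);
    # the result does not depend on the batch size.
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(2.0, 3.0, size=(32, 64))
print(batch_norm(x).mean(axis=0)[:3])    # ~0 for each feature
print(layer_norm(x).mean(axis=1)[:3])    # ~0 for each sample
```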

Regularization

| Paper | Note |
| --- | --- |
| Elastic net regularization | combines L1 and L2 regularization |
| Dropout: A Simple Way to Prevent Neural Networks from Overfitting | dropout (sketch below) |
| Improving neural networks by preventing co-adaptation of feature detectors | |
| Dropout Training as Adaptive Regularization | relates dropout to other regularization techniques |
| Regularization of Neural Networks using DropConnect | DropConnect drops individual weights rather than activations |
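
A sketch of dropout in its common "inverted" form, which rescales the kept units at train time so inference is a no-op; the original paper instead scales the weights at test time, but the expected activations match.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p_drop=0.5, train=True):
    if not train:
        return x                          # inference: identity
    keep = rng.random(x.shape) >= p_drop  # random binary mask per unit
    return x * keep / (1.0 - p_drop)      # rescale to preserve expectation

x = np.ones((2, 8))
print(dropout(x))                  # about half zeros, survivors scaled to 2.0
print(dropout(x, train=False))     # unchanged at test time
```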

Optimization

| Paper | Note |
| --- | --- |
| Advances in Optimizing Recurrent Networks | Nesterov momentum |
| Large Scale Distributed Deep Networks | compares L-BFGS and SGD variants |
| SFO | combines the advantages of SGD with those of L-BFGS |
| Adaptive Subgradient Methods for Online Learning and Stochastic Optimization | Adagrad, an adaptive learning rate method |
| RMSProp: Divide the gradient by a running average of its recent magnitude | RMSProp |
| Adam: A Method for Stochastic Optimization | Adam (sketch below) |
| Unit Tests for Stochastic Optimization | a standardized benchmark for stochastic optimization |
| Practical Recommendations for Gradient-Based Training of Deep Architectures | |
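
A compact NumPy version of the Adam update (Algorithm 1 in the paper), exercised on a toy quadratic; the hyperparameter defaults are the paper's.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g            # 1st moment: running mean of grads
    v = b2 * v + (1 - b2) * g * g        # 2nd moment: running mean of grad^2
    m_hat = m / (1 - b1 ** t)            # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w = np.array([1.0, -2.0])                # minimize f(w) = ||w||^2
m = v = np.zeros_like(w)
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(w)                                 # close to [0, 0]
```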

HyperParameter Search

| Paper | Note |
| --- | --- |
| Random Search for Hyper-Parameter Optimization | random search often beats grid search (sketch below) |
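
A minimal illustration of random search: sample each hyperparameter independently (log-uniform for scale parameters like the learning rate) and keep the best configuration. The `score` function here is a stand-in for a real train-and-validate run.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_config():
    return {"lr": 10 ** rng.uniform(-5, -1),   # log-uniform in [1e-5, 1e-1]
            "dropout": rng.uniform(0.0, 0.7),
            "hidden": int(rng.choice([128, 256, 512]))}

def score(cfg):
    # Placeholder objective; a real run would train and validate a model.
    return -abs(np.log10(cfg["lr"]) + 3) - abs(cfg["dropout"] - 0.5)

best = max((sample_config() for _ in range(60)), key=score)
print(best)
```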

Convolutional Neural Networks

| Paper | Note |
| --- | --- |
| Gradient-Based Learning Applied to Document Recognition | LeNet-5 |
| ImageNet Classification with Deep Convolutional Neural Networks | AlexNet |
| Visualizing and Understanding Convolutional Networks | ZF Net, an improvement on AlexNet |
| (Inception Network) Going Deeper with Convolutions | GoogLeNet |
| Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | Inception-v4 |
| Very Deep Convolutional Networks for Large-Scale Image Recognition | VGGNet |
| (ResNet) Deep Residual Learning for Image Recognition | residual networks |
| Identity Mappings in Deep Residual Networks | |
| Network in Network | |
| Multi-Scale Context Aggregation by Dilated Convolutions | dilated convolutions |
| Striving for Simplicity: The All Convolutional Net | |
| A guide to convolution arithmetic for deep learning | transposed convolutions and checkerboard artifacts (sketch below) |
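
The shape arithmetic from the guide as small helper functions, useful for sanity-checking padding, dilation, and transposed (up-)convolutions along one spatial axis.

```python
def conv_out(i, k, s=1, p=0, d=1):
    """Conv output length: input i, kernel k, stride s, padding p, dilation d.
    A dilated kernel covers d*(k-1) + 1 input positions."""
    return (i + 2 * p - d * (k - 1) - 1) // s + 1

def transposed_conv_out(i, k, s=1, p=0):
    """Transposed conv inverts the shape arithmetic of conv_out (no output padding)."""
    return (i - 1) * s - 2 * p + k

print(conv_out(224, 3, s=1, p=1))        # 224: 'same' 3x3 convolution
print(conv_out(224, 3, s=1, p=2, d=2))   # 224: 3x3 with dilation 2
print(transposed_conv_out(112, 2, s=2))  # 224: 2x upsampling
```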