Deep Learning Resources


Books

  • Goodfellow, I., Bengio, Y., and Courville, A. Deep Learning. MIT Press, 2016. [Link]
  • Nielsen, M. A. Neural Networks and Deep Learning. Determination Press, 2015. [Link]
  • Epelbaum, T. Deep Learning: Technical Introduction. arXiv preprint, 2017. [Link]

Surveys

  • LeCun, Y., Bengio, Y., and Hinton, G. Deep learning. Nature 521, 7553 (2015), 436–444. [Link]
  • Schmidhuber, J. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85–117. [Link]
  • Deng, L. Three classes of deep learning architectures and their applications: a tutorial survey. APSIPA Transactions on Signal and Information Processing (2012). [Link]
  • Wang, H., Raj, B., and Xing, E. P. On the origin of deep learning. arXiv preprint arXiv:1702.07800 (2017). [Link]

Architecture

Recurrent Neural Network

  • [Original-RNN] Hochreiter, S. Untersuchungen zu dynamischen neuronalen Netzen (Investigations on dynamic neural networks). Diploma thesis, TU Munich, 1991; supervised by J. Schmidhuber (in German). [Link]
  • [Original-LSTM] Hochreiter, S., and Schmidhuber, J. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780. [Link]
  • [Refined-RNN-LSTM] Graves, A. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013). [Link]
  • [GRU] Cho, K., et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014). [Link] (a minimal GRU gating sketch follows this list)
  • [Comparison] Jozefowicz, R., Zaremba, W., and Sutskever, I. An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (2015), pp. 2342–2350. [Link]
  • [Seq2Seq] Sutskever, I., Vinyals, O., and Le, Q. V. Sequence to sequence learning with neural networks. In Advances in neural information processing systems (2014), pp. 3104–3112. [Link]
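
The gating mechanism introduced in the GRU paper above is compact enough to show directly. Below is a minimal sketch of a single GRU step in NumPy, following the Cho et al. (2014) formulation; biases are omitted and all weight names, dimensions, and data are illustrative rather than taken from any cited paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step (Cho et al., 2014): update gate z, reset gate r, candidate state h_tilde."""
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate state; reset gate masks the old state
    return z * h_prev + (1.0 - z) * h_tilde         # interpolate between previous and candidate state

# toy example: 3-dim inputs, 4-dim hidden state, short sequence
rng = np.random.default_rng(0)
Wz, Wr, Wh = (rng.standard_normal((4, 3)) for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((4, 4)) for _ in range(3))
h = np.zeros(4)
for t in range(5):
    h = gru_step(rng.standard_normal(3), h, Wz, Uz, Wr, Ur, Wh, Uh)
print(h)
```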

Convolutional Neural Network

  • [AlexNet] Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (2012), pp. 1097–1105. [Link]
  • [VGGNet] Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014). [Link]
  • [GoogLeNet] Szegedy, C., et al. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (2015), pp. 1–9. [Link]
  • [ResNet] He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (2016), pp. 770–778. [Link] (a minimal residual-block sketch follows this list)
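
The core idea of the ResNet paper above is that a block learns a residual F(x) which is added back to its input through an identity shortcut. The sketch below shows that structure in NumPy; dense layers stand in for the paper's 3x3 convolutions, batch normalization is omitted, and all names and sizes are illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Residual block (He et al., 2016): output = relu(F(x) + x), so the layers learn a residual."""
    out = relu(W1 @ x)    # first transformation (a 3x3 convolution in the paper; dense here for brevity)
    out = W2 @ out        # second transformation, no activation before the shortcut addition
    return relu(out + x)  # identity shortcut, then nonlinearity

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1, W2 = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
print(residual_block(x, W1, W2))
```

The shortcut connection is what makes very deep stacks of such blocks easier to optimize than plain stacked layers.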

Unsupervised and Deep Generative Models

  • [RBM/DBN] Hinton, G. E., and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504–507. [Link]
  • [Autoencoder] Le, Q. V. Building high-level features using large scale unsupervised learning. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013), IEEE, pp. 8595–8598. [Link]
  • [RNN] Graves, A. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013). [Link]
  • [Seq2Seq] Sutskever, I., Vinyals, O., and Le, Q. V. Sequence to sequence learning with neural networks. In Advances in neural information processing systems (2014), pp. 3104–3112. [Link]
  • [VAE] Kingma, D. P., and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013). [Link] (a minimal reparameterization sketch follows this list)
  • [GAN] Goodfellow, I., et al. Generative adversarial nets. In Advances in neural information processing systems (2014), pp. 2672–2680. [Link]
  • [VAE+RNN+Attention] Gregor, K., et al. DRAW: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623 (2015). [Link]
  • [PixelRNN] Oord, A. v. d., Kalchbrenner, N., and Kavukcuoglu, K. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759 (2016). [Link]
  • [PixelCNN] van den Oord, A., et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems (2016), pp. 4790–4798. [Link]
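
Two pieces of the VAE paper cited above are easy to show concretely: the reparameterization trick (sample z as mu + sigma * eps so gradients can flow through the sampling step) and the closed-form KL term for a diagonal Gaussian encoder. A minimal NumPy sketch, with the encoder and decoder networks omitted and shapes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping sampling differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian q, summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

# toy batch of 4 examples with a 2-dimensional latent space
mu = rng.standard_normal((4, 2))
log_var = rng.standard_normal((4, 2))
z = reparameterize(mu, log_var)
print(z.shape, kl_to_standard_normal(mu, log_var))
```

The full VAE objective adds a decoder reconstruction term to this KL term and maximizes the result as a lower bound on the data log-likelihood.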

Application

Natural Language Processing

Image

Time Series

Libraries

  • Theano (Univ Montreal) [Link]
    Lang: Python
    (+) Decent high-level wrappers (Keras, Lasagne)
    (-) No multi-GPU support, bulkier
  • Caffe (UC Berkeley) [Link]
    Lang: C++, interface to Python and Matlab
    (+) Great for CNNs
    (-) Not so great for RNNs
  • TensorFlow (Google) [Link]
    Lang: C++, Python
    (+) Low-level library, excellent documentation and community, visualization tool
    (-) Slower, quite hard to debug, not too many pre-trained models
  • Torch (NYU) [Link]
    Lang: C, Lua, Python (PyTorch)
    (+) Easier to code and debug than TensorFlow, a lot of pre-trained models
    (-) Documentation isn't as polished as TensorFlow's
  • Keras [Link]
    Lang: Python
    (+) Easy to use, high-level library, runs on top of Theano or TensorFlow (a minimal example appears at the end of this section)
    (-) Hard to debug, difficult to create new architectures
  • Lasagne [Link]
    Lang: Python
    (+) High-level library, runs on top of Theano
    (-) Hard to debug, difficult to create new architectures
  • CNTK (Microsoft) [Link]
    Lang: C++
  • Apache MXNet (Amazon) [Link]
    Lang: C++
  • Deeplearning4j (Skymind) [Link]
    Lang: Java
  • Chainer [Link]
    Lang: Python
*The (+) and (-) notes are gathered from my own experience, Quora (1, 2), and Tarry Singh
*An interesting analysis of framework usage in academic papers (March 2017) from Alwyn Matthew is available here
% of papers   framework
-----------   --------------
        9.1   TensorFlow
        7.1   Caffe
        4.6   Theano
        3.3   Torch
        2.5   Keras
        1.7   MatConvNet
        1.2   Lasagne
        0.5   Chainer
        0.3   MXNet
        0.3   CNTK
        0.2   PyTorch
        0.1   Deeplearning4j
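
To make the "easy to use" point about Keras above concrete, here is a minimal sketch of defining, compiling, and fitting a small classifier; the layer sizes and the random placeholder data are chosen purely for illustration, using the standard Sequential/Dense/compile/fit API:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# small feed-forward classifier: 100 input features, 10 classes
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# random placeholder data, just to show the training call
x = np.random.random((256, 100))
y = np.eye(10)[np.random.randint(10, size=256)]   # one-hot labels
model.fit(x, y, epochs=2, batch_size=32)
```

Writing the same model against a lower-level library means defining the variables and training loop explicitly, which is the trade-off the (+)/(-) notes above point at.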