An Overview of Methods for Deep Learning in Neural Networks

Andrey Vladimirovich Sozykin

Abstract


Deep neural networks are currently becoming one of the most popular approaches to building artificial intelligence systems for tasks such as speech recognition, natural language processing, computer vision, and the like. The article presents an overview of the history and the current state of methods for training deep neural networks. It covers the model of an artificial neural network and algorithms for training neural networks, including the error backpropagation algorithm used to train deep neural networks. The development of neural network architectures is described: the neocognitron, autoencoders, convolutional neural networks, the restricted Boltzmann machine, deep belief networks, long short-term memory networks, gated recurrent neural networks, and residual learning networks. Deep neural networks with a large number of hidden layers are difficult to train because of the vanishing gradient problem. The article discusses methods for addressing this problem that make it possible to successfully train deep neural networks with more than a hundred layers. An overview of popular deep learning libraries is given, which have made broad practical application of this technology possible. At present, convolutional neural networks are used for computer vision tasks, while recurrent neural networks, above all long short-term memory networks and gated recurrent neural networks, are used for processing sequences, including natural language.

Keywords


deep learning; neural networks; machine learning

Full Text:

PDF

DOI: http://dx.doi.org/10.14529/cmse170303