A Review of Deep Neural Networks and Parallel Architectures in Rock Fragmentation Problems

Mikhail Vladimirovich Ronkin, Elena Nikolaevna Akimova, Vladimir Evgenievich Misilov, Kirill Igorevich Reshetnikov

Abstract


Assessing the productivity of mineral extraction, including measuring the geometric dimensions of rock fragments in an open pit, is one of the most important tasks in the mining industry. The rock fragmentation problem is solved with computer vision methods such as instance segmentation or semantic segmentation. Deep neural networks are currently the standard tool for such tasks on digital images. These networks require substantial computing power to process high-resolution digital images and large datasets. To address this, the literature proposes lightweight neural network architectures as well as performance optimization techniques such as parallel computing on central, graphics, and specialized processors. This review covers recent advances in deep neural networks for computer vision as applied to rock fragmentation, along with ways to improve the performance of neural network implementations on various parallel architectures.

Keywords


computer vision; convolutional neural networks; deep learning; instance segmentation; semantic segmentation; object detection; parallel computing; mining industry problems; rock fragmentation


Литература


Fu Y., Aldrich C. Deep learning in mining and mineral processing operations: a review // IFAC-PapersOnLine. 2020. Vol. 53, no. 2. P. 11920–11925. DOI: 10.1016/j.ifacol.2020.12.712.

Zhou W., Wang H., Wan Z. Ore Image Classification Based on Improved CNN // Computers and Electrical Engineering. 2022. Vol. 99. P. 107819. DOI: 10.1016/j.compeleceng.2022.107819.

Liu X., Wang H., Jing H., et al. Research on intelligent identification of rock types based on faster R-CNN method // IEEE Access. 2020. Vol. 8. P. 21804–21812. DOI: 10.1109/ACCESS.2020.2968515.

Amiripallia S.S., Rao G.N., Beharaa J., Sanjay K. Mineral Rock Classification Using Convolutional Neural Network // First International Conference on Recent Trends in Computing (ICRTC 2021), Virtual, Kopargaon, India, May 21–22, 2021. Advances in Parallel Computing. Vol. 39 / ed. by M. Rajesh, K. Vengatesan, M. Gnanasekar, et al. Amsterdam, The Netherlands: IOS Press, 2021. P. 499–505. DOI: 10.3233/APC210235.

Karimpouli S., Tahmasebi P. Segmentation of digital rock images using deep convolutional autoencoder networks // Computers & Geosciences. 2019. Vol. 126. P. 142–150. DOI: 10.1016/j.cageo.2019.02.003.

He M., Zhang Z., Ren J., et al. Deep convolutional neural network for fast determination of the rock strength parameters using drilling data // International Journal of Rock Mechanics and Mining Sciences. 2019. Vol. 123. P. 104084. DOI: 10.1016/j.ijrmms.2019.104084.

Alzubaidi F., Mostaghimi P., Swietojanski P., et al. Automated lithology classification from drill core images using convolutional neural networks // Journal of Petroleum Science and Engineering. 2021. Vol. 197. P. 107933. DOI: 10.1016/j.petrol.2020.107933.

Chen T., Hu N., Niu R., et al. Object-Oriented Open-Pit Mine Mapping Using Gaofen-2 Satellite Image and Convolutional Neural Network, for the Yuzhou City, China // Remote Sensing. 2020. Vol. 12, no. 23. P. 3895. DOI: 10.3390/rs12233895.

Baek J., Choi Y. Deep neural network for predicting ore production by truck-haulage systems in open-pit mines // Applied Sciences. 2020. Vol. 10, no. 5. P. 1657. DOI: 10.3390/app10051657.

Williams J., Singh J., Kumral M., Ramirez Ruiseco J. Exploring deep learning for diglimit optimization in open-pit mines // Natural Resources Research. 2021. Vol. 30, no. 3. P. 2085–2101. DOI: 10.1007/s11053-021-09864-y.

Somua-Gyimah G., Frimpong S., Nyaaba W., Gbadam E. A computer vision system for terrain recognition and object detection tasks in mining and construction environments // 2019 SME Annual Conference and Expo and CMA 121st NationalWestern Mining Conference, Denver, CO, USA, February 24–27, 2019. Society for Mining, Metallurgy and Exploration (SME), 2019.

Zeng F., Jacobson A., Smith D., et al. Lookup: Vision-only real-time precise underground localisation for autonomous mining vehicles // 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, May 20–24, 2019. IEEE, 2019. P. 1444–1450. DOI: 10.1109/ICRA.2019.8794453.

Vu T., Bao T., Hoang Q.V., et al. Measuring blast fragmentation at Nui Phao open-pit mine, Vietnam using the Mask R-CNN deep learning model // Mining Technology. 2021. Vol. 130, no. 4. P. 232–243. DOI: 10.1080/25726668.2021.1944458.

Zyuzin V., Ronkin M., Porshnev S., Kalmykov A. Computer vision system for the automatic asbestos content control in stones // Big Data and AI Conference 2020, Moscow, Russian Federation, September 17–18, 2020. IOP Publishing: Journal of Physics: Conference Series. Vol. 1727, 2021. P. 012014. DOI: 10.1088/1742-6596/1727/1/012014.

Zyuzin V., Ronkin M., Porshnev S., Kalmykov A. Automatic Asbestos Control Using Deep Learning Based Computer Vision System // Applied Sciences. 2021. Vol. 11, no. 22. P. 10532. DOI: 10.3390/app112210532.

Ronkin M., Kalmykov A., Reshetnikov K., Zyuzin V. Investigation of Object Detection Based Method for Open-Pit Blast Quality Estimation // 2022 Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), Yekaterinburg, Russian Federation, September 19–21, 2022. IEEE, 2022. P. 248–251. DOI: 10.1109/USBEREIT56278.2022.9923353.

Gao R., Sun Z., Li W., et al. Automatic coal and gangue segmentation using U-Net based fully convolutional networks // Energies. 2020. Vol. 13, no. 4. P. 829. DOI: 10.3390/en13040829.

Sangaiah A.K. Deep learning and parallel computing environment for bioengineering systems. St. Louis, MO, USA: Academic Press, 2019. 280 p.

Ronkin M. V., Akimova E. N., Misilov V. E. Review of deep learning approaches in solving rock fragmentation problems // AIMS Mathematics. 2023. Vol. 8, no. 10. P. 23900–23940. DOI: 10.3934/math.20231219.

Liu X., Zhang Y., Jing H., et al. Ore image segmentation method using U-Net and Res_Unet convolutional networks // RSC advances. 2020. Vol. 10, no. 16. P. 9396–9406. DOI: 10.1039/C9RA05877J.

Si L., Xiong X., Wang Z., Tan C. A deep convolutional neural network model for intelligent discrimination between coal and rocks in coal mining face // Mathematical Problems in Engineering. 2020. Vol. 2020. P. 2616510. DOI: 10.1155/2020/2616510.

Su C., Xu S.-j., Zhu K.-y., Zhang X.-c. Rock classification in petrographic thin section images based on concatenated convolutional neural networks // Earth Science Informatics. 2020. Vol. 13, no. 4. P. 1477–1484. DOI: 10.1007/s12145-020-00505-1.

Ronkin M., Reshetnikov K., Zyuzin V. Open-Pits asbestos. 2022. DOI: 10.17632/pfdbfpfygh. (accessed: 16.12.2022).

Ronkin M., Reshetnikov K., Zyuzin V., et al. Asbest veins in the open pit conditions. 2022. DOI: 10.17632/y2jfk63tpd. (accessed: 16.12.2022).

Babaeian M., Ataei M., Sereshki F., et al. A new framework for evaluation of rock fragmentation in open pit mines // Journal of Rock Mechanics and Geotechnical Engineering. 2019. Vol. 11, no. 2. P. 325–336. DOI: 10.1016/j.jrmge.2018.11.006.

Li H., Pan C., Chen Z.and Wulamu A., Yang A. Ore image segmentation method based on U-Net and watershed // Comput. Mater. Contin. 2020. Vol. 65. P. 563–578. DOI: 10.32604/cmc.2020.09806.

Mkwelo S., Nicolls V., De Jager G. Watershed-based segmentation of rock scenes and proximity-based classification of watershed regions under uncontrolled lighting // SAIEE Africa Research Journal. 2005. Vol. 96, no. 1. P. 28–34. DOI: 10.23919/SAIEE.2005.9488146.

Bamford T., Esmaeili K., Schoellig A.P. A deep learning approach for rock fragmentation analysis // International Journal of Rock Mechanics and Mining Sciences. 2021. Vol. 145. P. 104839. DOI: 10.1016/j.ijrmms.2021.104839.

Jung D., Choi Y. Systematic review of machine learning applications in mining: Exploration, exploitation, and reclamation // Minerals. 2021. Vol. 11, no. 2. P. 148. DOI: 10.3390/min11020148.

Franklin J.A., Katsabanis T. Measurement of blast fragmentation. Rotterdam, the Netherlands: A. A. Balkema, 1996. 324 p.

Tosun A. A modified Wipfrag program for determining muckpile fragmentation // Journal of the Southern African Institute of Mining and Metallurgy. 2018. Vol. 118, no. 10. P. 1113–1199. DOI: 10.17159/2411-9717/2018/v118n10a13.

Latham J.-P., Kemeny J., Maerz N., et al. A blind comparison between results of four image analysis systems using a photo-library of piles of sieved fragments // Fragblast. 2003. Vol. 7, no. 2. P. 105–132. DOI: 10.1076/frag.7.2.105.15899.

Ronneberger O., Fischer P., Brox T. U-Net: Convolutional networks for biomedical image segmentation // International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Munich, Germany, October 5–9, 2015. Proceedings, Part III. Vol. 9351 / ed. by N. Navab, J. Hornegger, W. Wells, A. Frangi. Springer, 2015. P. 234–241. Lecture Notes in Computer Science. DOI: 10.1007/978-3-319-24574-4_28.

Siddique N., Paheding S., Elkin C.P., Devabhaktuni V. U-net and its variants for medical image segmentation: A review of theory and applications // IEEE Access. 2021. Vol. 9. P. 82031–82057. DOI: 10.1109/ACCESS.2021.3086020.

Yin X.-X., Sun L., Fu Y., et al. U-Net-Based Medical Image Segmentation // Journal of Healthcare Engineering. 2022. Vol. 2022. P. 4189781 DOI: 10.1155/2022/4189781.

Wu J., Liu W., Li C., et al. A State-of-the-art Survey of U-Net in Microscopic Image Analysis: from Simple Usage to Structure Mortification // CoRR. 2022. Vol. abs/2202.06465. arXiv: 2202.06465. URL: https://arxiv.org/abs/2202.06465.

Beucher S. Use of watersheds in contour detection // International Workshop on Image Processing: Real-time Edge and Motion detection/estimation, Rennes, France, September 17–21, 1979. CCETT, 1979.

Guo Q., Wang Y., Yang S., Xiang Z. A method of blasted rock image segmentation based on improved watershed algorithm // Scientific Reports. 2022. Vol. 12, no. 1. P. 1–21. DOI: 10.1038/s41598-022-11351-0.

Gu W., Bai S., Kong L. A review on 2D instance segmentation based on deep neural networks // Image and Vision Computing. 2022. Vol. 120. P. 104401. DOI: 10.1016/j.imavis.2022.104401.

Hafiz A.M., Bhat G.M. A survey on instance segmentation: state of the art // International journal of multimedia information retrieval. 2020. Vol. 9, no. 3. P. 171–189. DOI: 10.1007/s13735-020-00195-x.

He K., Gkioxari G., Doll´ar P., Girshick R. Mask R-CNN // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020. Vol. 42, no. 2. P. 386–397. DOI: 10.1109/TPAMI.2018.2844175.

He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition // 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 27–30, 2016. IEEE, 2016. P. 770–778. DOI: 10.1109/CVPR.2016.90.

Ramesh C.S., et al. A Review on Instance Segmentation Using Mask R-CNN // Proceedings of the International Conference on Systems, Energy & Environment (ICSEE) 2021, Kerala, India, January 22–23, 2021. SSRN, 2021. P 183–186. DOI: 10.2139/ssrn.3794272.

Schenk F., Tscharf A., Mayer G., Fraundorfer F. Automatic muck pile characterization from UAV images // ISPRS Geospatial Week 2019, Enschede, The Netherlands, June 10–14, 2019. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 2019. Vol. IV-2/W5. P. 163–170. DOI: 10.5194/isprs-annals-IV-2-W5-163-2019.

Maitre J., Bouchard K., Bedard L.P. Mineral grains recognition using computer vision and machine learning // Computers & Geosciences. 2019. Vol. 130. P. 84–93. DOI: 10.1016/j.cageo.2019.05.009.

Jocher G., Chaurasia A., Stoken A., et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. DOI: 10.5281/zenodo.7347926.

Zaidi S.S.A., Ansari M.S., Aslam A., et al. A survey of modern deep learning based object detection models // Digital Signal Processing. 2022. P. 103514. DOI: 10.1016/j.dsp.2022.103514.

Mo Y., Wu Y., Yang X., et al. Review the state-of-the-art technologies of semantic segmentation based on deep learning // Neurocomputing. 2022. Vol. 493. P. 626–646. DOI: 10.1016/j.neucom.2022.01.005.

Minaee S., Boykov Y.Y., Porikli F., et al. Image segmentation using deep learning: A survey // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021. Vol. 44, no. 7. P. 3523–3542. DOI: 10.1109/TPAMI.2021.3059968.

Yuan X., Shi J., Gu L. A review of deep learning methods for semantic segmentation of remote sensing imagery // Expert Systems with Applications. 2021. Vol. 169. P. 114417. DOI: 10.1016/j.eswa.2020.114417.

PapersWithCode.com. Semantic segmentation benchmarks. URL: https:// paperswithcode.com/task/semantic-segmentation (accessed: 25.11.2022).

PapersWithCode.com. Real-time semantic segmentation benchmarks. URL: https: //paperswithcode.com/task/real-time-semantic-segmentation/latest (accessed: 25.11.2022).

Carvalho O.L.F.d., Carvalho Junior O.A. de, Albuquerque A.O.d., et al. Instance segmentation for large, multi-channel remote sensing imagery using Mask-RCNN and a mosaicking approach // Remote Sensing. 2020. Vol. 13, no. 1. P. 39. DOI: 10.3390/rs13010039.

PapersWithCode.com. Instance segmentation benchmarks. URL: https://paperswithcode.com/task/instance-segmentation (accessed: 25.11.2022).

PapersWithCode.com. Real-time Instance Segmentation on MSCOCO. URL: https://paperswithcode.com/sota/real-time-instance-segmentation-on-mscoco (accessed: 25.11.2022).

Hossain S., Lee D.-j. Deep learning-based real-time multiple-object detection and tracking from aerial imagery via a flying robot with GPU-based embedded devices // Sensors. 2019. Vol. 19, no. 15. P. 3371. DOI: 10.3390/s19153371.

Strudel R., Garcia R., Laptev I., Schmid C. Segmenter: Transformer for semantic segmentation // 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, October 10–17, 2021. IEEE, 2022. P. 7262–7272. DOI: 10.1109/ICCV48922.2021.00717.

Liu Z., Lin Y., Cao Y., et al. Swin transformer: Hierarchical vision transformer using shifted windows // 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, October 10–17, 2021. IEEE, 2022. P. 10012–10022. DOI: 10.1109/ICCV48922.2021.00986.

LeCun Y., Boser B., Denker J.S., et al. Backpropagation applied to handwritten zip code recognition // Neural computation. 1989. Vol. 1, no. 4. P. 541–551. DOI: 10.1162/neco.1989.1.4.541.

LeCun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition // Proceedings of the IEEE. 1998. Vol. 86, no. 11. P. 2278–2324. DOI: 10.1109/5.726791.

Goodfellow I., Bengio Y., Courville A. Deep Learning. Cambridge, MA, USA: MIT Press, 2016. 800 p. URL: http://www.deeplearningbook.org.

Zhang A., Lipton Z.C., Li M., Smola A.J. Dive into deep learning // Cambridge, UK: Cambridge University Press, 2023. URL: https://D2L.ai

Krizhevsky A., Sutskever I., Hinton G.E. ImageNet classification with deep convolutional neural networks // Communications of the ACM. 2017. Vol. 60, no. 6. P. 84–90. DOI: 10.1145/3065386.

Alom M.Z., Taha T.M., Yakopcic C., et al. The history began from AlexNet: A comprehensive survey on deep learning approaches // CoRR. 2018. Vol. abs/1803.01164. arXiv: 1803.01164. URL: https://arxiv.org/abs/1803.01164.

Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition // 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015. Conference Track Proceedings // ed. by Y. Bengio, Y. LeCun. URL: https://arxiv.org/abs/1409.1556.

Szegedy C., Vanhoucke V., Ioffe S., et al. Rethinking the inception architecture for computer vision // 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 27–30, 2016. IEEE, 2016. P. 2818–2826. DOI: 10.1109/CVPR.2016.308.

Lin M., Chen Q., Yan S. Network in network // 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014. Conference Track Proceedings / ed. by Y. Bengio, Y. LeCun. URL: https://arxiv.org/abs/1312.4400.

Chollet F. Xception: Deep learning with depthwise separable convolutions // 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, 2017. IEEE, 2017. P. 1251–1258. DOI: 10.1109/CVPR.2017.195.

Szegedy C., Liu W., Jia Y., et al. Going deeper with convolutions // 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 7–12, 2015. IEEE, 2015. P. 1–9. DOI: 10.1109/CVPR.2015.7298594.

Ioffe S., Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift // Proceedings of the 32nd International Conference on Machine Learning, ICML’15, Lille, France, July 7–9, 2015. Proceedings of Machine Learning Research. Vol. 37 / ed. by F. Bach, D. Blei. PMLR, 2015. P. 448–456. URL: https: //proceedings.mlr.press/v37/ioffe15.html.

Kingma D.P., Ba J. Adam: A method for stochastic optimization // 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015. Conference Track Proceedings / ed. by Y. Bengio, Y. LeCun. URL: https://arxiv.org/abs/1412.6980.

He K., Zhang X., Ren S., Sun J. Identity mappings in deep residual networks // European conference on computer vision – ECCV 2016, Amsterdam, The Netherlands, October 11–14, 2016. Proceedings, Part IV. Vol. 9908 / ed. by B. Leibe, J. Matas, N. Sebe, M. Welling. Springer, 2016. P. 630–645. Lecture Notes in Computer Science. DOI: 10.1007/978-3-319-46493-0_38.

PapersWithCode.com. Convolutional neural networks. URL: https://paperswithcode.com/methods/category/convolutional-neural-networks (accessed: 25.11.2022).

PapersWithCode.com. Most popular image models. URL: https://paperswithcode.com/methods/category/image-models (accessed: 25.11.2022).

Xie S., Girshick R., Dollar P., et al. Aggregated residual transformations for deep neural networks // 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, 2017. IEEE, 2017. P. 5987–5995. DOI: 10.1109/CVPR.2017.634.

Huang G., Liu Z., Van Der Maaten L., Weinberger K.Q. Densely connected convolutional networks // 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, 2017. IEEE, 2017. P. 2261–2269. DOI: 10.1109/CVPR.2017.243.

He T., Zhang Z., Zhang H., et al. Bag of tricks for image classification with convolutional neural networks // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, June 15–20, 2019. IEEE, 2020. P. 558–567. DOI: 10.1109/CVPR.2019.00065.

Kolesnikov A., Beyer L., Zhai X., et al. Big transfer (bit): General visual representation learning // European conference on computer vision – ECCV 2020, Glasgow, UK, August 23–28, 2020. Proceedings, Part V. Vol. 12350 / ed. by A. Vedaldi, H. Bischof, T. Brox, JM. Frahm. Springer, 2020. P. 491–507. Lecture Notes in Computer Science. DOI: 10.1007/978-3-030-58558-7_29.

Radosavovic I., Kosaraju R.P., Girshick R., et al. Designing network design spaces // 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, June 13–19, 2020. IEEE, 2020. P. 10425–10433. DOI: 10.1109/CVPR42600.2020.01044.

Sandler M., Howard A., Zhu M., et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks // 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, June 18–23, 2018. IEEE, 2018. P. 4510–4520. DOI: 10.1109/CVPR.2018.00474.

Kyriakides G., Margaritis K. An introduction to neural architecture search for convolutional networks // CoRR. 2020. Vol. abs/2005.11074. arXiv: 2005.11074. URL: https://arxiv.org/abs/2005.11074.

He X., Zhao K., Chu X. AutoML: A survey of the state-of-the-art // Knowledge-Based Systems. 2021. Vol. 212. P. 106622. DOI: 10.1016/j.knosys.2020.106622.

Hu J., Shen L., Sun G. Squeeze-and-excitation networks // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020. Vol. 42, no. 8. P. 2011–2023. DOI: 10.1109/TPAMI.2019.2913372.

Tan M., Le Q. EfficientNet: Rethinking model scaling for convolutional neural networks // Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, June 9–15, 2019. Proceedings of Machine Learning Research. Vol. 97 / ed. by K. Chaudhuri, R. Salakhutdinov. PMLR, 2019. P. 6105–6114. URL: https://proceedings.mlr.press/v97/tan19a.html.

Dosovitskiy A., Beyer L., Kolesnikov A., et al. An image is worth 16x16 words: Transformers for image recognition at scale // 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021. URL: https://arxiv.org/abs/2010.11929.

Khan S., Naseer M., Hayat M., et al. Transformers in vision: A survey // ACM Computing Surveys. 2021. Vol. 54, no. 10s. P. 1–41. DOI: 10.1145/3505244.

Vaswani A., Shazeer N., Parmar N., et al. Attention is all you need // 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, December 4–9, 2017. Advances in Neural Information Processing Systems. Vol. 30 / ed. by I. Guyon, U. von Luxburg, S. Bengio, et al. Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.

Ba J.L., Kiros J.R., Hinton G.E. Layer normalization // CoRR. 2016. Vol. abs/1607.06450. arXiv: 1607.06450. URL: https://arxiv.org/abs/1607.06450.

Dai Z., Liu H., Le Q.V., Tan M. Coatnet: Marrying convolution and attention for all data sizes // 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Virtual, Online, December 6–14, 2021. Advances in Neural Information Processing Systems. Vol. 34 / ed. by M. Ranzato, A. Beygelzimer, Y. Dauphin, et al. Curran Associates, Inc., 2021. P. 3965–3977. URL: https://proceedings.neurips.cc/paper/2021/hash/20568692db622456cc42a2e853ca21f8-Abstract.html.

Tu Z., Talebi H., Zhang H., et al. MaxViT: Multi-axis Vision Transformer // 17th European conference on computer vision – ECCV 2022, Tel Aviv, Israel, October 23–27, 2022. Proceedings, Part XXIV. Vol. 13684 / ed. by S. Avidan, G. Brostow, M. Cisse, et al. Springer, 2022. P. 459–479. Lecture Notes in Computer Science. DOI: 10.1007/978-3-031-20053-3_27.

Mehta S., Rastegari M. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer // The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25–29, 2022. URL: https://arxiv.org/abs/2110.02178.

Tolstikhin I.O., Houlsby N., Kolesnikov A., et al. MLP-Mixer: An all-MLP Architecture for Vision // 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Virtual, Online, December 6–14, 2021. Advances in Neural Information Processing Systems. Vol. 34 / ed. by M. Ranzato, A. Beygelzimer, Y. Dauphin, et al. Curran Associates, Inc., 2021. P. 24261–24272. URL: https://proceedings.neurips.cc/paper/2021/hash/cba0a4ee5ccd02fda0fe3f9a3e7b89fe-Abstract.html.

Touvron H., Cord M., Douze M., et al. Training data-efficient image transformers & distillation through attention // Proceedings of the 38th International Conference on Machine Learning, Virtual, July 18–24, 2021. Proceedings of Machine Learning Research. Vol. 139 / ed. by M. Meila, T. Zhang. PMLR, 2021. P. 10347-10357. URL: https://proceedings.mlr.press/v139/touvron21a.

Hospedales T.M., Antoniou A., Micaelli P., Storkey A.J. Meta-learning in neural networks: A survey // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021. Vol. 44, no. 9. P. 5149–5169. DOI: 10.1109/TPAMI.2021.3079209.

Naveed H. Survey: Image mixing and deleting for data augmentation // CoRR. 2023. Vol. abs/2106.07085. arXiv: 2106.07085. URL: https://arxiv.org/abs/2106.07085.

PapersWithCode.com. Image Classification on ImageNet. URL: https://paperswithcode. com/sota/image-classification-on-imagenet (accessed: 25.11.2022).

Bolya D., Zhou C., Xiao F., Lee Y.J. Yolact: Real-time instance segmentation // 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), October 27 – November 2, 2019. IEEE, 2020. P. 9157–9166. DOI: 10.1109/ICCV.2019.00925.

Cheng T., Wang X., Chen S., et al. Sparse Instance Activation for Real-Time Instance Segmentation // 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, June 18–24, 2022. IEEE, 2022. P. 4433–4442. DOI: 10.1109/CVPR52688.2022.00439.

Wang C.-Y., Bochkovskiy A., Liao H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors // 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June 20–22, 2023. IEEE, 2023. P. 7464–7475.

Bolya D., Zhou C., Xiao F., Lee Y.J. Yolact++: Better real-time instance segmentation // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022. Vol. 44, no. 2. P. 1108–1121. DOI: 10.1109/TPAMI.2020.3014297.

Wang X., Zhang R., Kong T., et al. Solov2: Dynamic and fast instance segmentation // Annual Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual, December 6–12, 2020. Advances in Neural Information Processing Systems. Vol. 33 / ed. by H. Larochelle, and M. Ranzato, R. Hadsell, et al. Curran Associates, Inc., 2020. P. 17721–17732. URL: https://proceedings.neurips.cc/paper/2020/hash/cd3afef9b8b89558cd56638c3631868a-Abstract.html.

Wang X., Kong T., Shen C., et al. Solo: Segmenting objects by locations // European conference on computer vision – ECCV 2020, Glasgow, UK, August 23–28, 2020. Proceedings, Part V. Vol. 12350 / ed. by A. Vedaldi, H. Bischof, T. Brox, JM. Frahm. Springer, 2020. P. 649–665. Lecture Notes in Computer Science. DOI: https://doi.org/10.1007/978-3-030-58523-5_38.

Li C., Li L., Jiang H., et al. YOLOv6: A single-stage object detection framework for industrial applications // CoRR. 2022. Vol. abs/2209.02976. arXiv: 2209.02976. URL: https://arxiv.org/abs/2209.02976.

Jocher G. Ultralytics YOLOv8. URL: https://github.com/ultralytics/ultralytics (accessed: 20.04.2023).

Diwan T., Anirudh G., Tembhurne J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications // Multimedia Tools and Applications. 2023. Vol. 82. P. 9243–9275. DOI: 10.1007/s11042-022-13644-y.

Jiang P., Ergu D., Liu F., et al. A Review of Yolo algorithm developments // Procedia Computer Science. 2022. Vol. 199. P. 1066–1073. DOI: 10.1016/j.procs.2022.01.135.

Redmon J., Divvala S., Girshick R., Farhadi A. You only look once: Unified, real-time object detection // 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 27–30, 2016. IEEE, 2016. P. 779–788. DOI: 10.1109/CVPR.2016.91.

Bochkovskiy A., Wang C.-Y., Liao H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection // CoRR. 2020. Vol. abs/2004.10934. arXiv: 2004.10934. URL: https://arxiv.org/abs/2004.10934.

Tan M., Le Q. EfficientNetv2: Smaller models and faster training // Proceedings of the 38th International Conference on Machine Learning, Virtual, July 18–24, 2021. Proceedings of Machine Learning Research. Vol. 139 / ed. by M. Meila, T. Zhang. PMLR, 2021. P. 10096–10106. URL: http://proceedings.mlr.press/v139/tan21a.html.

Brock A., De S., Smith S.L., Simonyan K. High-performance large-scale image recognition without normalization // Proceedings of the 38th International Conference on Machine Learning, Virtual, July 18–24, 2021. Proceedings of Machine Learning Research. Vol. 139 / ed. by M. Meila, T. Zhang. PMLR, 2021. P. 1059–1071. URL: https://proceedings.mlr.press/v139/brock21a.html.

PapersWithCode.com. Real-Time Object Detection. URL: https://paperswithcode.com/task/real-time-object-detection (accessed: 25.11.2022).

Chen K., Pang J., Wang J., et al. Hybrid Task Cascade for Instance Segmentation // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, June 15–20, 2019. IEEE, 2020. P. 4974–4983. DOI: 10.1109/CVPR.2019.00511.

Chen L.-C., Zhu Y., Papandreou G., et al. Encoder-decoder with atrous separable convolution for semantic image segmentation // European conference on computer vision – ECCV 2018, Munich, Germany, September 8–14, 2018. Proceedings, Part VII. Vol. 11211 / ed. by V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss. Springer, 2018. P. 801–818. Lecture Notes in Computer Science. DOI: 10.1007/978-3-030-01234-2_49.

Intel. Open Vino. 2023. URL: https://docs.openvino.ai/2023.0/home.html (accessed: 23.08.2023).

Georgia Tech and Facebook Artificial Intelligence Research. NNpack acceleration package for neural networks on multi-core CPUs. 2022. URL: https://github.com/Maratyszcza/NNPACK (accessed: 01.10.2022).

Intel. oneDNN Intel math kernel library for deep neural networks (Intel MKL-DNN) and deep neural network library (DNNL). 2022. URL: https://github.com/oneapi-src/oneDNN (accessed: 01.10.2022).

Hadjis S., Abuzaid F., Zhang C., Re C. Caffe con Troll: Shallow Ideas to Speed Up Deep Learning // DanaC’15: Proceedings of the Fourth Workshop on Data analytics in the Cloud, Melbourne, Australia, May 31 – June 4, 2015 / ed. by A. Katsifodimos. New York: ACM, 2015. P. 1–4. DOI: 10.1145/2799562.2799641.

Dai, J. J., Ding, D., Shi, D., et al. BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster // 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, June 18–24, 2022. IEEE, 2022. P. 21407–21414. DOI: 10.1109/CVPR52688.2022.02076.

Capra M., Bussolino B., Marchisio A., et al. An Updated Survey of Efficient Hardware Architectures for Accelerating Deep Convolutional Neural Networks // Future Internet. 2020. Vol. 12, no. 7. P. 113. DOI: 10.3390/fi12070113.

Dumas II J.D. Computer architecture: Fundamentals and principles of computer design. CRC Press, 2017. 447 p.

NVIDIA Corporation. Artificial Neural Network. 2022. URL: https://developer.nvidia.com/discover/artificial-neural-network (accessed: 01.10.2022).

NVIDIA Corporation. TensorRT SDK. 2023. URL: https://developer.nvidia.com/tensorrt (accessed: 08.23.2023).

Gholami A., Kim S., Dong Z., et al. A Survey of Quantization Methods for Efficient Neural Network Inference. Low-Power Computer Vision. New York: Chapman and Hall/CRC, 2022, P. 291–326. DOI: 10.1201/9781003162810-13.

Kikuchi Y., Fujita K., Ichimura T., et al. Calculation of Cross-correlation Function Accelerated by Tensor Cores with TensorFloat-32 Precision on Ampere GPU // 22nd International Conference – ICCS 2022, London, UK, June 11–23, 2022. Proceedings, Part II. Vol. 13351 / ed. by D. Groen, C. de Mulatier, M. Paszynski, et al. Springer, 2022. P. 277–290. Lecture Notes in Computer Science. DOI: 10.1007/978-3-031-08754-7_37.

Burel S., Evans A., Anghel L. Zero-Overhead Protection for CNN Weights // 2021 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Athens, Greece, October 6–8, 2021. IEEE, 2021. P. 1–6. DOI: 10.1109/DFT52944.2021.9568363.

Simons T., Lee D.-J. A Review of Binarized Neural Networks // Electronics. 2019. Vol. 8, no. 6. P. 661. DOI: 10.3390/electronics8060661.

Wang Y., Feng B., Ding Y. QGTC: Accelerating Quantized Graph Neural Networks via GPU Tensor Core // PPoPP’22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2–6, 2022. New York: ACM, 2022. P. 107–119. DOI: 10.1145/3503221.3508408.

Feng B., Wang Y., Geng T., et al. APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores // SC ’21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA, November 14–19, 2021. New York: ACM, 2021. P. 1–12. DOI: 10.1145/3458817.3476157.

Alemdar H., Leroy V., Prost-Boucle A., Petrot F. Ternary neural networks for resource-efficient AI applications // 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, May 14–19, 2017. IEEE, 2017. P. 2547–2554. DOI: 10.1109/IJCNN.2017.7966166.

Nurvitadhi E., Venkatesh G., Sim J., et al. Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? // FPGA’17: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, February 22–24, 2017. New York: ACM, 2017. P. 5–14. DOI: 10.1145/3020078.3021740.

Li Z., Wallace E., Shen S., et al. Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers // Proceedings of the International Conference on Machine Learning, Virtual, July 13–18, 2020. Proceedings of Machine Learning Research. Vol. 119 / ed. by H. Daume III, A. Singh. PMLR, 2020. P. 5958–5968. URL: https://proceedings.mlr.press/v119/li20m.html.

Qiu J., Wang J., Yao S., et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network // FPGA’16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, February 21–23, 2016. New York: ACM, 2016. P. 26–35. DOI: 10.1145/2847263.2847265.

Li C., Yang Y., Feng M., et al. Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs // SC ’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Salt Lake City, UT, USA, November 13–18, 2016. IEEE, 2016. P. 633–644. DOI: 10.1109/SC.2016.53.

NVIDIA Corporation. Mixed-Precision Programming with CUDA 8. 2016. URL: https://developer.nvidia.com/blog/mixed-precision-programming-cuda-8/ (accessed: 01.10.2022).

Anzt H., Tsai Y.M., Abdelfattah A., et al. Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations // 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), Virtual, November 12, 2020. IEEE, 2021. P. 26–38. DOI: 10.1109/PMBS51919.2020.00009.

Tian R., Zhao Z., Liu W., et al. SAMP: A Toolkit for Model Inference with Self-Adaptive Mixed-Precision // CoRR. 2022. Vol. abs/2209.09130. arXiv: 2209.09130. URL: https://arxiv.org/abs/2209.09130.

Linux Foundation. Automatic Mixed Precision package - torch.amp. 2022. URL: https://pytorch.org/docs/stable/amp.html (accessed: 01.10.2022).

Honka T. Automatic Mixed Precision Quantization of Neural Networks using Iterative Correlation Coefficient Adaptation: PhD thesis / Honka Tapio. Tampere University, Finland, 2021. URL: https://trepo.tuni.fi/handle/10024/135952.

Liang T., Glossner J., Wang L., et al. Pruning and quantization for deep neural network acceleration: A survey // Neurocomputing. 2021. Vol. 461. P. 370–403. DOI: 10.1016/j.neucom.2021.07.045.

Wimmer P., Mehnert J., Condurache A.P. Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey // Artificial Intelligence Review. 2023. DOI: 10.1007/s10462-023-10489-1.

Sun W., Li A., Geng T., et al. Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors // IEEE Transactions on Parallel and Distributed Systems. 2023. Vol. 34, no. 1. P. 246–261. DOI: 10.1109/TPDS.2022.3217824.

Wang Y., Yang C., Farrell S., et al. Time-Based Roofline for Deep Learning Performance Analysis // 2020 IEEE/ACM Fourth Workshop on Deep Learning on Supercomputers (DLS), Atlanta, GA, USA, November 11, 2020. IEEE, 2020. P. 10–19. DOI: 10.1109/DLS51937.2020.00007.

Li Y., Liu Z., Xu K., et al. A GPU-outperforming FPGA accelerator architecture for binary convolutional neural networks // ACM Journal on Emerging Technologies in Computing Systems. 2018. Vol. 14, no. 2. P. 1–16. DOI: 10.1145/3154839.

Wu R., Guo X., Du J., Li J. Accelerating Neural Network Inference on FPGA-Based Platforms — A Survey // Electronics. 2021. Vol. 10, no. 9. P. 1025. DOI: 10.3390/electronics10091025.

Habib G., Qureshi S. Optimization and acceleration of convolutional neural networks: A survey // Journal of King Saud University — Computer and Information Sciences. 2022. Vol. 34, no. 7. P. 4244–4268. DOI: 10.1016/j.jksuci.2020.10.004.

Mittal S. A Survey on optimized implementation of deep learning models on the NVIDIA Jetson platform // Journal of Systems Architecture. 2019. Vol. 97. P. 428–442. DOI: 10.1016/j.sysarc.2019.01.011.

Xu W., Zhang Y., Tang X. Parallelizing DNN Training on GPUs: Challenges and Opportunities // WWW’21: Companion Proceedings of the Web Conference 2021, Ljubljana, Slovenia, April 19–23, 2021. New York: ACM, 2021. P. 174–178. DOI: 10.1145/3442442.3452055.

Le Q.V., Ngiam J., Coates A., et al. On optimization methods for deep learning // ICML’11: Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, June 28 – July 2, 2011. Madison, WI, USA: Omnipress, 2011. P. 265–272. DOI: 10.5555/3104482.3104516.

Shamir O. Without-Replacement Sampling for Stochastic Gradient Methods // 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, December 5–10, 2016. Advances in Neural Information Processing Systems. Vol. 29 / ed. by D. Lee, M. Sugiyama, U. Luxburg, et al. Curran Associates, Inc., 2016. URL: https://proceedings.neurips.cc/paper/2016/hash/c74d97b01eae257e44aa9d5bade97baf-Abstract.html.

Wei J., Zhang X., Ji Z., et al. Deploying and scaling distributed parallel deep neural networks on the Tianhe-3 prototype system // Scientific Reports. 2021. Vol. 11. P. 20244. DOI: 10.1038/s41598-021-98794-z.

Smith S.L., Le Q.V. A Bayesian Perspective on Generalization and Stochastic Gradient Descent // 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 – May 3, 2018. Conference Track Proceedings. URL: https://arxiv.org/abs/1710.06451.

Que C., Zhang X. Efficient Scheduling in Training Deep Convolutional Networks at Large Scale // IEEE Access. 2018. Vol. 6. P. 61452–61456. DOI: 10.1109/ACCESS.2018.2875407.

Xiang S., Li H. On the Effects of Batch and Weight Normalization in Generative Adversarial Networks // CoRR. 2017. Vol. abs/1704.03971. arXiv: 1704.03971. URL: https://arxiv.org/abs/1704.03971.

Gitman I., Ginsburg B. Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification // CoRR. 2017. Vol. abs/1709.08145. arXiv: 1709.08145. URL: https://arxiv.org/abs/1709.08145.

Dukler Y., Gu Q., Montufar G. Optimization Theory for ReLU Neural Networks Trained with Normalization Layers // Proceedings of the International Conference on Machine Learning, Virtual, July 13–18, 2020. Proceedings of Machine Learning Research. Vol. 119 / ed. by H. Daume III, A. Singh. PMLR, 2020. P. 2751–2760. URL: https://proceedings.mlr.press/v119/dukler20a.html.

Yu H., Yang S., Zhu S. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning // Proceedings of the AAAI Conference on Artificial Intelligence AAAI-19, Honolulu, HI, USA, January 27 – February 1, 2019. Vol. 33, no. 1. Palo Alto, CA, USA: AAAI Press, 2019. P. 5693–5700. DOI: 10.1609/aaai.v33i01.33015693.

Xu J., Wang J., Qi Q., et al. Effective Scheduler for Distributed DNN Training Based on MapReduce and GPU Cluster // Journal of Grid Computing. 2021. Vol. 19. P. 8. DOI: 10.1007/s10723-021-09550-6.

Si T.N., Van Hung T., Ngoc D.V., Le Q.N. Using Stochastic Gradient Descent On Parallel Recommender System with Stream Data // 2022 IEEE/ACIS 7th International Conference on Big Data, Cloud Computing, and Data Science (BCD), Danang, Vietnam, August 4–6, 2022. IEEE, 2022. P. 88–93. DOI: 10.1109/BCD54882.2022.9900664.

Sukanya J., Gandhi K.R., Palanisamy V. An assessment of machine learning algorithms for healthcare analysis based on improved MapReduce // Advances in Engineering Software. 2022. Vol. 173. P. 103285. DOI: 10.1016/j.advengsoft.2022.103285.

Asadianfam S., Shamsi M., Kenari A.R. TVD-MRDL: traffic violation detection system using MapReduce-based deep learning for large-scale data // Multimedia Tools and Applications. 2021. Vol. 80, no. 2. P. 2489–2516. DOI: 10.1007/s11042-020-09714-8.

Kul S., Sayar A. Sentiment Analysis Using Machine Learning and Deep Learning on Covid 19 Vaccine Twitter Data with Hadoop MapReduce // Proceedings of the 6th International Conference on Smart City Applications (SCA2021), Virtual Safranbolu, Turkey, October 27–29, 2021. Innovations in Smart Cities Applications Volume 5. Vol. 393 / ed. by M. Ben Ahmed, A. A. Boudhir, I. R. Karas, et al. Springer, 2022. P. 859–868. Lecture Notes in Computer Science. DOI: 10.1007/978-3-030-94191-8_69.

Snir M., Otto S.W., Huss-Lederman S., et al. MPI: The complete reference: The MPI core. Cambridge, MA, USA: MIT Press, 1998. 427 p.

Thao Nguyen T., Wahib M., Takano R. Hierarchical Distributed-Memory Multi-Leader MPI-Allreduce for Deep Learning Workloads // 2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW), Takayama, Japan, November 27–30, 2018. IEEE, 2018. P. 216–222. DOI: 10.1109/CANDARW.2018.00048.

Awan A.A., Bedorf J., Chu C.-H., et al. Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation // 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Larnaca, Cyprus, May 14–17, 2019. IEEE, 2019. P. 498–507. DOI: 10.1109/CCGRID.2019.00064.

Bhagirath, Mittal N., Kumar S. Machine Learning Computation on Multiple GPU’s using CUDA and Message Passing Interface // 2019 2nd International Conference on Power Energy, Environment and Intelligent Control (PEEIC), Greater Noida, India, October 18–19, 2019. IEEE, 2020. P. 18–22. DOI: 10.1109/PEEIC47157.2019.8976714.

Ghazimirsaeed S.M., Anthony Q., Shafi A., et al. Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR // 2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S), Virtual, November 9–19, 2020. IEEE, 2021. P. 1–12. DOI: 10.1109/MLHPCAI4S51975.2020.00010.

Linux Foundation. Distributed Data Parallel - torch.nn. 2022. URL: https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html (accessed: 01.10.2022).

Jia Z., Zaharia M., Aiken A. Beyond Data and Model Parallelism for Deep Neural Networks // Machine Learning and Systems (MLSys 2019), Stanford, CA, USA, March 31 – April 2, 2019. Proceedings of Machine Learning and Systems. Vol. 1 / ed. by A. Talwalkar, V. Smith, M. Zaharia. MLSYS, 2019. P. 1–13. URL: https://proceedings.mlsys.org/paper_files/paper/2019/hash/b422680f3db0986ddd7f8f126baaf0fa-Abstract.html.

Xu A., Huo Z., Huang H. On the Acceleration of Deep Learning Model Parallelism With Staleness // 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, June 13–19, 2020. IEEE, 2020. P. 2085–2094. DOI: 10.1109/CVPR42600.2020.00216.

Ericson L., Mbuvha R. On the Performance of Network Parallel Training in Artificial Neural Networks // CoRR. 2017. Vol. abs/1701.05130. arXiv: 1701.05130. URL: https://arxiv.org/abs/1701.05130.

Chen C.-C., Yang C.-L., Cheng H.-Y. Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform // CoRR. 2018. Vol. abs/1809.02839. arXiv: 1809.02839. URL: https://arxiv.org/abs/1809.02839.

Bruna J., Zaremba W., Szlam A., LeCun Y. Spectral Networks and Locally Connected Networks on Graphs // 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014. Conference Track Proceedings / ed. by Y. Bengio, Y. LeCun. URL: http://arxiv.org/abs/1312.6203.

Chen Y.-h., Moreno I.L., Sainath T., et al. Locally-connected and convolutional neural networks for small footprint speaker recognition // Proceedings of 16th Annual Conference of the International Speech Communication Association (INTERSPEECH-2015), Dresden, Germany, September 6–10, 2015. ISCA, 2015. P. 1136–1140. DOI: 10.21437/Interspeech.2015-297.

Wadekar S.N. Locally connected neural networks for image recognition: PhD thesis / Wadekar Shakti Nagnath. Purdue University Graduate School, West Lafayette, IN, USA, 2019. URL: https://hammer.purdue.edu/articles/thesis/LOCALLY_CONNECTED_NEURAL_NETWORKS_FOR_IMAGE_RECOGNITION/11328404/1.

Ankile L.L., Heggland M.F., Krange K. Deep Convolutional Neural Networks: A survey of the foundations, selected improvements, and some current applications // CoRR. 2020. Vol. abs/2011.12960. arXiv: 2011.12960. URL: https://arxiv.org/abs/2011.12960.

Coates A., Huval B., Wang T., et al. Deep learning with COTS HPC systems // Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, June 17–19, 2013. Proceedings of Machine Learning Research. Vol. 28, no. 3 / ed. by S. Dasgupta, D. McAllester. PMLR, 2013. P. 1337–1345. URL: https://proceedings.mlr.press/v28/coates13.html.

Girones R.G., Salcedo A.M. Forward-backward parallelism in on-line backpropagation // International Work-Conference on Artificial and Natural Neural Networks, IWANN’99, Alicante, Spain, June 2–4, 1999. Proceedings, Volume II. Vol. 1607 / ed. by J. Mira, J.V. Sánchez-Andrés. Springer, 1999. P. 157–165. Lecture Notes in Computer Science. DOI: 10.1007/BFb0100482.

Xiao D., Yang C., Wu W. Efficient DNN training based on backpropagation parallelization // Computing. 2022. Vol. 104. P. 2431–2451. DOI: 10.1007/s00607-022-01094-1.

Zhang H., Dai Y., Li H., Koniusz P. Deep Stacked Hierarchical Multi-Patch Network for Image Deblurring // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, June 15–20, 2019. IEEE, 2020. P. 5971–5979. DOI: 10.1109/CVPR.2019.00613.

Xiang Y., Kim H. Pipelined data-parallel CPU/GPU scheduling for multi-DNN real-time inference // 2019 IEEE Real-Time Systems Symposium (RTSS), Hong Kong, China, December 3–6, 2019. IEEE, 2020. P. 392–405. DOI: 10.1109/RTSS46320.2019.00042.

Shi S., Tang Z., Wang Q., et al. Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees // 24th European Conference on Artificial Intelligence (ECAI 2020), Santiago de Compostela, Spain, August 29 – September 8, 2020. Frontiers in Artificial Intelligence and Applications. Vol. 325 / ed. by G. De Giacomo, A. Catala, B. Dilkina, et al. Amsterdam, The Netherlands: IOS Press, 2020. P. 1467–1474. DOI: 10.3233/FAIA200253.

Harlap A., Narayanan D., Phanishayee A., et al. PipeDream: Fast and Efficient Pipeline Parallel DNN Training // CoRR. 2018. Vol. abs/1806.03377. arXiv: 1806.03377. URL: https://arxiv.org/abs/1806.03377.

Huang Y., Cheng Y., Bapna A., et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism // Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, December 8–14, 2019. Advances in Neural Information Processing Systems. Vol. 32 / ed. by H. Wallach, H. Larochelle, A. Beygelzimer, et al. Curran Associates, Inc., 2019. URL: https://proceedings.neurips.cc/paper/2019/hash/093f65e080a295f8076b1c5722a46aa2-Abstract.html.

Krizhevsky A., Sutskever I., Hinton G.E. ImageNet Classification with Deep Convolutional Neural Networks // Communications of the ACM. 2017. Vol. 60, no. 6. P. 84–90. DOI: 10.1145/3065386.

Sun P., Feng W., Han R., et al. Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes // CoRR. 2019. Vol. abs/1902.06855. arXiv: 1902.06855. URL: https://arxiv.org/abs/1902.06855.

Gaunt A.L., Johnson M.A., Riechert M., et al. AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks // CoRR. 2017. Vol. abs/1705.09786. arXiv: 1705.09786. URL: https://arxiv.org/abs/1705.09786.

Dean J., Corrado G., Monga R., et al. Large Scale Distributed Deep Networks // Conference on Neural Information Processing Systems (NIPS 2012), Lake Tahoe, NV, USA, December 3–8, 2012. Advances in Neural Information Processing Systems. Vol. 25 / ed. by F. Pereira, C. Burges, L. Bottou, K. Weinberger. Curran Associates, Inc., 2012. URL: https://proceedings.neurips.cc/paper/2012/hash/6aca97005c68f1206823815f66102863-Abstract.html.

Lee S., Jha D., Agrawal A., et al. Parallel Deep Convolutional Neural Network Training by Exploiting the Overlapping of Computation and Communication // 2017 IEEE 24th International Conference on High Performance Computing (HiPC), Jaipur, India, December 18–21, 2017. IEEE, 2018. P. 183–192. DOI: 10.1109/HiPC.2017.00030.

Hu Y., Liu Y., Liu Z. A Survey on Convolutional Neural Network Accelerators: GPU, FPGA and ASIC // 2022 14th International Conference on Computer Research and Development (ICCRD), Shenzhen, China, January 7–9, 2022. IEEE, 2022. P. 100–107. DOI: 10.1109/ICCRD54409.2022.9730377.

Kim J.-Y. Chapter Five - FPGA based neural network accelerators // Advances in Computers (Vol. 122): Hardware Accelerator Systems for Artificial Intelligence and Machine Learning. Elsevier, 2021. P. 135–165. DOI: 10.1016/bs.adcom.2020.11.002.

Mittal S., Vibhu. A survey of accelerator architectures for 3D convolution neural networks // Journal of Systems Architecture. 2021. Vol. 115. P. 102041. DOI: 10.1016/j.sysarc.2021.102041.

Omondi A.R., Rajapakse J.C. FPGA implementations of neural networks. Dordrecht, The Netherlands: Springer, 2006. 365 p.

Reagen B., Adolf R., Whatmough P., et al. Deep learning for computer architects. Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, 2017. 123 p. DOI: 10.2200/S00783ED1V01Y201706CAC041.

Genc H., Kim S., Amid A., et al. Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration // 2021 58th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, December 5–9, 2021. IEEE, 2021. P. 769–774. DOI: 10.1109/DAC18074.2021.9586216.

Ding W., Huang Z., Huang Z., et al. Designing efficient accelerator of depthwise separable convolutional neural network on FPGA // Journal of Systems Architecture. 2019. Vol. 97. P. 278–286. DOI: 10.1016/j.sysarc.2018.12.008.

Hu Y.H., Kung S.-Y. Systolic Arrays // Handbook of Signal Processing Systems / ed. by S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala. Cham: Springer International Publishing, 2019. P. 939–977. DOI: 10.1007/978-3-319-91734-4_26.

Liu Z.-G., Whatmough P.N., Mattina M. Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference // IEEE Computer Architecture Letters. 2020. Vol. 19, no. 1. P. 34–37. DOI: 10.1109/LCA.2020.2979965.

Chen K.-C., Ebrahimi M., Wang T.-Y., Yang Y.-C. NoC-Based DNN Accelerator: A Future Design Paradigm // NOCS ’19: Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, New York, NY, USA, October 17–18, 2019. New York: ACM, 2019. P. 1–8. DOI: 10.1145/3313231.3352376.

Sun X., Choi J., Chen C.-Y., et al. Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks // Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, December 8–14, 2019. Advances in Neural Information Processing Systems. Vol. 32 / ed. by H. Wallach, H. Larochelle, A. Beygelzimer, et al. Curran Associates, Inc., 2019. P. 4900–4909. URL: https://proceedings.neurips.cc/paper/2019/hash/65fc9fb4897a89789352e211ca2d398f-Abstract.html.

Jia X., Song S., He W., et al. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes // CoRR. 2018. Vol. abs/1807.11205. arXiv: 1807.11205. URL: https://arxiv.org/abs/1807.11205.

Yang J.A., Huang J., Park J., et al. Mixed-Precision Embedding Using a Cache // CoRR. 2020. Vol. abs/2010.11305. arXiv: 2010.11305. URL: https://arxiv.org/abs/2010.11305.

Courbariaux M., Hubara I., Soudry D., et al. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 // CoRR. 2016. Vol. abs/1602.02830. arXiv: 1602.02830. URL: https://arxiv.org/abs/1602.02830.

Nour B., Cherkaoui S. How Far Can We Go in Compute-less Networking: Computation Correctness and Accuracy // IEEE Network. 2022. Vol. 36, no. 4. P. 197–202. DOI: 10.1109/MNET.012.2100157.




DOI: http://dx.doi.org/10.14529/cmse230401