Performance Analysis of Deep Learning Model Inference on the Banana Pi BPI-F3 Board Using an Image Classification Task as an Example
Abstract
Keywords
Full text: PDF

References
Noor M.H.M., Ige A.O. A survey on state-of-the-art deep learning applications and challenges. Engineering Applications of Artificial Intelligence. 2025. Vol. 159, Part B. P. 111225. DOI: 10.1016/j.engappai.2025.111225.
Mezger B.W., et al. A Survey of the RISC-V Architecture Software Support. IEEE Access. 2022. Vol. 10. P. 51394–51411. DOI: 10.1109/ACCESS.2022.3174125.
He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016. P. 770–778. DOI: 10.1109/CVPR.2016.90.
Sandler M., et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 18–22, 2018. P. 4510–4520. DOI: 10.1109/CVPR.2018.00474.
Alibekov M.R., et al. Performance analysis methodology of deep neural networks inference on the example of an image classification problem. Numerical Methods and Programming. 2024. Vol. 25, no. 2. P. 127–141. DOI: 10.26089/NumMet.v25r211. (in Russian)
Demidovskij A., et al. OpenVINO Deep Learning Workbench: Comprehensive Analysis and Tuning of Neural Networks Inference. ICCV Workshop, 2019. URL: https://openaccess.thecvf.com/content_ICCVW_2019/papers/SDL-CV/Gorbachev_OpenVINO_Deep_Learning_Workbench_Comprehensive_Analysis_and_Tuning_of_Neural_ICCVW_2019_paper.pdf (accessed: 25.07.2025).
Arya M., Simmhan Y. A Preliminary Performance Analysis of LLM Inference on Edge Accelerators. 2024 IEEE 31st International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW), Bangalore, India, 2024. P. 183–184. DOI: 10.1109/HiPCW63042.2024.00069.
Verma G., et al. Performance Evaluation of Deep Learning Compilers for Edge Inference. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Portland, OR, USA, 2021. P. 858–865. DOI: 10.1109/IPDPSW52791.2021.00128.
Chen Y.-R., et al. Experiments and optimizations for TVM on RISC-V Architectures with P Extension. 2020 International Symposium on VLSI Design, Automation and Test (VLSIDAT), Hsinchu, Taiwan, 2020. P. 1–4. DOI: 10.1109/VLSI-DAT49148.2020.9196477.
Christofas V., et al. Comparative Evaluation between Accelerated RISC-V and ARM AI Inference Machines. 2023 6th World Symposium on Communication Engineering (WSCE), Thessaloniki, Greece, 2023. P. 108–113. DOI: 10.1109/WSCE59557.2023.10365853.
Bhattacharjee D., et al. Full-Stack Evaluation of Machine Learning Inference Workloads for RISC-V Systems. RISC-V Summit 2024. URL: https://riscv-europe.org/summit/2024/media/proceedings/posters/58_poster.pdf (accessed: 12.10.2025).
Garcia A.M., et al. Assessing Large Language Models Inference Performance on a 64-core RISC-V CPU with Silicon-Enabled Vectors. BigHPC2024: Special Track on Big Data and High-Performance Computing, co-located with the 3rd Italian Conference on Big Data and Data Science, ITADATA2024, Pisa, Italy, September 17–19, 2024. URL: https://ceur-ws.org/Vol-3785/paper110.pdf (accessed: 19.07.2025).
Martinez H., et al. Performance Analysis of BERT on RISC-V Processors with SIMD Units. High Performance Computing. ISC High Performance 2024 International Workshops. Vol. 15058 / eds. by M. Weiland, S. Neuwirth, C. Kruse, T. Weinzierl. Springer, 2024. P. 325–338. Lecture Notes in Computer Science. DOI: 10.1007/978-3-031-73716-9_23.
Suarez D., et al. Energy-Efficient Inference on RNN and LLM networks: A Quantized Evaluation on RISC-V, ARM, and x86 Devices. 16th ACM International Conference on Future and Sustainable Energy Systems (E-Energy ’25). Association for Computing Machinery, New York, NY, USA, 2025. P. 882–889. DOI: 10.1145/3679240.3735101.
Mukhin I., et al. Benchmarking Deep Learning Inference on RISC-V CPUs. Supercomputing. RuSCDays 2024. Vol. 15406 / eds. by V. Voevodin, A. Antonov, D. Nikitenko. Springer, 2025. P. 331–346. Lecture Notes in Computer Science. DOI: 10.1007/978-3-031-78459-0_24.
The official web site of the framework PyTorch. URL: https://pytorch.org (accessed: 19.07.2025).
The official web site of the framework TensorFlow Lite. URL: https://www.tensorflow.org/lite (accessed: 19.07.2025).
Chen T., et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. 13th USENIX Conference on Operating Systems Design and Implementation, Carlsbad, CA, USA, 2018. P. 579–594. DOI: 10.5555/3291168.3291211.
The official web site of the framework ExecuTorch. URL: https://pytorch.org/executorch-overview (accessed: 19.07.2025).
DLI: Deep Learning Inference Benchmark. GitHub Repo. URL: https://github.com/itlab-vision/dl-benchmark (accessed: 19.07.2025).
Kustikova V., et al. DLI: Deep Learning Inference Benchmark. Supercomputing. RuSCDays 2019. Vol. 1129 / eds. by V. Voevodin, S. Sobolev. Springer, 2019. P. 542–553. Communications in Computer and Information Science. DOI: 10.1007/978-3-030-36592-9_44.
Lin D., et al. Fixed Point Quantization of Deep Convolutional Networks. 33rd International Conference on Machine Learning, New York, USA, 2016. P. 2849–2858. URL: https://proceedings.mlr.press/v48/linb16.pdf (accessed: 19.07.2025).
Kozlov A., et al. Neural network compression framework for fast model inference. Intelligent Computing 2021. Vol. 285 / ed. by K. Arai. Springer, 2021. P. 213–232. Lecture Notes in Networks and Systems. DOI: 10.1007/978-3-030-80129-8_17.
The official web site of the framework ncnn. URL: https://github.com/Tencent/ncnn (accessed: 19.07.2025).
Deng J., et al. ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009. URL: https://www.image-net.org/static_files/papers/imagenet_cvpr09.pdf (accessed: 19.07.2025).
TorchVision: PyTorch’s Computer Vision library. 2016. URL: https://github.com/pytorch/vision (accessed: 19.07.2025).
TensorFlow Backend for ONNX. URL: https://github.com/onnx/onnx-tensorflow (accessed: 19.07.2025).
XNNPACK. High-efficiency floating-point neural network inference operators for mobile, server, and Web. URL: https://github.com/google/XNNPACK (accessed: 19.07.2025).
OpenVINO. OpenVINO is an open source toolkit for optimizing and deploying AI inference. URL: https://github.com/openvinotoolkit/openvino (accessed: 19.07.2025).
OpenVINO Toolkit – Open Model Zoo repository. Fork (branch ‘24.3.0/tvm’ for Apache TVM, branch ‘omz_executorch’ for ExecuTorch). URL: https://github.com/itlab-vision/open_model_zoo_tvm (accessed: 19.07.2025).
PyTorch RISC-V support. URL: https://discuss.pytorch.org/t/pytorch-risc-v-support/212065 (accessed: 19.07.2025).
Pirova A., et al. Performance optimization of BLAS algorithms with band matrices for RISC-V processors. Future Generation Computer Systems. 2026. Vol. 174. P. 107936. DOI: 10.1016/j.future.2025.107936.
DOI: http://dx.doi.org/10.14529/cmse250403