Performance Analysis of Deep Learning Inference on the Banana Pi BPI-F3 Board Using the Image Classification Problem as an Example

Ivan S. Mukhin; Valentina D. Kustikova

doi:10.14529/cmse250403

Abstract

The paper analyzes the inference performance of two well-known image classification networks, ResNet-50 and MobileNetV2, on the Banana Pi BPI-F3 board, which is built on the RISC-V architecture. Inference is run with the available frameworks: PyTorch, TensorFlow Lite, Apache TVM, and ExecuTorch. The models are converted to the format of each target framework, and the correctness of the converted networks is checked: their image classification accuracy is consistent with the published values. The optimal inference launch parameters are then selected for each framework and model. A comparative analysis shows that ExecuTorch delivers the best performance for both models. With optimal parameters, throughput for ResNet-50 ranges from 2.649 to 3.339 frames per second (FPS), depending on the batch size, that is, the number of images processed in one forward pass through the network; for MobileNetV2 it ranges from 11.26 to 29.96 FPS. TensorFlow Lite is on average about 2.1 times slower than ExecuTorch. PyTorch and Apache TVM show lower performance, probably because they are not fully optimized for the RISC-V architecture.

Keywords

deep learning; image classification; inference performance; PyTorch; TensorFlow Lite; Apache TVM; ExecuTorch; Banana Pi BPI-F3; RISC-V

Full Text:

PDF (in Russian)

Series "Computational Mathematics and Software Engineering"
