TASC Software for HPC Performance Analysis: Current State and Latest Developments
References
Voevodin V.V., Shaikhislamov D.I., Nikitenko D.A. How to assess the quality of supercomputer resource usage. Supercomputing Frontiers and Innovations. 2022. Vol. 9, no. 3. P. 4–18. DOI: 10.14529/jsfi220301.
High Performance Computing Market Size to Surpass USD 64.65 Bn by 2030. URL: https://www.globenewswire.com/news-release/2022/04/04/2415844/0/en/High-Performance-Computing-Market-Size-to-Surpass-USD-64-65-Bn-by-2030.html (accessed: 14.08.2024).
High Performance Computing Market Size, Growth Report. URL: https://www.fortunebusinessinsights.com/industry-reports/high-performance-computing-hpc-and-high-performance-data-analytics-hpda-market-100636 (accessed: 14.08.2024).
Shvets P., Voevodin V., Zhumatiy S. Primary automatic analysis of the entire flow of supercomputer applications. CEUR Workshop Proceedings. 2018. P. 20–32.
Shvets P., Voevodin V. “Endless” Workload Analysis of Large-Scale Supercomputers. Lobachevskii Journal of Mathematics. 2021. Vol. 42. P. 184–194. DOI: 10.1134/s1995080221010236.
Voevodin V.V., Nikitenko D.A. Recurrent Monitoring of Supercomputer Noise. Supercomputing Frontiers and Innovations. 2023. Vol. 10, no. 3. P. 27–35. DOI: 10.14529/jsfi230304.
Jones M.D., White J.P., Innus M., et al. Workload Analysis of Blue Waters. 2017. DOI: 10.48550/arXiv.1703.00924. arXiv: 1703.00924.
Simakov N.A., White J.P., DeLeon R.L., et al. A Workload Analysis of NSF’s Innovative HPC Resources Using XDMoD. 2018. DOI: 10.48550/arXiv.1801.04306. arXiv: 1801.04306.
Hart D.L. Measuring TeraGrid: workload characterization for a high-performance computing federation. The International Journal of High Performance Computing Applications. 2011. Nov. Vol. 25, no. 4. P. 451–465. DOI: 10.1177/1094342010394382.
Patel T., Liu Z., Kettimuthu R., et al. Job characteristics on large-scale systems: long-term analysis, quantification, and implications. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020. P. 1–17. DOI: 10.1109/SC41405.2020.00088.
Kostenetskiy P., Shamsutdinov A., Chulkevich R., et al. HPC TaskMaster - Task Efficiency Monitoring System for the Supercomputer Center. International Conference on Parallel Computational Technologies. Springer, 2022. P. 17–29. DOI: 10.1007/978-3-031-11623-0_2.
Isakov M., Del Rosario E., Madireddy S., et al. HPC I/O throughput bottleneck analysis with explainable local models. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020. P. 1–13. DOI: 10.1109/SC41405.2020.00037.
Netti A., Shin W., Ott M., et al. A conceptual framework for HPC operational data analytics. 2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021. P. 596–603. DOI: 10.1109/Cluster48925.2021.00086.
Ott M., Shin W., Bourassa N., et al. Global experiences with HPC operational data measurement, collection and analysis. 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2020. P. 499–508. DOI: 10.1109/CLUSTER49012.2020.00071.
Voevodin V.V., Antonov A.S., Nikitenko D.A., et al. Supercomputer Lomonosov-2: large scale, deep monitoring and fine analytics for the user community. Supercomputing Frontiers and Innovations. 2019. Vol. 6, no. 2. P. 4–11. DOI: 10.14529/jsfi190201.
Stefanov K., Voevodin V., Zhumatiy S., Voevodin V. Dynamically reconfigurable distributed modular monitoring system for supercomputers (DiMMon). Procedia Computer Science. 2015. Vol. 66. P. 625–634. DOI: 10.1016/j.procs.2015.11.071.
Agrawal K., Fahey M.R., McLay R., James D. User Environment Tracking and Problem Detection with XALT. 2014 First International Workshop on HPC User Support Tools. IEEE, Nov. 2014. P. 32–40. DOI: 10.1109/HUST.2014.6.
Nikitenko D., Zhumatiy S., Paokin A., et al. Evolution of the Octoshell HPC center management system. International Conference on Parallel Computational Technologies. Springer, 2019. P. 19–33. DOI: 10.1007/978-3-030-28163-2_2.
Hoefler T., Mehlan T., Lumsdaine A., Rehm W. Netgauge: A network performance measurement framework. International Conference on High Performance Computing and Communications. Springer, 2007. P. 659–671.
Netgauge - Operating System Noise Measurement. URL: https://htor.inf.ethz.ch/research/netgauge/osnoise/ (accessed: 25.09.2024).
Top-down Microarchitecture Analysis Method. URL: https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2023-0/top-down-microarchitectureanalysis-method.html#GUID-FEA77CD8-F9F1-446A-8102-07D3234CDB68 (accessed: 14.08.2024).
Voevodin V., Stefanov K., Zhumatiy S. Overhead analysis for performance monitoring counters multiplexing. Russian Supercomputing Days. Springer, 2022. P. 461–474. DOI: 10.1007/978-3-031-22941-1_34.
Abraham M.J., Murtola T., Schulz R., et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015. Sept. Vol. 1–2. P. 19–25. DOI: 10.1016/j.softx.2015.06.001.
Thompson A.P., Aktulga H.M., Berger R., et al. LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales. Comp. Phys. Comm. 2022. Vol. 271. P. 108171. DOI: 10.1016/j.cpc.2021.108171.
DOI: http://dx.doi.org/10.14529/cmse240304