Methods and Tools for Organizing the Global Job Queue in the Geographically Distributed Computing System
Abstract
The geographically distributed computing infrastructure (DCI) considered in this paper consists of high-performance computing systems connected by communication channels. The computing systems are clusters that differ in architecture and performance, and the channels that connect them differ in reliability and bandwidth. The DCI model uses a decentralized job management and dispatching scheme. Under this scheme, a malfunction of any cluster or a failure of a communication channel can at any time cause the cluster to leave the DCI; once the cluster or channel is repaired, the cluster dynamically reconnects to the DCI. A global job queue is organized in this infrastructure. Jobs have absolute priorities, so a high-priority job can interrupt running low-priority jobs. Jobs from the global queue are allocated to idle resources of the computing systems. Forming and storing the global job queue while the DCI composition changes dynamically requires a reliable information system. The authors reviewed several distributed DBMSs as a possible basis for this information system. The article outlines the requirements for such a distributed information system. The authors conducted a comparative analysis, selected a solution that satisfies the requirements, and designed a prototype of a geographically distributed computing infrastructure with a decentralized job dispatching scheme.
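The queueing rule described in the abstract (absolute priorities, allocation to idle resources, interruption of lower-priority jobs) can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the class name `GlobalQueue`, the slot-per-cluster resource model, the convention that a larger number means a higher priority, and the choice to return an interrupted job to the queue are all assumptions made for illustration.

```python
import heapq
from itertools import count


class GlobalQueue:
    """Toy global job queue with absolute priorities (hypothetical model)."""

    def __init__(self, clusters):
        # clusters: dict mapping cluster name -> number of job slots (assumed model)
        self.slots = dict(clusters)
        self.running = {name: [] for name in clusters}  # cluster -> [(priority, job)]
        self.waiting = []    # min-heap of (-priority, seq, job)
        self._seq = count()  # FIFO tie-breaker among equal priorities

    def submit(self, job, priority):
        heapq.heappush(self.waiting, (-priority, next(self._seq), job))
        self.dispatch()

    def dispatch(self):
        # Place waiting jobs, highest priority first, until one cannot be placed.
        while self.waiting:
            neg_prio, seq, job = heapq.heappop(self.waiting)
            target = self._idle_cluster()
            if target is None:
                target = self._preempt(-neg_prio)
            if target is None:
                # No idle slot and no lower-priority job to interrupt.
                heapq.heappush(self.waiting, (neg_prio, seq, job))
                return
            self.running[target].append((-neg_prio, job))

    def _idle_cluster(self):
        for name, jobs in self.running.items():
            if len(jobs) < self.slots[name]:
                return name
        return None

    def _preempt(self, priority):
        # Interrupt the lowest-priority running job if it is strictly lower.
        candidates = [(p, name, job)
                      for name, jobs in self.running.items()
                      for p, job in jobs]
        if not candidates:
            return None
        p, name, job = min(candidates, key=lambda c: c[0])
        if p >= priority:
            return None
        self.running[name].remove((p, job))
        # Assumption: an interrupted job goes back to the global queue.
        heapq.heappush(self.waiting, (-p, next(self._seq), job))
        return name
```

For example, with two single-slot clusters that are both running priority-1 jobs, submitting a priority-5 job interrupts one of them and puts the interrupted job back in the waiting heap. A real DCI would additionally have to keep this state in a distributed store that survives clusters joining and leaving, which is the problem the paper addresses.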
DOI: http://dx.doi.org/10.14529/cmse170403


