A Survey of Parallel Computation Models

Nadezhda Aleksandrovna Ezhova, Leonid Borisovich Sokolinsky

Abstract


The aim of this survey is to give as complete a picture as possible of the achievements and current state of the art in the development of analytical models of parallel computation. Such models make it possible to predict the running time, speedup, efficiency, and scalability of parallel algorithms on various target multiprocessor platforms. Models of parallel computation matter because, before a parallel algorithm is implemented as a program, they show how efficiently the algorithm can use a particular multiprocessor platform, so that the algorithm's design can be changed if necessary or a different target hardware platform can be considered. The survey traces the evolution of parallel computation models, which proceeded alongside the evolution of multiprocessor systems: from single-level shared-memory models to multilevel hierarchical distributed-memory models aimed at cluster computing systems with multicore accelerators. The survey concludes with recommendations on possible directions for further research in the development of mathematical models of parallel computation.

Keywords


parallel computation model; survey; parallel programming; multiprocessor systems; performance evaluation; algorithm execution time prediction





DOI: http://dx.doi.org/10.14529/cmse190304