WSC’17 Paper: HPC Job Scheduling Simulation

Simulation of HPC Job Scheduling and Large-Scale Parallel Workloads, Mohammad Abu Obaida and Jason Liu. In Proceedings of the 2017 Winter Simulation Conference (WSC 2017), W. K. V. Chan, A. D’Ambrogio, G. Zacharewicz, N. Mustafee, G. Wainer, and E. Page, eds., December 2017. To appear. [paper]

Abstract

The paper presents a simulator designed specifically for evaluating job scheduling algorithms on large-scale HPC systems. The simulator was developed based on the Performance Prediction Toolkit (PPT), a parallel discrete-event simulator written in Python for rapid assessment and performance prediction of large-scale scientific applications on supercomputers. The proposed job scheduler simulator incorporates PPT’s application models and, when coupled with sufficiently detailed architecture models, can represent more realistic job runtime behaviors. Consequently, the simulator can evaluate different job scheduling and task mapping algorithms on specific target HPC platforms more accurately.

Bibtex

Not yet available.
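To illustrate what "evaluating a job scheduling algorithm" by discrete-event simulation involves, here is a minimal sketch of a strict first-come-first-served (FCFS) scheduler driven by an event heap. It is not PPT's API or the simulator described in the paper: the `Job` fields and `simulate_fcfs` are hypothetical, and each job's runtime is simply given, whereas the paper derives runtime behavior from PPT's application and architecture models.

```python
import heapq
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    submit: float   # submission time
    nodes: int      # number of compute nodes requested
    runtime: float  # runtime; hypothetically supplied by an application model


def simulate_fcfs(jobs, total_nodes):
    """Discrete-event simulation of strict first-come-first-served scheduling.

    Returns {job_id: (start_time, end_time)}. A job that requests more than
    total_nodes is never started (and blocks every job behind it).
    """
    events = []  # min-heap of (time, seq, kind, job); seq breaks ties
    seq = 0
    for job in jobs:
        heapq.heappush(events, (job.submit, seq, "arrive", job))
        seq += 1

    queue = deque()
    free = total_nodes
    schedule = {}
    while events:
        now, _, kind, job = heapq.heappop(events)
        if kind == "arrive":
            queue.append(job)
        else:  # "finish": release the job's nodes
            free += job.nodes
        # Start jobs in arrival order while the job at the head of the queue fits.
        while queue and queue[0].nodes <= free:
            head = queue.popleft()
            free -= head.nodes
            end = now + head.runtime
            schedule[head.job_id] = (now, end)
            heapq.heappush(events, (end, seq, "finish", head))
            seq += 1
    return schedule
```

For example, on 4 nodes with jobs (1: submitted at 0, 2 nodes, 10 s), (2: at 0, 4 nodes, 5 s), and (3: at 1, 1 node, 2 s), strict FCFS runs job 3 only from t = 15 to t = 17, even though a node is idle from t = 1. A backfilling policy would start it earlier; comparing such policies is the kind of question a scheduling simulator is meant to answer.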

WSC’17 Paper: HPC Simulation History

A Brief History of HPC Simulation and Future Challenges, Kishwar Ahmed, Jason Liu, Abdel-Hameed Badawy, and Stephan Eidenbenz. In Proceedings of the 2017 Winter Simulation Conference (WSC 2017), W. K. V. Chan, A. D’Ambrogio, G. Zacharewicz, N. Mustafee, G. Wainer, and E. Page, eds., December 2017. To appear. [paper]

Abstract

High-Performance Computing (HPC) systems have gone through many architectural changes over the past two decades to satisfy the ever-growing demand of large-scale scientific computing. Accurate, fast, and scalable performance models and simulation tools are essential for evaluating alternative architecture design decisions for these massive-scale computing systems. This paper recounts some of the influential work in modeling and simulation for HPC systems and applications, identifies some of the major challenges, and outlines future research directions that we believe are critical to the HPC modeling and simulation community.

Bibtex

Not yet available.

MASCOTS’17 Paper: Energy Demand Response Scheduling

An Energy Efficient Demand-Response Model for High Performance Computing Systems, Kishwar Ahmed, Jason Liu, and Xingfu Wu. In Proceedings of the 25th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017), September 2017. [paper]

Abstract

Demand response refers to reducing the energy consumption of participating systems in response to a transient surge in power demand or other emergency events. Demand response is particularly important for maintaining power grid transmission stability, as well as for achieving overall energy savings. High-Performance Computing (HPC) systems can be considered ideal participants in demand-response programs, due to their massive energy demand. However, the potential loss of performance must be weighed against the possible gain in power system stability and energy reduction. In this paper, we explore the opportunity of demand response on HPC systems by proposing a new HPC job scheduling and resource provisioning model. More specifically, the proposed model applies power-bound, energy-conserving job scheduling during critical demand-response events, while maintaining traditional performance-optimized job scheduling during normal periods. We expect that such a model can attract the willing participation of HPC systems in demand-response programs, as it can improve both power stability and energy savings without significantly compromising application performance. We implement the proposed method in a simulator and compare it with the traditional scheduling approach. Using trace-driven simulation, we demonstrate that HPC demand response is a viable approach toward power stability and energy savings, with only a marginal increase in the jobs’ execution time.
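The switch between the two scheduling modes can be sketched as a job-selection step. This is a minimal illustration under stated assumptions, not the paper's model: each job is assumed to draw a constant power per node, a facility power cap is assumed to apply only inside demand-response windows, and the names `pick_jobs`, `watts_per_node`, and `power_cap` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    nodes: int
    watts_per_node: float  # hypothetical: assumed constant per-node power draw


def in_dr_event(now, dr_windows):
    """True if `now` falls inside any [start, end) demand-response window."""
    return any(start <= now < end for start, end in dr_windows)


def pick_jobs(queue, free_nodes, current_power, now, dr_windows, power_cap):
    """Return the queued jobs to start at time `now`, in FCFS order.

    Normal periods: only the node constraint applies (performance-oriented).
    Demand-response windows: each started job must also keep the total
    power draw at or below `power_cap` (power-bound).
    """
    capped = in_dr_event(now, dr_windows)
    started = []
    for job in queue:
        job_power = job.nodes * job.watts_per_node
        fits_nodes = job.nodes <= free_nodes
        fits_power = not capped or current_power + job_power <= power_cap
        if not (fits_nodes and fits_power):
            break  # strict FCFS: never start a job ahead of a blocked one
        started.append(job)
        free_nodes -= job.nodes
        current_power += job_power
    return started
```

With two queued 2-node jobs at 300 W per node, 8 free nodes, and a 1,000 W cap during a window from t = 100 to t = 200, both jobs start at t = 50, but only the first starts at t = 150, because the second would raise the draw to 1,200 W.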

Bibtex

Not yet available.

Slides

Invited Talk: High-Performance Modeling and Simulation of Computer Networks

May 26, 2017

Department of Computer Science
Tsinghua University, Beijing, China
Host: Professor Zhihui Du (都志辉)

Abstract: Modeling and simulation (M&S) plays an important role in the design, analysis, and performance evaluation of complex systems. Many of these systems, such as computer networks, involve a large number of interrelated components and processes. Complex behaviors emerge as these components and processes interoperate across multiple scales and at various granularities. M&S must be able to provide sufficiently accurate results while coping with this scale and complexity.

My talk will focus on two novel techniques in high-performance network modeling and simulation. The first is a GPU-assisted hybrid network traffic modeling method. The hybrid approach offloads the computationally intensive bulk traffic calculations to the GPU in the background, while leaving the detailed simulation of network transactions in the foreground on the CPU. Our experiments show that the CPU-GPU hybrid approach can achieve significant performance improvement over the CPU-only approach.

The second technique is a distributed network emulation method based on simulation symbiosis. Mininet is a container-based emulation environment for studying networks consisting of virtual hosts and OpenFlow-enabled virtual switches on Linux. It is well known, however, that experiments using Mininet may lose fidelity for large-scale networks and heavy traffic loads. The proposed symbiotic approach uses an abstract network model to coordinate distributed Mininet instances with superimposed traffic to represent large-scale network scenarios.
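The division of labor in the first technique can be illustrated with a CPU-only sketch. Under the assumptions here, background flows are given as (rate, route) pairs, each link is approximated as an M/M/1 queue, and a foreground packet's delay is the sum of per-link waiting and transmission times. The per-link aggregation in `link_loads` stands for the kind of bulk, data-parallel calculation the talk describes offloading to the GPU; the M/M/1 approximation and all names are illustrative assumptions, not the method presented in the talk.

```python
import math
from collections import defaultdict


def link_loads(flows):
    """Aggregate background flow rates (bits/s) onto every link they cross.

    flows: iterable of (rate_bps, [link_id, ...]) pairs.
    """
    load = defaultdict(float)
    for rate, route in flows:
        for link in route:
            load[link] += rate
    return load


def packet_delay(route, load, capacity_bps, packet_bits):
    """Delay of one foreground packet along `route`: per-link M/M/1 mean
    waiting time caused by the background load, plus transmission time."""
    total = 0.0
    for link in route:
        mu = capacity_bps / packet_bits           # service rate (packets/s)
        lam = load.get(link, 0.0) / packet_bits   # background arrival rate
        if lam >= mu:
            return math.inf                       # saturated link
        total += (lam / mu) / (mu - lam) + 1.0 / mu
    return total
```

For a 1 Gb/s link carrying 500 Mb/s of background traffic and 1,500-byte (12,000-bit) packets, the utilization is 0.5, so the sketch gives 12 µs of waiting plus 12 µs of transmission, i.e., 24 µs per link.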

Invited Talk: High-Performance Modeling and Simulation of Computer Networks

April 26, 2017

Laboratory of Information, Networking and Communication Sciences (LINCS), Paris, France
Host: Professor Dario Rossi

Abstract: Modeling and simulation (M&S) plays an important role in the design, analysis, and performance evaluation of complex systems. Many of these systems, such as computer networks, involve a large number of interrelated components and processes. Complex behaviors emerge as these components and processes interoperate across multiple scales and at various granularities. M&S must be able to provide sufficiently accurate results while coping with this scale and complexity.

My talk will focus on two novel techniques in high-performance network modeling and simulation. The first is a GPU-assisted hybrid network traffic modeling method. The hybrid approach offloads the computationally intensive bulk traffic calculations to the GPU in the background, while leaving the detailed simulation of network transactions in the foreground on the CPU. Our experiments show that the CPU-GPU hybrid approach can achieve significant performance improvement over the CPU-only approach.

The second technique is a distributed network emulation method based on simulation symbiosis. Mininet is a container-based emulation environment for studying networks consisting of virtual hosts and OpenFlow-enabled virtual switches on Linux. It is well known, however, that experiments using Mininet may lose fidelity for large-scale networks and heavy traffic loads. The proposed symbiotic approach uses an abstract network model to coordinate distributed Mininet instances with superimposed traffic to represent large-scale network scenarios.

Invited Talk: Extending PrimoGENI for Symbiotic Distributed Network Emulation

March 13, 2017

GENI Regional Workshop (GRW), held in conjunction with GEC25, Miami, Florida, USA

The talk covers recent developments in hybrid at-scale network experimentation that extend the earlier PrimoGENI project.

[slides]

HPCC’16 Paper: HPC Interconnect Model

Scalable Interconnection Network Models for Rapid Performance Prediction of HPC Applications, Kishwar Ahmed, Jason Liu, Stephan Eidenbenz, and Joe Zerr. In Proceedings of the 18th International Conference on High Performance Computing and Communications (HPCC 2016), December 2016. [paper] [slides]

Abstract

The Performance Prediction Toolkit (PPT) is a simulator developed mainly at Los Alamos National Laboratory to facilitate rapid and accurate performance prediction of large-scale scientific applications on existing and future HPC architectures. In this paper, we present three interconnect models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: torus, dragonfly, and fat-tree. We conduct extensive validation tests of our interconnect models, in particular using configurations of existing HPC systems. Results show that our models provide good accuracy for predicting network behavior. We also present a performance study of a parallel computational physics application to show that our models can accurately predict the parallel behavior of large-scale applications.

Bibtex

@INPROCEEDINGS{Ahmed2016:scale-intercon,
author={K. Ahmed and J. Liu and S. Eidenbenz and J. Zerr},
booktitle={Proceedings of the IEEE 18th International Conference on High Performance Computing and Communications (HPCC)},
title={Scalable Interconnection Network Models for Rapid Performance Prediction of HPC Applications},
year={2016},
pages={1069-1078},
doi={10.1109/HPCC-SmartCity-DSS.2016.0151},
month={Dec},}
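To make two of the three topologies concrete, the sketch below counts the links a message traverses under minimal routing in a torus (with wraparound) and in a standard three-level k-ary fat tree built from k-port switches (k pods, k/2 edge switches per pod, k/2 hosts per edge switch, hosts numbered consecutively). These are textbook topology properties, not PPT's interconnect models; dragonfly is omitted because its hop count depends on the routing policy (minimal versus non-minimal).

```python
def torus_hops(a, b, dims):
    """Minimal hop count between coordinates a and b in a torus of size dims,
    taking the shorter direction (with wraparound) in every dimension."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


def fat_tree_hops(a, b, k):
    """Links traversed between hosts a and b in a three-level k-ary fat tree
    (k pods, k/2 edge switches per pod, k/2 hosts per edge switch)."""
    if k % 2:
        raise ValueError("k must be even")
    hosts_per_edge = k // 2
    hosts_per_pod = k * k // 4
    if a == b:
        return 0
    if a // hosts_per_edge == b // hosts_per_edge:
        return 2  # host -> edge -> host
    if a // hosts_per_pod == b // hosts_per_pod:
        return 4  # host -> edge -> aggregation -> edge -> host
    return 6      # host -> edge -> aggregation -> core -> aggregation -> edge -> host
```

In an 8x8x8 torus, for instance, nodes (0, 0, 0) and (7, 1, 4) are only 6 hops apart, because wraparound turns the 0-to-7 step in the first dimension into a single hop.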

WSC’16 Paper: Simulation Reproducibility

Panel – Reproducible Research in Discrete-Event Simulation – A Must or Rather a Maybe? Adelinde M. Uhrmacher, Sally Brailsford, Jason Liu, Markus Rabe, and Andreas Tolk. In Proceedings of the 2016 Winter Simulation Conference (WSC 2016), T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds., December 2016. [paper]

Abstract

Scientific research should be reproducible, and so should simulation research. However, is this really the case? In some application areas of simulation, e.g., cell biology, simulation studies cannot be published without the data, models, and methods, including computer code, being made available for evaluation. How the problem of reproducibility is assessed and addressed differs across the application and methodological areas of modeling and simulation. The diversity of answers to this question will be illuminated by looking into network simulation and simulation in logistics, the military, and health care. Making the different scientific cultures, challenges, and solutions in discrete-event simulation explicit is central to improving the reproducibility, and thus the quality, of discrete-event simulation research.

Bibtex

@INPROCEEDINGS{Uhrmacher2016:panel,
author={A. M. Uhrmacher and S. Brailsford and J. Liu and M. Rabe and A. Tolk},
booktitle={2016 Winter Simulation Conference (WSC)},
title={Panel–Reproducible research in discrete event simulation–A must or rather a maybe?},
year={2016},
pages={1301-1315},
doi={10.1109/WSC.2016.7822185},
month={Dec},}

PADS’16 Paper: Integrated Interconnect Model

An Integrated Interconnection Network Model for Large-Scale Performance Prediction, Kishwar Ahmed, Mohammad Obaida, Jason Liu, Stephan Eidenbenz, Nandakishore Santhi, and Guillaume Chapuis. In Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (SIGSIM-PADS 2016), May 2016. [paper]

Abstract

The interconnection network is a critical component of high-performance computing architecture and application co-design. For many scientific applications, the increasing communication complexity poses a serious concern, as it may hinder the scaling properties of these applications on novel architectures. A scalable, efficient, and accurate interconnect model is therefore essential for performance evaluation studies. In this paper, we present an interconnect model for predicting the performance of large-scale applications on high-performance architectures. In particular, we present a sufficiently detailed interconnect model for Cray’s Gemini 3-D torus network. The model has been integrated with an implementation of the Message-Passing Interface (MPI) that can mimic most of its functions with packet-level accuracy on the target platform. Extensive experiments show that our integrated model provides good accuracy for predicting network behavior, while at the same time allowing for good parallel scaling performance.

Bibtex

@inproceedings{Ahmed2016:interconnect,
author = {Ahmed, Kishwar and Obaida, Mohammad and Liu, Jason and Eidenbenz, Stephan and Santhi, Nandakishore and Chapuis, Guillaume},
title = {An Integrated Interconnection Network Model for Large-Scale Performance Prediction},
booktitle = {Proceedings of the 2016 Annual ACM Conference on SIGSIM Principles of Advanced Discrete Simulation},
series = {SIGSIM-PADS '16},
year = {2016},
isbn = {978-1-4503-3742-7},
location = {Banff, Alberta, Canada},
pages = {177--187},
numpages = {11},
url = {http://doi.acm.org/10.1145/2901378.2901396},
doi = {10.1145/2901378.2901396},
acmid = {2901396},
publisher = {ACM},
address = {New York, NY, USA},
}
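A packet-level model of a 3-D torus needs, at a minimum, a route for each packet and a rule for splitting MPI messages into packets. The sketch below assumes deterministic dimension-order routing (X first, then Y, then Z, taking the shorter way around each ring) and a hypothetical 64-byte maximum payload. Neither assumption is claimed to match the Gemini model in the paper; `dor_route` and `packetize` are illustrative names.

```python
def dor_route(src, dst, dims):
    """Hop-by-hop path in a torus under dimension-order routing:
    correct X first, then Y, then Z, taking the shorter way around each ring
    (ties go in the + direction)."""
    cur = list(src)
    path = [tuple(cur)]
    for d, size in enumerate(dims):
        forward = (dst[d] - cur[d]) % size  # hops needed in the + direction
        step = 1 if forward <= size - forward else -1
        while cur[d] != dst[d]:
            cur[d] = (cur[d] + step) % size
            path.append(tuple(cur))
    return path


def packetize(message_bytes, max_payload=64):
    """Split a message into packet payload sizes; max_payload is a
    hypothetical value. A zero-byte message still sends one empty packet."""
    full, rest = divmod(message_bytes, max_payload)
    sizes = [max_payload] * full + ([rest] if rest else [])
    return sizes or [0]
```

Under these assumptions, a 150-byte message becomes packets of 64, 64, and 22 bytes, and each packet would follow the same deterministic path, e.g., (0, 0, 0) to (1, 0, 0) to (1, 7, 0) in an 8x8x8 torus.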

TOMACS’15 Paper: Symbiotic Network Simulation and Emulation

Symbiotic Network Simulation and Emulation, Miguel Erazo, Rong Rong, and Jason Liu. ACM Transactions on Modeling and Computer Simulation (TOMACS), 26(1), Article No. 2, December 2015. [paper]

Abstract

A testbed capable of representing detailed operations of complex applications under diverse network conditions is invaluable for understanding the design and performance of new protocols and applications before their actual deployment. We introduce a novel method that combines high-performance large-scale network simulation and high-fidelity network emulation, and thus enables real instances of network applications and protocols to run in real operating environments and be tested under simulated network settings. Using our approach, network simulation and emulation can form a symbiotic relationship, through which they are synchronized for an accurate representation of network-scale traffic behavior. We introduce a model downscaling method, along with an efficient queuing model and a traffic reproduction technique, which can significantly reduce the synchronization overhead and improve accuracy. We validate our approach with extensive experiments via simulation and with a real-system implementation. We also present a case study using our approach to evaluate a multipath data transport protocol.

Bibtex

@article{Erazo2015:symbiosis,
author = {Erazo, Miguel A. and Rong, Rong and Liu, Jason},
title = {Symbiotic Network Simulation and Emulation},
journal = {ACM Trans. Model. Comput. Simul.},
issue_date = {December 2015},
volume = {26},
number = {1},
month = jun,
year = {2015},
issn = {1049-3301},
pages = {2:1--2:25},
articleno = {2},
numpages = {25},
url = {http://doi.acm.org/10.1145/2717308},
doi = {10.1145/2717308},
acmid = {2717308},
publisher = {ACM},
address = {New York, NY, USA},
}
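The symbiotic feedback loop can be sketched as follows. Once per synchronization interval, the simulator combines its simulated background traffic with the traffic rate measured from the emulated applications and hands back the delay and drop probability that the emulated link should impose. A textbook M/M/1/K queue (Poisson arrivals, exponential service, room for K packets) stands in here for the paper's queuing model, and `sync_step` is a hypothetical interface, not the authors' implementation.

```python
import math


def mm1k_stats(arrival_pps, service_pps, k):
    """Mean sojourn time (s) and loss probability of an M/M/1/K queue
    with Poisson arrivals, exponential service, and room for k packets."""
    if arrival_pps <= 0:
        return 1.0 / service_pps, 0.0  # idle link: service time only
    rho = arrival_pps / service_pps
    if math.isclose(rho, 1.0, rel_tol=1e-12):
        p_loss = 1.0 / (k + 1)
        mean_in_system = k / 2.0
    else:
        p_loss = (1 - rho) * rho**k / (1 - rho ** (k + 1))
        mean_in_system = rho / (1 - rho) - (k + 1) * rho ** (k + 1) / (1 - rho ** (k + 1))
    admitted = arrival_pps * (1 - p_loss)  # throughput of accepted packets
    return mean_in_system / admitted, p_loss  # Little's law: W = L / lambda_eff


def sync_step(sim_background_pps, emulated_pps, service_pps, buffer_pkts):
    """One synchronization interval (hypothetical interface): merge the
    simulated background load with the rate measured from the emulated
    applications, and return (delay, drop probability) for the emulated link."""
    return mm1k_stats(sim_background_pps + emulated_pps, service_pps, buffer_pkts)
```

With a one-packet buffer, an arrival rate equal to the service rate yields a drop probability of 1/2; halving the load relative to the service rate lowers it to 1/3, while the delay of admitted packets stays at one service time, since with K = 1 no packet ever waits.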