Job Scheduling Archives

HPCC’18 Paper: HPC Demand Response via Power Capping and Node Scaling

Enabling Demand Response for HPC Systems Through Power Capping and Node Scaling, Kishwar Ahmed, Jason Liu, and Kazutomo Yoshii. In Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications (HPCC-2018), June 2018. [to appear]

Abstract

Demand response is an increasingly popular program for ensuring power grid stability during a sudden surge in power demand. We expect high-performance computing (HPC) systems to be valued participants in such programs because of their massive power consumption. In this paper, we propose an emergency demand-response model that exploits both power capping of HPC systems and node scaling of HPC applications. First, we present power and performance prediction models for HPC systems with only power capping, on which we base our demand-response model, and we validate these models with real-life measurements of application characteristics. Next, we present and validate models that predict the energy-to-solution of HPC applications for different numbers of nodes and power-capping values. Based on these prediction models, we propose an emergency demand-response participation model for HPC systems that determines the optimal resource allocation using power capping and node scaling. Finally, we demonstrate the effectiveness of the proposed demand-response model using real-life measurements and trace data. We show that our approach can reduce energy consumption with only a slight increase in the execution time of HPC applications during critical demand-response periods.
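
As a rough illustration of the final step, choosing a resource allocation from prediction models, the sketch below enumerates candidate (node count, per-node power cap) pairs and keeps the one with the lowest predicted energy-to-solution that respects a demand-response power budget and a runtime limit. The prediction functions are toy placeholders, not the paper's validated models: runtime follows Amdahl's law in the node count times a hypothetical linear slowdown as the cap drops, and each node is assumed to draw exactly its cap. `Config`, `predict_time`, `predict_energy`, `choose_config`, and every constant are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    nodes: int
    cap_watts: float  # per-node power cap


def predict_time(cfg, base_time=1000.0, parallel_frac=0.95,
                 max_cap=200.0, cap_sensitivity=0.5):
    """Toy runtime model (hypothetical, not the paper's).

    Amdahl's law for node scaling, multiplied by a slowdown that grows
    linearly as the per-node cap falls below max_cap.
    """
    amdahl = base_time * ((1 - parallel_frac) + parallel_frac / cfg.nodes)
    slowdown = 1 + cap_sensitivity * (max_cap - cfg.cap_watts) / max_cap
    return amdahl * slowdown


def predict_energy(cfg):
    # Assumption: every node draws exactly its cap for the whole run.
    return cfg.nodes * cfg.cap_watts * predict_time(cfg)


def choose_config(node_options, cap_options, power_budget, max_time):
    """Return the feasible Config with the least predicted energy, or None."""
    feasible = [
        Config(n, c)
        for n, c in product(node_options, cap_options)
        # total draw must fit under the demand-response budget
        if n * c <= power_budget
        # and the capped run must stay within the runtime limit
        and predict_time(Config(n, c)) <= max_time
    ]
    return min(feasible, key=predict_energy, default=None)


if __name__ == "__main__":
    best = choose_config([4, 8, 16], [120.0, 160.0, 200.0],
                         power_budget=2000.0, max_time=300.0)
    print(best)  # Config(nodes=8, cap_watts=120.0)
```

With these toy numbers, eight nodes capped at 120 W win: they fit the 2000 W budget and the 300 s limit while using less predicted energy than four nodes at full power. Exhaustive search is adequate because the candidate grid is small; a real deployment would replace the placeholders with measured or fitted models.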

Bibtex

@inproceedings{hpcc18-power,
title = {Enabling Demand Response for HPC Systems Through Power Capping and Node Scaling},
author = {Kishwar Ahmed and Jason Liu and Kazutomo Yoshii},
booktitle = {Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications (HPCC'18)},
month = {June},
year = {2018}
}

WSC’17 Paper: HPC Job Scheduling Simulation

Simulation of HPC Job Scheduling and Large-Scale Parallel Workloads, Mohammad Abu Obaida and Jason Liu. In Proceedings of the 2017 Winter Simulation Conference (WSC 2017), W. K. V. Chan, A. D’Ambrogio, G. Zacharewicz, N. Mustafee, G. Wainer, and E. Page, eds., December 2017. [paper]

Abstract

The paper presents a simulator designed specifically for evaluating job scheduling algorithms on large-scale HPC systems. The simulator was developed based on the Performance Prediction Toolkit (PPT), a parallel discrete-event simulator written in Python for rapid assessment and performance prediction of large-scale scientific applications on supercomputers. The proposed job scheduling simulator incorporates PPT’s application models and, when coupled with sufficiently detailed architecture models, can represent more realistic job runtime behaviors. Consequently, the simulator can evaluate different job scheduling and task mapping algorithms on specific target HPC platforms more accurately.
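
At the heart of any job scheduling simulator is an event loop that decides when queued jobs start as nodes become free. The minimal sketch below simulates strict first-come-first-served (FCFS) scheduling on a homogeneous cluster; it is not PPT and does not use PPT's API. Job runtimes are given as inputs here, whereas the paper's simulator derives runtime behavior from PPT's application and architecture models. `Job` and `simulate_fcfs` are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    submit: float   # arrival time
    nodes: int      # nodes requested
    runtime: float  # assumed known in advance (a PPT-style model would predict it)


def simulate_fcfs(jobs, total_nodes):
    """Strict FCFS on a homogeneous cluster; returns {job_id: (start, end)}."""
    for j in jobs:
        if j.nodes > total_nodes:
            raise ValueError(f"job {j.job_id} requests more nodes than exist")
    queue = sorted(jobs, key=lambda j: (j.submit, j.job_id))
    running = []  # min-heap of (end_time, job_id, nodes): the completion events
    free, now, i = total_nodes, 0.0, 0
    schedule = {}
    while i < len(queue):
        head = queue[i]
        now = max(now, head.submit)
        # Release every job that has completed by the current time.
        while running and running[0][0] <= now:
            _, _, n = heapq.heappop(running)
            free += n
        if head.nodes <= free:
            # Start the job at the head of the queue.
            schedule[head.job_id] = (now, now + head.runtime)
            heapq.heappush(running, (now + head.runtime, head.job_id, head.nodes))
            free -= head.nodes
            i += 1
        else:
            # Not enough free nodes: jump to the next completion event.
            end, _, n = heapq.heappop(running)
            now = max(now, end)
            free += n
    return schedule


if __name__ == "__main__":
    jobs = [Job(1, 0.0, 4, 10.0), Job(2, 1.0, 4, 5.0), Job(3, 2.0, 2, 3.0)]
    print(simulate_fcfs(jobs, total_nodes=6))
    # {1: (0.0, 10.0), 2: (10.0, 15.0), 3: (10.0, 13.0)}
```

In this example, job 3 waits until time 10 even though two nodes are idle when it arrives, because strict FCFS never lets a job pass the queue head. A backfilling policy such as EASY would start it at time 2, since it finishes at time 5, before job 2 can begin. Comparing policies like these on the same workload is the kind of study such a simulator supports.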

Bibtex

@inproceedings{wsc17-jobsched,
title = {Simulation of HPC Job Scheduling and Large-Scale Parallel Workloads}, 
author = {Mohammad Abu Obaida and Jason Liu},
booktitle = {Proceedings of the 2017 Winter Simulation Conference (WSC 2017)}, 
editor = {W. K. V. Chan and A. D’Ambrogio and G. Zacharewicz and N. Mustafee and G. Wainer and E. Page},
month = {December},
year = {2017}
}

MASCOTS’17 Paper: Energy Demand Response Scheduling

An Energy Efficient Demand-Response Model for High Performance Computing Systems, Kishwar Ahmed, Jason Liu, and Xingfu Wu. In Proceedings of the 25th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017), September 2017. [paper]

Abstract

Demand response refers to reducing the energy consumption of participating systems in response to a transient surge in power demand or other emergency events. Demand response is particularly important for maintaining power grid transmission stability and for achieving overall energy savings. High Performance Computing (HPC) systems can be considered ideal participants in demand-response programs because of their massive energy demand. However, the potential loss of performance must be weighed against the possible gain in power system stability and energy reduction. In this paper, we explore the opportunity of demand response on HPC systems by proposing a new HPC job scheduling and resource provisioning model. More specifically, the proposed model applies power-bound energy-conservation job scheduling during critical demand-response events, while maintaining traditional performance-optimized job scheduling during normal periods. We expect such a model to attract willing participation of HPC systems in demand-response programs, as it can improve both power stability and energy savings without significantly compromising application performance. We implement the proposed method in a simulator and compare it with the traditional scheduling approach. Using trace-driven simulation, we demonstrate that HPC demand response is a viable approach toward power stability and energy savings with only a marginal increase in the jobs’ execution time.
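
The two-regime idea can be pictured as a scheduler whose admission rule changes with the grid signal. The sketch below is a simplification under stated assumptions, not the paper's model: each queued job carries a hypothetical estimated power draw, a demand-response event is a known time interval, and during an event a job is started only if total system power stays under a hypothetical bound; outside events, any job whose node request fits is started. `QueuedJob`, `in_dr_event`, and `jobs_to_start` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class QueuedJob:
    job_id: int
    nodes: int
    est_power_watts: float  # hypothetical per-job power estimate


def in_dr_event(t, dr_events):
    """dr_events: list of (start, end) intervals, assumed announced in advance."""
    return any(start <= t < end for start, end in dr_events)


def jobs_to_start(queue, t, free_nodes, current_power, dr_events, power_bound):
    """Pick the jobs to launch at time t by scanning the queue in order.

    Normal period: start any job whose node request fits (greedy first fit).
    During a demand-response event: additionally keep total power <= power_bound.
    """
    during_event = in_dr_event(t, dr_events)
    started = []
    for job in queue:
        if job.nodes > free_nodes:
            continue
        if during_event and current_power + job.est_power_watts > power_bound:
            continue  # would push the system over the event's power bound
        started.append(job)
        free_nodes -= job.nodes
        current_power += job.est_power_watts
    return started


if __name__ == "__main__":
    queue = [QueuedJob(1, 4, 800.0), QueuedJob(2, 2, 300.0), QueuedJob(3, 2, 500.0)]
    events = [(100.0, 200.0)]
    for t in (50.0, 150.0):
        chosen = jobs_to_start(queue, t, free_nodes=8, current_power=1000.0,
                               dr_events=events, power_bound=1600.0)
        print(t, [j.job_id for j in chosen])
    # 50.0 [1, 2, 3]
    # 150.0 [2]
```

Outside the event all three jobs start; inside it only job 2 fits under the 1600 W bound, so the other jobs are delayed, which is the performance cost the abstract says must be weighed against the power savings.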

Bibtex

@inproceedings{mascots17-energy,
  title={An Energy Efficient Demand-Response Model for High Performance Computing Systems},
  author={Ahmed, Kishwar and Liu, Jason and Wu, Xingfu},
  booktitle={Proceedings of the 25th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017)},
  pages={175--186},
  month={September},
  year={2017}
}


HPPAC’17 Paper: Energy-Aware Scheduling

When Good Enough Is Better: Energy-Aware Scheduling for Multicore Servers, Xinning Hui, Zhihui Du, Jason Liu, Hongyang Sun, Yuxiong He, and David A. Bader. In Proceedings of the 13th Workshop on High-Performance, Power-Aware Computing (HPPAC 2017), held in conjunction with the 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2017), May 2017. [paper]

Abstract

Power is a primary concern for mobile, cloud, and high-performance computing applications. Approximate computing refers to running applications to obtain results with tolerable errors under resource constraints, and it can be applied to balance energy consumption with service quality. In this paper, we propose a “Good Enough (GE)” scheduling algorithm that uses approximate computing to provide satisfactory QoS (Quality of Service) for interactive applications with significant energy savings. Given a user-specified quality level, the GE algorithm operates in AES (Aggressive Energy Saving) mode most of the time, neglecting the low-quality portions of the workload. When the perceived quality falls below the required level, the algorithm switches to BQ (Best Quality) mode with a compensation policy. To avoid thrashing core speeds between the two modes, GE employs a hybrid power distribution scheme that uses the Equal-Sharing (ES) policy to distribute power among the cores when the workload is light (to save energy) and the Water-Filling (WF) policy when the workload is heavy (to improve quality). We conduct simulations to compare the performance of GE with existing scheduling algorithms. Results show that the proposed algorithm can provide large energy savings with satisfactory user experience.
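
The hybrid power distribution can be sketched as follows. Equal-Sharing gives every core the same share of the budget, while Water-Filling is implemented here as the textbook max-min fair allocation: cores are served in order of increasing demand, and each receives the smaller of its demand and an equal split of the power still left. The exact policies in the paper may differ; the light-load test (total demand at most a hypothetical fraction `light_fraction` of the budget), capping an equal share at the core's demand, and the names `equal_sharing`, `water_filling`, and `hybrid_distribution` are assumptions.

```python
def equal_sharing(demands, budget):
    """Every core gets budget / n, capped at its own demand (assumption)."""
    if not demands:
        return []
    share = budget / len(demands)
    return [min(d, share) for d in demands]


def water_filling(demands, budget):
    """Max-min fair (water-filling) split of the power budget across cores."""
    alloc = [0.0] * len(demands)
    remaining = budget
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for k, i in enumerate(order):
        # Split what is left equally among the cores not yet served;
        # a core that needs less keeps only its demand, freeing the rest.
        share = remaining / (len(demands) - k)
        alloc[i] = min(demands[i], share)
        remaining -= alloc[i]
    return alloc


def hybrid_distribution(demands, budget, light_fraction=0.5):
    """ES under light load, WF under heavy load; light_fraction is hypothetical."""
    if sum(demands) <= light_fraction * budget:
        return equal_sharing(demands, budget)
    return water_filling(demands, budget)


if __name__ == "__main__":
    demands = [10.0, 40.0, 60.0]  # watts requested by three cores
    print(equal_sharing(demands, 90.0))        # [10.0, 30.0, 30.0]
    print(water_filling(demands, 90.0))        # [10.0, 40.0, 40.0]
    print(hybrid_distribution(demands, 90.0))  # heavy load -> [10.0, 40.0, 40.0]
```

With a 90 W budget, Equal-Sharing leaves 20 W unused because the lightly loaded core cannot absorb its share, while Water-Filling passes that leftover to the busier cores, which matches the abstract's use of WF to improve quality under heavy load.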

Bibtex

@INPROCEEDINGS{ipdpsw17-approx,
author={X. Hui and Z. Du and J. Liu and H. Sun and Y. He and D. A. Bader},
booktitle={Proceedings of the 13th Workshop on High-Performance, Power-Aware Computing (HPPAC 2017), held in conjunction with the 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2017)},
title={When Good Enough Is Better: Energy-Aware Scheduling for Multicore Servers},
pages={984--993},
doi={10.1109/IPDPSW.2017.38},
month={May},
year={2017}
}