WSC’18 Paper: Just-In-Time Parallel Simulation

Just-In-Time Parallel Simulation, Christopher Hannon, Nandakishore Santhi, Stephan Eidenbenz, Jason Liu, and Dong Jin. In Proceedings of the 2018 Winter Simulation Conference (WSC 2018), December 2018. (To appear).

Abstract

Due to the evolution of programming languages, interpreted languages have gained widespread use in scientific and research computing. Interpreted languages are more portable, easier to use, and faster for prototyping than their ahead-of-time (AOT) compiled counterparts, such as C, C++, and Fortran. While traditionally considered slow to execute, interpreted languages have significantly improved in execution speed through advancements in Just-in-Time (JIT) compilation techniques, in some cases even outperforming AOT languages. In this paper, we explore some challenges and design strategies in developing a high-performance parallel discrete-event simulation engine, called Simian, written in interpreted languages with JIT capabilities, including Python, Lua, and JavaScript. Our results show that Simian with JIT performs similarly to AOT simulators, such as MiniSSF and ROSS. We expect that, with good performance, user-friendliness, and portability, just-in-time parallel simulation will become a common choice for modeling and simulation in the near future.
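
As a rough illustration of the kind of engine the paper describes, here is a minimal sequential discrete-event core in Python. The `Simulator` class, the ping-pong handler, and all names are invented for this sketch; Simian's actual API and its parallel synchronization are not shown.

```python
import heapq

class Simulator:
    """A minimal sequential discrete-event engine (illustrative only;
    not Simian's actual API)."""
    def __init__(self):
        self.now = 0.0
        self._queue = []   # min-heap of (time, seq, handler, data)
        self._seq = 0      # tie-breaker for events scheduled at the same time

    def schedule(self, delay, handler, data=None):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler, data))
        self._seq += 1

    def run(self, until=float('inf')):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, data = heapq.heappop(self._queue)
            handler(self, data)

# usage: a chain of events, each scheduling its successor one time unit later
log = []
def ping(sim, n):
    log.append((sim.now, n))
    if n > 0:
        sim.schedule(1.0, ping, n - 1)

sim = Simulator()
sim.schedule(0.0, ping, 3)
sim.run()
```

A JIT compiler can specialize the hot event-dispatch loop above, which is where interpreted-language simulators traditionally lost ground to AOT engines.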

Bibtex

@inproceedings{jit-parallel,
title = {Just-In-Time Parallel Simulation},
author = {Hannon, Christopher and Santhi, Nandakishore and Eidenbenz, Stephan and Liu, Jason and Jin, Dong},
booktitle = {Proceedings of the 2018 Winter Simulation Conference (WSC 2018)},
month = {December},
year = {2018}
}

WSC’18 Paper: IMCSim: Parameterized Performance Prediction for Implicit Monte Carlo Codes

IMCSim: Parameterized Performance Prediction for Implicit Monte Carlo Codes, Gopinath Chennupathi, Stephan Eidenbenz, Alex Long, Olena Tkachenko, Joseph Zerr, and Jason Liu. In Proceedings of the 2018 Winter Simulation Conference (WSC 2018), December 2018. (To appear).

Abstract

Monte Carlo techniques for radiation transport play a significant role in modeling complex astrophysical phenomena. In this paper, we design an application model (IMCSim) of an Implicit Monte Carlo (IMC) particle code using the Performance Prediction Toolkit (PPT), a discrete-event simulation-based modeling framework for predicting code performance on a large range of parallel platforms. We present validation results for IMCSim. We then use the fast parameter scanning that such a high-level loop-structure model of a complex code enables to predict optimal IMC parameter settings for interconnect latency hiding. We find that variations in interconnect bandwidth have a significant effect on optimal parameter values. Our results suggest the potential value of using IMCSim as a pre-step to substantial IMC runs, quickly identifying optimal parameter values for the specific hardware platform on which IMC runs.
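
To illustrate the idea of fast parameter scanning over a performance model rather than over the real code, here is a toy sketch in Python. The cost model and all coefficients are made up for illustration and are not IMCSim's fitted model.

```python
def predicted_runtime_us(batch):
    """Toy cost model (all coefficients invented, not IMCSim's model):
    fixed compute cost + per-message latency amortized by batching
    + a growing penalty for lost communication/computation overlap."""
    particles, latency_us = 1_000_000, 5.0
    compute = particles * 0.2                  # 0.2 us of work per particle
    comm = (particles / batch) * latency_us    # fewer, larger messages
    overlap_penalty = 5.0 * batch              # big batches hide less latency
    return compute + comm + overlap_penalty

# the scan evaluates the model only, so sweeping many values is cheap
candidates = [2 ** i for i in range(4, 15)]    # batch sizes 16..16384
best_batch = min(candidates, key=predicted_runtime_us)
```

Because each evaluation is a closed-form expression instead of a full IMC run, scanning the whole parameter range costs microseconds, which is the pre-step workflow the abstract suggests.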

Bibtex

@inproceedings{imcsim,
title = {IMCSim: Parameterized Performance Prediction for Implicit Monte Carlo Codes},
author = {Chennupathi, Gopinath and Eidenbenz, Stephan and Long, Alex and Tkachenko, Olena and Zerr, Joseph and Liu, Jason},
booktitle = {Proceedings of the 2018 Winter Simulation Conference (WSC 2018)},
month = {December},
year = {2018}
}

SUSCOM’18 Paper: Program Power Profiling Based on Phase Behaviors

Program Power Profiling Based on Phase Behaviors, Xiaobin Ma, Zhihui Du, and Jason Liu. Sustainable Computing, Informatics and Systems, doi:10.1016/j.suscom.2018.05.001 – 17 May 2018. To appear. [preprint]

Abstract

Power profiling tools based on fast and accurate workload analysis can be useful for job scheduling and resource allocation aiming to optimize the power consumption of large-scale, high-performance computer systems. In this article, we propose a novel method for predicting the power consumption of a complete workload or application by extrapolating from the measured power consumption of only a few code segments of the same application. As such, it provides a fast and yet effective way to predict the power consumption of both single- and multi-threaded programs on arbitrary architectures without having to profile the entire program’s execution, which would be costly to obtain, especially for a long-running program. Our method employs a set of code analysis tools to capture the program’s phase behavior and then uses a multi-variable linear regression method to estimate the power consumption of the entire program. For validation, we select the SPEC 2006 benchmark suite and the NAS parallel benchmarks to evaluate the accuracy and effectiveness of our method. Experimental results on three generations of multicore processors show that our power profiling method achieves good accuracy in predicting a program’s energy use, with relatively small errors.
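
A minimal sketch of the regression step, assuming per-phase features and measured power for a few profiled code segments. The feature choices, numbers, and helper names here are invented; the paper's actual feature set comes from its code analysis tools.

```python
def lstsq(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination
    with partial pivoting -- a tiny multi-variable linear regression."""
    n = len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(n)]
         for i in range(n)]
    b = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(n)]
    for i in range(n):                      # forward elimination
        piv = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        b[i], b[piv] = b[piv], b[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    w = [0.0] * n
    for i in reversed(range(n)):            # back substitution
        w[i] = (b[i] - sum(A[i][c] * w[c] for c in range(i + 1, n))) / A[i][i]
    return w

# hypothetical phase features: (IPC, memory accesses per kilo-instruction, 1)
phases = [(1.8, 5.0, 1.0), (0.6, 40.0, 1.0), (1.2, 15.0, 1.0)]
measured_watts = [95.0, 120.0, 104.0]       # made-up measurements
w = lstsq(phases, measured_watts)

# predict power of an unprofiled phase, then energy = power * duration
p = sum(wi * xi for wi, xi in zip(w, (1.0, 20.0, 1.0)))
energy_joules = p * 12.0                    # the phase runs for 12 seconds
```

The point of the technique is the last two lines: once the weights are fitted from a few measured segments, the whole program's energy follows from its phase features alone.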

Bibtex

@Article{suscom18,
AUTHOR = {Ma, Xiaobin and Du, Zhihui and Liu, Jason},
TITLE = {Program Power Profiling Based on Phase Behaviors},
JOURNAL = {Sustainable Computing, Informatics and Systems},
YEAR = {2018},
URL = {https://doi.org/10.1016/j.suscom.2018.05.001},
DOI = {10.1016/j.suscom.2018.05.001}
}

DSS-2018 Paper: Analysis of MOOC Learning Rhythms

Analysis of MOOC Learning Rhythms, Jingjing He, Chang Men, Senbiao Fang, Zhihui Du, Jason Liu, and Manli Li. In Proceedings of the 4th IEEE International Conference on Data Science and Systems (DSS-2018), June 2018. [to appear]

Abstract

With the increasing popularity of Massive Open Online Courses (MOOCs), a large amount of data has been collected by MOOC platforms about users and their interactions with the platforms. Many studies analyze such data to understand the online learning behavior of students in order to improve the courses and services. In this paper, we propose the concept of learning rhythms. We divide students into three groups according to their level of engagement with the course. We capture the learning behavior on different learning units by observing the delay time and the study time of the students, and use them to infer the eagerness and intensity applied to studying the materials. We then use a frequent tree mining technique to extract frequent patterns; the most frequently occurring subtrees are identified as typical learning rhythms. To evaluate our method, we analyze data provided by XuetangX, an online learning platform in China, and study the learning rhythms in one of its most popular courses.
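
To make the idea concrete, here is a simplified sketch that discretizes per-unit delay and study time into behavior labels and counts the most common label sequences. The thresholds and data are invented, and flat sequence counting stands in for the paper's frequent subtree mining.

```python
from collections import Counter

def label(delay_days, study_minutes):
    """Discretize a learner's behavior on one unit into a coarse label
    capturing eagerness (delay) and intensity (study time).
    Thresholds are made up for illustration."""
    eagerness = "prompt" if delay_days <= 2 else "delayed"
    intensity = "intense" if study_minutes >= 30 else "light"
    return f"{eagerness}/{intensity}"

def frequent_rhythms(students, k=2):
    """Return the k most common per-unit label sequences across students;
    a flat-sequence stand-in for frequent subtree mining."""
    seqs = Counter(tuple(label(d, m) for d, m in s) for s in students)
    return seqs.most_common(k)

# (delay in days, study time in minutes) per learning unit, per student
students = [
    [(1, 45), (2, 40)], [(1, 50), (1, 35)], [(1, 45), (2, 40)],
    [(5, 10), (7, 5)],  [(6, 8), (9, 12)],
]
top = frequent_rhythms(students)
```

The most frequent sequences play the role of the "typical learning rhythms" in the paper, here recovered from only five synthetic students.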

Bibtex

@inproceedings{dss18-mooc,
title = {Analysis of MOOC Learning Rhythms},
author = {Jingjing He and Chang Men and Senbiao Fang and Zhihui Du and Jason Liu and Manli Li},
booktitle = {Proceedings of the 4th IEEE International Conference on Data Science and Systems (DSS-2018)},
month = {June},
year = {2018}
}

HPCC’18 Paper: HPC Demand Response via Power Capping and Node Scaling

Enabling Demand Response for HPC Systems Through Power Capping and Node Scaling, Kishwar Ahmed, Jason Liu, and Kazutomo Yoshii. In Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications (HPCC-2018), June 2018. [to appear]

Abstract

Demand response is an increasingly popular program for ensuring power grid stability during sudden surges in power demand. We expect high-performance computing (HPC) systems to be valued participants in such programs because of their massive power consumption. In this paper, we propose an emergency demand-response model exploiting both power capping of HPC systems and node scaling of HPC applications. First, we present power and performance prediction models for HPC systems with only power capping, upon which we build our demand-response model, and validate the models with real-life measurements of application characteristics. Next, we present and validate models to predict energy-to-solution for HPC applications with different numbers of nodes and power-capping values. Based on the prediction models, we propose an emergency demand-response participation model for HPC systems that determines the optimal resource allocation based on power capping and node scaling. Finally, we demonstrate the effectiveness of our proposed demand-response model using real-life measurements and trace data. We show that our approach can reduce energy consumption with only a slight increase in the execution time of HPC applications during critical demand-response periods.
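
The optimization step can be sketched as a brute-force search over node counts and power caps. The runtime model below is a made-up placeholder, not the paper's validated prediction models, and all numbers are invented.

```python
def run_time(nodes, cap_watts, base_time=100.0, base_cap=200.0):
    """Hypothetical runtime model: time shrinks with more nodes and
    stretches as the per-node power cap drops below base_cap.
    (Not the paper's fitted model.)"""
    return base_time * (8.0 / nodes) * (base_cap / cap_watts) ** 0.5

def best_config(power_budget, deadline, node_choices, cap_choices):
    """Brute-force search for the (nodes, cap) pair that minimizes
    energy-to-solution while respecting the demand-response power
    budget and a runtime deadline."""
    best = None
    for n in node_choices:
        for cap in cap_choices:
            if n * cap > power_budget:
                continue            # violates the capped power budget
            t = run_time(n, cap)
            if t > deadline:
                continue            # too slow for the user
            energy = n * cap * t    # joules, assuming nodes run at the cap
            if best is None or energy < best[0]:
                best = (energy, n, cap)
    return best

best = best_config(power_budget=1600.0, deadline=250.0,
                   node_choices=[4, 8, 16], cap_choices=[100.0, 150.0, 200.0])
```

The search captures the paper's trade-off: during a demand-response event, tightening the cap or rescaling nodes saves energy at the cost of a bounded runtime increase.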

Bibtex

@inproceedings{hpcc18-power,
title = {Enabling Demand Response for HPC Systems Through Power Capping and Node Scaling},
author = {Kishwar Ahmed and Jason Liu and Kazutomo Yoshii},
booktitle = {Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications (HPCC-2018)},
month = {June},
year = {2018}
}

IEEECloud’18 Paper: Detecting Containerized Application Dependencies

A Toolset for Detecting Containerized Application’s Dependencies in CaaS Clouds, Pinchao Liu, Liting Hu, Hailu Xu, Zhiyuan Shi, Jason Liu, Qingyang Wang, Jai Dayal, and Yuzhe Tang. In Proceedings of the 2018 IEEE International Conference on Cloud Computing (IEEE CLOUD 2018), July 2018. [to appear]

Abstract

There has been a dramatic increase in the popularity of Container as a Service (CaaS) clouds. Multi-tier applications in CaaS clouds could be optimized by using knowledge of the network topology and of link or server load to choose the best endpoints on which to run. However, it is difficult to apply such optimizations in public datacenters shared by multiple tenants, because of the opacity between the tenants and the datacenter providers: providers have no insight into tenants’ container workloads and dependencies, while tenants have no clue about the underlying network topology, links, and load. As a result, containers might be booted on the wrong physical nodes, leading to performance degradation due to bi-section bandwidth bottlenecks or interference from co-located containers. We propose `DocMan’, a toolset that adopts a black-box approach to discover container ensembles and collect information about intra-ensemble container interactions, using a combination of techniques such as distance identification and hierarchical clustering. The experimental results demonstrate that DocMan enables optimized container placement that reduces the stress on the bi-section bandwidth of the datacenter’s network. The method detects container ensembles at low cost and with 92% accuracy, and can significantly improve performance for multi-tier applications under the best of circumstances.
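
A minimal sketch of the clustering step, assuming a pairwise distance matrix has already been derived from observed traffic (more traffic means a smaller distance). This is single-linkage agglomerative clustering; the container names, distances, and threshold are invented, and the rest of DocMan's pipeline is not shown.

```python
def cluster(names, dist, threshold):
    """Single-linkage agglomerative clustering over a pairwise distance
    dict. Containers whose cluster distance stays below `threshold` end
    up in the same ensemble (a simplification of DocMan's pipeline)."""
    clusters = [{n} for n in names]
    def d(a, b):   # single linkage: min pairwise distance between clusters
        return min(dist[frozenset((x, y))] for x in a for y in b)
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: d(clusters[p[0]], clusters[p[1]]))
        if d(clusters[i], clusters[j]) > threshold:
            break                   # remaining clusters are unrelated
        clusters[i] |= clusters.pop(j)
    return clusters

# hypothetical distances derived from observed traffic (closer = more traffic)
names = ["web", "app", "db", "batch"]
dist = {frozenset(p): v for p, v in [
    (("web", "app"), 0.1), (("app", "db"), 0.2), (("web", "db"), 0.3),
    (("web", "batch"), 0.9), (("app", "batch"), 0.8), (("db", "batch"), 0.95)]}
ensembles = cluster(names, dist, threshold=0.5)
```

Here the heavily interacting web/app/db tiers fall into one ensemble while the unrelated batch container stays apart, which is the grouping a placement optimizer would then exploit.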

Bibtex

@inproceedings{cloud18-docman,
title = {A Toolset for Detecting Containerized Application's Dependencies in CaaS Clouds},
author = {Pinchao Liu and Liting Hu and Hailu Xu and Zhiyuan Shi and Jason Liu and Qingyang Wang and Jai Dayal and Yuzhe Tang},
booktitle = {Proceedings of the 2018 IEEE International Conference on Cloud Computing (IEEE CLOUD 2018)},
month = {July},
year = {2018}
}

HotStorage’18 Paper: ML-based Cache Replacement

Driving Cache Replacement with ML-based LeCaR, Giuseppe Vietri, Liana V. Rodriguez, Wendy A. Martinez, Steven Lyons, Jason Liu, Raju Rangaswami, Giri Narasimhan, and Ming Zhao. In Proceedings of the 10th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage’18), July 2018. [to appear]

Abstract

Can machine learning (ML) be used to improve on existing cache replacement strategies? We propose a general framework called LeCaR that uses the ML technique of regret minimization to answer this question in the affirmative. Surprisingly, we show that the LeCaR framework outperforms ARC using only two fundamental eviction policies, LRU and LFU. We also show that the performance gap increases as the size of the available cache gets smaller relative to the size of the working set.
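
A simplified sketch of the regret-minimization idea: multiplicative weight updates arbitrate between LRU and LFU eviction, and a policy is penalized when an item it evicted is requested again. Details of LeCaR such as bounded per-policy histories and the exact reward scheme are omitted, and the learning rate here is arbitrary.

```python
import random
from collections import OrderedDict

class LeCaRSketch:
    """A simplified LeCaR-style cache: regret minimization chooses
    between LRU and LFU on each eviction. Illustrative sketch of the
    idea, not the authors' implementation (history is unbounded here)."""
    def __init__(self, size, lr=0.45, seed=0):
        self.size, self.lr = size, lr
        self.rng = random.Random(seed)
        self.cache = OrderedDict()          # key -> access count, in LRU order
        self.history = {}                   # evicted key -> policy that chose it
        self.w = {"LRU": 0.5, "LFU": 0.5}
        self.hits = self.misses = 0

    def _evict(self):
        policy = "LRU" if self.rng.random() < self.w["LRU"] else "LFU"
        if policy == "LRU":
            victim = next(iter(self.cache))              # least recently used
        else:
            victim = min(self.cache, key=self.cache.get) # least frequently used
        del self.cache[victim]
        self.history[victim] = policy

    def access(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache[key] += 1
            self.cache.move_to_end(key)
            return
        self.misses += 1
        if key in self.history:             # regret: penalize the evictor
            bad = self.history.pop(key)
            self.w[bad] *= (1 - self.lr)
            total = self.w["LRU"] + self.w["LFU"]
            self.w = {p: v / total for p, v in self.w.items()}
        if len(self.cache) >= self.size:
            self._evict()
        self.cache[key] = 1
```

Each regret event shifts probability mass toward whichever policy is currently making fewer mistakes on the workload, which is how two simple policies can jointly adapt.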

Bibtex

@inproceedings{hotstorage18-lecar,
title = {Driving Cache Replacement with ML-based {LeCaR}},
author = {Giuseppe Vietri and Liana V. Rodriguez and Wendy A. Martinez and Steven Lyons and Jason Liu and Raju Rangaswami and Giri Narasimhan and Ming Zhao},
booktitle = {Proceedings of the 10th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage'18)},
month = {July},
year = {2018}
}

SIGSIM-PADS’18 Paper: Parallel Application Performance Prediction

Parallel Application Performance Prediction Using Analysis Based Models and HPC Simulations, Mohammad Abu Obaida, Jason Liu, Gopinath Chennupati, Nandakishore Santhi, and Stephan Eidenbenz. In Proceedings of the 2018 SIGSIM Principles of Advanced Discrete Simulation (SIGSIM-PADS’18), May 2018. [paper]

Abstract

Parallel application performance models provide valuable insight into the performance of real systems. Tools capable of fast, accurate, and comprehensive prediction and evaluation of high-performance computing (HPC) applications and system architectures are therefore of great value. This paper presents PyPassT, an analysis-based modeling framework built on static program analysis and integrated simulation of target HPC architectures. More specifically, the framework analyzes application source code written in C with OpenACC directives and transforms it into an application model describing its computation and communication behavior (including CPU and GPU workloads, memory accesses, and message-passing transactions). The application model is then executed on a simulated HPC architecture for performance analysis. Preliminary experiments demonstrate that the proposed framework can represent the runtime behavior of benchmark applications with good accuracy.

Bibtex

@inproceedings{pads18-hpcpred,
title = {Parallel Application Performance Prediction Using Analysis Based Models and HPC Simulations},
author = {Mohammad Abu Obaida and Jason Liu and Gopinath Chennupati and Nandakishore Santhi and Stephan Eidenbenz},
booktitle = {Proceedings of the 2018 SIGSIM Principles of Advanced Discrete Simulation (SIGSIM-PADS'18)},
pages = {49--59},
month = {May},
year = {2018},
doi = {10.1145/3200921.3200937}
}

Slides

Information’17 Paper: Investigating the Statistical Distribution of Learning Coverage in MOOCs

Investigating the Statistical Distribution of Learning Coverage in MOOCs, Xiu Li, Chang Men, Zhihui Du, Jason Liu, Manli Li, and Xiaolei Zhang. Information 2017, 8(4), 150; doi:10.3390/info8040150 – 20 November 2017. [paper]

Abstract

Learners participating in Massive Open Online Courses (MOOCs) have a wide range of backgrounds and motivations. Many MOOC learners enroll in courses to take a brief look; only a few go through the entire content, and even fewer eventually obtain a certificate. We discovered this phenomenon after examining 92 courses on the XuetangX and edX platforms. More specifically, we found that the learning coverage in many courses (one of the metrics used to estimate learners’ active engagement with online courses) follows a Zipf distribution. We apply the maximum likelihood estimation method to fit Zipf’s law and test our hypothesis using a chi-square test. In the XuetangX dataset, the learning coverage in 53 of 76 courses fits Zipf’s law, but in all 16 courses on the edX platform the learning coverage rejects Zipf’s law. The results of our study are expected to bring insight into the unique learning behavior on MOOCs.
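
A small sketch of the fitting procedure described above: a grid-search maximum likelihood estimate of the Zipf exponent followed by a Pearson chi-square statistic. The counts are synthetic, and a grid stands in for a proper numerical optimizer.

```python
import math

def zipf_mle(counts, s_grid=None):
    """Fit the Zipf exponent s over ranks 1..N by maximizing the
    log-likelihood on a grid of candidate exponents."""
    ranks = range(1, len(counts) + 1)
    s_grid = s_grid or [i / 100 for i in range(50, 301)]   # s in [0.5, 3.0]
    def loglik(s):
        log_z = math.log(sum(k ** -s for k in ranks))      # normalizer
        return sum(c * (-s * math.log(k) - log_z) for c, k in zip(counts, ranks))
    return max(s_grid, key=loglik)

def chi_square(counts, s):
    """Pearson chi-square statistic of observed rank counts against
    the fitted Zipf law (small values mean a good fit)."""
    n_total = sum(counts)
    ranks = range(1, len(counts) + 1)
    z = sum(k ** -s for k in ranks)
    return sum((c - n_total * k ** -s / z) ** 2 / (n_total * k ** -s / z)
               for c, k in zip(counts, ranks))

# synthetic rank counts roughly proportional to 1/k (made up for illustration)
counts = [600, 300, 200, 150, 120, 100, 86, 75]
s_hat = zipf_mle(counts)
stat = chi_square(counts, s_hat)
```

In the paper's setting, the per-course decision to accept or reject Zipf's law comes from comparing this statistic against the chi-square critical value for the appropriate degrees of freedom.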

Bibtex

@Article{info8040150,
AUTHOR = {Li, Xiu and Men, Chang and Du, Zhihui and Liu, Jason and Li, Manli and Zhang, Xiaolei},
TITLE = {Investigating the Statistical Distribution of Learning Coverage in MOOCs},
JOURNAL = {Information},
VOLUME = {8},
YEAR = {2017},
NUMBER = {4},
ARTICLE-NUMBER = {150},
URL = {http://www.mdpi.com/2078-2489/8/4/150},
ISSN = {2078-2489},
DOI = {10.3390/info8040150}
}

WSC’17 Paper: HPC Job Scheduling Simulation

Simulation of HPC Job Scheduling and Large-Scale Parallel Workloads, Mohammad Abu Obaida and Jason Liu. In Proceedings of the 2017 Winter Simulation Conference (WSC 2017), W. K. V. Chan, A. D’Ambrogio, G. Zacharewicz, N. Mustafee, G. Wainer, and E. Page, eds., December 2017. [paper]

Abstract

The paper presents a simulator designed specifically for evaluating job scheduling algorithms on large-scale HPC systems. The simulator was developed based on the Performance Prediction Toolkit (PPT), a parallel discrete-event simulator written in Python for rapid assessment and performance prediction of large-scale scientific applications on supercomputers. The proposed job scheduler simulator incorporates PPT’s application models and, when coupled with sufficiently detailed architecture models, can represent more realistic job runtime behaviors. Consequently, the simulator can more accurately evaluate different job scheduling and task mapping algorithms on specific target HPC platforms.
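
As a toy illustration of the kind of scheduler such a simulator evaluates, here is a first-come-first-served scheduler over a simulated node pool. The interface and numbers are invented; PPT's actual application and architecture models are not represented.

```python
import heapq

def fcfs_schedule(total_nodes, jobs):
    """Simulate first-come-first-served scheduling on a cluster with
    `total_nodes` nodes. `jobs` are (arrival, nodes, runtime) tuples in
    arrival order; returns each job's (start, finish) time. A toy
    stand-in for a full job-scheduler simulation."""
    free = total_nodes
    releases = []          # min-heap of (finish_time, nodes_released)
    clock = 0.0
    out = []
    for arrival, nodes, runtime in jobs:
        clock = max(clock, arrival)
        while free < nodes:            # wait until enough nodes free up
            t, n = heapq.heappop(releases)
            clock = max(clock, t)
            free += n
        free -= nodes
        heapq.heappush(releases, (clock + runtime, nodes))
        out.append((clock, clock + runtime))
    return out

# three jobs on a 4-node cluster: (arrival, nodes requested, runtime)
sched = fcfs_schedule(4, [(0.0, 2, 10.0), (1.0, 2, 5.0), (2.0, 4, 3.0)])
```

Even this toy shows the behavior schedulers are judged on: the 4-node job must wait for both earlier jobs to drain, so its start time (and the cluster's utilization) depends on the scheduling policy.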

Bibtex

@inproceedings{wsc17-jobsched,
title = {Simulation of HPC Job Scheduling and Large-Scale Parallel Workloads}, 
author = {Mohammad Abu Obaida and Jason Liu},
booktitle = {Proceedings of the 2017 Winter Simulation Conference (WSC 2017)}, 
editor = {W. K. V. Chan and A. D'Ambrogio and G. Zacharewicz and N. Mustafee and G. Wainer and E. Page},
month = {December},
year = {2017}
}