Invited Talk: Symbiotic Modeling and High-Performance Simulation

Symbiotic Modeling and High-Performance Simulation

January 19, 2017

Department of Computer Science, Colorado School of Mines
Host: Professor Tracy Camp

Abstract: Modeling and simulation plays an important role in the design, analysis, and performance evaluation of complex systems. Many of these systems, such as the Internet and high-performance computing systems, involve a huge number of interrelated components and processes. Complex behaviors emerge as these components and processes inter-operate across multiple scales at various granularities. Modeling and simulation must provide sufficiently accurate results while coping with the scale and complexity of these systems. My talk will focus on some of our latest advances in high-performance modeling and simulation techniques, illustrated by two case studies: one on network emulation and the other on high-performance computing (HPC) modeling.
In the first case, I will present a novel distributed network emulation mechanism based on model symbiosis. Mininet is a container-based emulation environment for studying networks consisting of virtual hosts and OpenFlow-enabled virtual switches on Linux. It is well known, however, that Mininet experiments may lose fidelity for large-scale networks and heavy traffic loads. We propose a symbiotic approach, in which an abstract network model coordinates the distributed emulation instances that are superimposed to represent the target network. In doing so, we can effectively study the behavior of real implementations of network applications on large-scale networks in a distributed environment.
In the second case, I will present our latest work on performance modeling of HPC architectures and applications. In collaboration with the Los Alamos National Laboratory, we have developed a highly efficient simulator, called Performance Prediction Toolkit (PPT), which can facilitate rapid and accurate performance prediction of large-scale scientific applications on existing and future HPC architectures.

HPCC’16 Paper: HPC Interconnect Model

Scalable Interconnection Network Models for Rapid Performance Prediction of HPC Applications, Kishwar Ahmed, Jason Liu, Stephan Eidenbenz, and Joe Zerr. In Proceedings of the 18th International Conference on High Performance Computing and Communications (HPCC 2016), December 2016. [paper] [slides]

Abstract

Performance Prediction Toolkit (PPT) is a simulator mainly developed at Los Alamos National Laboratory to facilitate rapid and accurate performance prediction of large-scale scientific applications on existing and future HPC architectures. In this paper, we present three interconnect models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: torus, dragonfly, and fat-tree. We conduct extensive validation tests of our interconnect models, in particular, using configurations of existing HPC systems. Results show that our models provide good accuracy for predicting the network behavior. We also present a performance study of a parallel computational physics application to show that our model can accurately predict the parallel behavior of large-scale applications.

Bibtex

@INPROCEEDINGS{Ahmed2016:scale-intercon,
author={K. Ahmed and J. Liu and S. Eidenbenz and J. Zerr},
booktitle={Proceedings of the IEEE 18th International Conference on High Performance Computing and Communications (HPCC)},
title={Scalable Interconnection Network Models for Rapid Performance Prediction of HPC Applications},
year={2016},
pages={1069-1078},
doi={10.1109/HPCC-SmartCity-DSS.2016.0151},
month={Dec},}
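
The torus, dragonfly, and fat-tree models described in the abstract all start from topology arithmetic. As an illustrative sketch only (this is not PPT's actual code), the minimal hop count between two nodes of a 3-D torus can be computed like this:

```python
def torus_hops(src, dst, dims):
    """Minimal hop count between coordinates src and dst in a torus
    whose per-dimension sizes are dims (each dimension wraps around)."""
    hops = 0
    for s, d, k in zip(src, dst, dims):
        delta = abs(s - d)
        hops += min(delta, k - delta)  # take the shorter way around each ring
    return hops

# Example: two nodes in an 8x8x8 torus
print(torus_hops((0, 0, 0), (7, 4, 1), (8, 8, 8)))  # 1 + 4 + 1 = 6
```

An interconnect model layers routing, link latencies, and contention on top of this kind of distance calculation.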

WSC’16 Paper: Simulation Reproducibility

Panel – Reproducible Research in Discrete-Event Simulation – A Must or Rather a Maybe? Adelinde M. Uhrmacher, Sally Brailsford, Jason Liu, Markus Rabe, and Andreas Tolk. In Proceedings of the 2016 Winter Simulation Conference (WSC 2016), T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds., December 2016. [paper]

Abstract

Scientific research should be reproducible, and so should simulation research. However, is this really the case? In some application areas of simulation, e.g., cell biology, simulation studies cannot be published without the data, models, methods, and computer code being made available for evaluation. Across the application and methodological areas of modeling and simulation, how the problem of reproducibility is assessed and addressed differs. The diversity of answers to this question will be illuminated by looking into network simulation and simulation in logistics, the military, and health care. Making the different scientific cultures, challenges, and solutions in discrete-event simulation explicit is central to improving the reproducibility, and thus the quality, of discrete-event simulation research.

Bibtex

@INPROCEEDINGS{Uhrmacher2016:panel, 
author={A. M. Uhrmacher and S. Brailsford and J. Liu and M. Rabe and A. Tolk}, 
booktitle={2016 Winter Simulation Conference (WSC)}, 
title={Panel--Reproducible research in discrete event simulation--A must or rather a maybe?}, 
year={2016}, 
pages={1301-1315}, 
doi={10.1109/WSC.2016.7822185}, 
month={Dec},}

REHPC’16 Paper: Program Power Profiling Based on Phases

Fast and Effective Power Profiling of Program Execution Based on Phase Behaviors, Xiaobin Ma, Zhihui Du and Jason Liu. In Proceedings of the 1st International Workshop on Resilience and/or Energy-Aware Techniques for High-Performance Computing (RE-HPC 2016), held in conjunction with the 7th International Green and Sustainable Computing Conference (IGSC 2016), November 2016. [paper]

Abstract

Power profiling tools based on fast and accurate workload analysis can be useful for job scheduling and resource allocation that aim to optimize the power consumption of large-scale high-performance computer systems. In this paper, we propose a novel method for predicting the power consumption of a complete workload or application by extrapolating from the measured power consumption of only a few code segments of the same application. As such, it provides a fast yet effective way to predict the power consumption of a single-threaded execution of a program on arbitrary architectures without having to profile the entire program's execution, which would be costly, especially for a long-running program. Our method employs a set of code analysis tools to capture the program's phase behavior and then adopts a multi-variable linear regression method to estimate the power consumption of the entire program. We use the SPEC 2006 benchmarks to evaluate the accuracy and effectiveness of our method. Experimental results show that our power profiling method achieves good accuracy in predicting a program's energy use, with relatively small errors.

Bibtex

@INPROCEEDINGS{Ma2016:phases,
author={Xiaobin Ma and Zhihui Du and Jason Liu},
booktitle={Proceedings of the 7th International Green and Sustainable Computing Conference (IGSC)},
title={Fast and effective power profiling of program execution based on phase behaviors},
year={2016},
pages={1-8},
doi={10.1109/IGCC.2016.7892625},
month={Nov},}
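
The regression step described in the abstract can be sketched as follows. The phase features and numbers below are invented for illustration; the paper's actual feature set and tooling differ.

```python
import numpy as np

# Each row: features of one measured code segment, e.g.
# [instructions per cycle, cache miss rate, branch rate].
# These values are made up for illustration.
X = np.array([[1.2, 0.02, 0.10],
              [0.8, 0.15, 0.05],
              [1.6, 0.01, 0.20],
              [0.5, 0.30, 0.02]])
y = np.array([35.0, 52.0, 40.0, 61.0])  # measured power per segment (watts)

# Fit coefficients plus an intercept term via least squares.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_power(features):
    """Extrapolate power for an unmeasured phase from its features."""
    return float(np.dot(coef[:-1], features) + coef[-1])

print(predict_power([1.0, 0.05, 0.12]))
```

Once fitted on a few measured segments, the same linear model is applied to the phase features of the rest of the program, avoiding a full profiling run.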

Research Project: NVM-Enabled Host-Side Caches

Funding: National Science Foundation (CNS-1563883)
Investigators: Raju Rangaswami, Ming Zhao (ASU), Giri Narasimhan, Jason Liu
Duration: June 2016 – May 2019

Non-volatile memory (NVM) is a transformative technology that is dramatically changing how the data storage systems of the future are built, allowing an unprecedented combination of performance and persistence in a single device. This project will develop a suite of storage caching techniques for NVM along four complementary dimensions.

The first two dimensions address the selective use of NVM as host-side read caches for persistently stored data, as well as the explicit use of its persistence properties in fault-tolerant write caching solutions. The latter two dimensions develop advanced techniques for delivering storage quality of service (QoS) with NVM caches and for building caching algorithms that are aware of data-reduction techniques, such as deduplication and compression, at the NVM layer. Together, these contributions have the potential to transform enterprise data center storage stacks by readily adopting the best properties of current and future NVM technology. The expected performance benefits apply to a broad spectrum of computer systems and applications.

Educational activities will include involving undergraduate students and incorporating the project's research findings into coursework. Planned outreach activities will focus on recruiting under-represented minority students in computer science to participate in the project. The project also expects to transition the new technologies into practice through open-source distribution of Linux and KVM hypervisor code implementing the innovations.

PADS’16 Paper: Integrated Interconnect Model

An Integrated Interconnection Network Model for Large-Scale Performance Prediction, Kishwar Ahmed, Mohammad Obaida, Jason Liu, Stephan Eidenbenz, Nandakishore Santhi, and Guillaume Chapuis. In Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (SIGSIM-PADS 2016), May 2016. [paper]

Abstract

The interconnection network is a critical component of high-performance computing architecture and application co-design. For many scientific applications, the increasing communication complexity poses a serious concern, as it may hinder the scaling properties of these applications on novel architectures. It is apparent that a scalable, efficient, and accurate interconnect model would be essential for performance evaluation studies. In this paper, we present an interconnect model for predicting the performance of large-scale applications on high-performance architectures. In particular, we present a sufficiently detailed interconnect model for Cray’s Gemini 3-D torus network. The model has been integrated with an implementation of the Message-Passing Interface (MPI) that can mimic most of its functions with packet-level accuracy on the target platform. Extensive experiments show that our integrated model provides good accuracy for predicting the network behavior, while at the same time allowing for good parallel scaling performance.

Bibtex

@inproceedings{Ahmed2016:interconnect,
 author = {Ahmed, Kishwar and Obaida, Mohammad and Liu, Jason and Eidenbenz, Stephan and Santhi, Nandakishore and Chapuis, Guillaume},
 title = {An Integrated Interconnection Network Model for Large-Scale Performance Prediction},
 booktitle = {Proceedings of the 2016 Annual ACM Conference on SIGSIM Principles of Advanced Discrete Simulation},
 series = {SIGSIM-PADS '16},
 year = {2016},
 isbn = {978-1-4503-3742-7},
 location = {Banff, Alberta, Canada},
 pages = {177--187},
 numpages = {11},
 url = {http://doi.acm.org/10.1145/2901378.2901396},
 doi = {10.1145/2901378.2901396},
 acmid = {2901396},
 publisher = {ACM},
 address = {New York, NY, USA},
} 
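
At its simplest, the kind of per-message cost such an integrated model composes combines a per-hop latency term with a bandwidth term. The constants below are placeholders, not the Gemini network's actual parameters:

```python
# Toy end-to-end message time: hop latency plus serialization time.
# hop_latency_s and bandwidth_Bps are illustrative defaults only.
def message_time(size_bytes, hops, hop_latency_s=1e-6, bandwidth_Bps=5e9):
    """Crude transfer time for one message over a given number of hops."""
    return hops * hop_latency_s + size_bytes / bandwidth_Bps

# A 1 MB message over 4 hops:
t = message_time(1_000_000, 4)
print(f"{t * 1e6:.1f} microseconds")  # prints "204.0 microseconds"
```

A packet-level simulator refines this with routing, per-link contention, and protocol overheads, which is exactly where the accuracy-versus-scalability trade-off studied in the paper arises.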

TOMACS’15 Paper: Symbiotic Network Simulation and Emulation

Symbiotic Network Simulation and Emulation, Miguel Erazo, Rong Rong, and Jason Liu. ACM Transactions on Modeling and Computer Simulation (TOMACS), 26(1), Article No. 2, December 2015. [paper]

Abstract

A testbed capable of representing detailed operations of complex applications under diverse network conditions is invaluable for understanding the design and performance of new protocols and applications before their real deployment. We introduce a novel method that combines high-performance large-scale network simulation and high-fidelity network emulation, and thus enables real instances of network applications and protocols to run in real operating environments and be tested under simulated network settings. Using our approach, network simulation and emulation can form a symbiotic relationship, through which they are synchronized for an accurate representation of the network-scale traffic behavior. We introduce a model downscaling method along with an efficient queuing model and a traffic reproduction technique, which can significantly reduce the synchronization overhead and improve accuracy. We validate our approach with extensive experiments via simulation and with a real-system implementation. We also present a case study using our approach to evaluate a multipath data transport protocol.

Bibtex

@article{Erazo2015:symbiosis,
author = {Erazo, Miguel A. and Rong, Rong and Liu, Jason},
title = {Symbiotic Network Simulation and Emulation},
journal = {ACM Trans. Model. Comput. Simul.},
issue_date = {December 2015},
volume = {26},
number = {1},
month = dec,
year = {2015},
issn = {1049-3301},
pages = {2:1--2:25},
articleno = {2},
numpages = {25},
url = {http://doi.acm.org/10.1145/2717308},
doi = {10.1145/2717308},
acmid = {2717308},
publisher = {ACM},
address = {New York, NY, USA},
}
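
The symbiosis the abstract describes can be caricatured in a few lines: an abstract fluid model tracks aggregate backlog on a link, and the emulation side imposes the resulting delay on real packets. All names and numbers below are hypothetical, not the paper's actual queuing model:

```python
class FluidLink:
    """Abstract link model: aggregate traffic in, per-packet delay out."""
    def __init__(self, capacity_bps, buffer_bits):
        self.capacity = capacity_bps
        self.buffer = buffer_bits
        self.backlog = 0.0  # bits currently queued

    def advance(self, arrival_bps, dt):
        """Evolve the backlog over dt seconds given aggregate arrivals."""
        self.backlog += (arrival_bps - self.capacity) * dt
        self.backlog = min(max(self.backlog, 0.0), self.buffer)

    def packet_delay(self, size_bits):
        """Delay the emulator would impose on a real packet right now."""
        return (self.backlog + size_bits) / self.capacity

link = FluidLink(capacity_bps=10e6, buffer_bits=1e6)
link.advance(arrival_bps=15e6, dt=0.1)   # 100 ms of 1.5x overload
print(link.packet_delay(12_000))         # queueing + transmission delay
```

The abstract model stays cheap because it works on aggregates, while the emulation keeps full application fidelity; synchronizing the two is the crux of the symbiotic method.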

WSC’15 Paper: Simian Concept

The Simian Concept: Parallel Discrete Event Simulation with Interpreted Languages and Just-in-Time Compilation, Nandakishore Santhi, Stephan Eidenbenz, and Jason Liu. In Proceedings of the 2015 Winter Simulation Conference (WSC 2015), L. Yilmaz, W. K V. Chan, I. Moon, T. M. K. Roeder, C. Macal, and M. D. Rossetti, eds., December 2015. [paper]

Abstract

We introduce Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written using Lua and Python. Simian reaps the benefits of interpreted languages—ease of use, fast development time, enhanced readability and a high degree of portability on different platforms—and, through the optional use of Just-In-Time (JIT) compilation, achieves high performance comparable with the state-of-the-art PDES engines implemented using compiled languages such as C or C++. This paper describes the main design concepts of Simian, and presents a benchmark performance study, comparing four Simian implementations (written in Python and Lua, with and without using JIT) against a traditionally compiled simulator, MiniSSF, written in C++. Our experiments show that Simian in Lua with JIT outperforms MiniSSF, sometimes by a factor of three under high computational workloads.

Bibtex

@INPROCEEDINGS{Santhi2015:simian,
author={N. Santhi and S. Eidenbenz and J. Liu},
booktitle={Proceedings of the 2015 Winter Simulation Conference (WSC)},
title={The Simian Concept: Parallel Discrete Event Simulation with Interpreted Languages and Just-in-Time Compilation},
year={2015},
pages={3013-3024},
doi={10.1109/WSC.2015.7408405},
month={Dec},
}
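
For readers unfamiliar with discrete-event simulation, the sequential core that engines like Simian build on can be sketched in a few lines of Python. This shows only the event loop; Simian itself adds entities, services, JIT compilation, and conservative parallel synchronization:

```python
import heapq

class Engine:
    """Minimal sequential discrete-event engine (illustrative only)."""
    def __init__(self):
        self.now = 0.0
        self._queue = []   # heap of (time, seq, handler, data)
        self._seq = 0      # tie-breaker so same-time events stay ordered

    def schedule(self, delay, handler, data=None):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler, data))
        self._seq += 1

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, data = heapq.heappop(self._queue)
            handler(self, data)

log = []
def ping(eng, n):
    log.append((eng.now, n))
    if n < 3:
        eng.schedule(1.0, ping, n + 1)

eng = Engine()
eng.schedule(0.0, ping, 1)
eng.run(until=10.0)
print(log)  # [(0.0, 1), (1.0, 2), (2.0, 3)]
```

The JIT angle in the paper is that a loop like `run()`, executed millions of times, is exactly the kind of hot path that LuaJIT or PyPy can compile to near-native speed.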

DSRT’15 Paper: Scalable Emulation with Simulation Symbiosis

Toward Scalable Emulation of Future Internet Applications with Simulation Symbiosis, Jason Liu, Cesar Marcondes, Musa Ahmed, and Rong Rong. In Proceedings of the 19th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2015), October 2015. [paper]

Abstract

Mininet is a popular container-based emulation environment built on Linux for testing OpenFlow applications. Using Mininet, one can flexibly compose an experimental network from a set of virtual hosts and virtual switches. However, it is well understood that Mininet can only provide limited capacity, both for CPU and network I/O, due to its underlying physical constraints. We propose a method for combining simulation and emulation to improve the scalability of network experiments. This is achieved by applying the symbiotic approach to effectively integrate emulation and simulation for hybrid experimentation. In this case, one can use Mininet to directly run OpenFlow applications on the virtual machines and software switches, with network connectivity represented by detailed simulation at scale.

Bibtex

@INPROCEEDINGS{Liu2015:emulation-symbiosis,
author={J. Liu and C. Marcondes and M. Ahmed and R. Rong},
booktitle={Proceedings of the 2015 IEEE/ACM 19th International Symposium on Distributed Simulation and Real Time Applications (DS-RT)},
title={Toward Scalable Emulation of Future Internet Applications with Simulation Symbiosis},
year={2015},
pages={68-77},
doi={10.1109/DS-RT.2015.19},
ISSN={1550-6525},
month={Oct},
}

HotStorage’15 Paper: To ARC or Not to ARC

To ARC or Not to ARC, Ricardo Santana, Steven Lyons, Ricardo Koller, Raju Rangaswami, and Jason Liu. In Proceedings of the 7th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 2015), July 2015. [paper]

Abstract

Cache replacement algorithms have focused on managing caches that are in the datapath. In datapath caches, every cache miss results in a cache update. Cache updates are expensive because they induce cache insertion and cache eviction overheads, which can be detrimental to both cache performance and cache device lifetime. Non-datapath caches, such as host-side flash caches, allow the flexibility of not having to update the cache on each miss. We propose the multi-modal adaptive replacement cache (mARC), a new cache replacement algorithm that extends the adaptive replacement cache (ARC) algorithm for non-datapath caches. Our initial trace-driven simulation experiments suggest that mARC improves cache performance over ARC while significantly reducing the number of cache updates for two sets of storage I/O workloads from MSR Cambridge and FIU.

Bibtex

@inproceedings{Santana2015:MARC,
author = {Ricardo Santana and Steven Lyons and Ricardo Koller and Raju Rangaswami and Jason Liu},
title = {To ARC or Not to ARC},
booktitle = {7th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 15)},
year = {2015},
address = {Santa Clara, CA},
url = {https://www.usenix.org/conference/hotstorage15/workshop-program/presentation/santana},
publisher = {USENIX Association},
}
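
The key freedom mARC exploits, namely that a non-datapath cache may skip insertion on a miss, can be illustrated with a simple policy that only inserts a block on its second miss. This sketch is not the mARC algorithm itself; it merely shows why selective insertion reduces cache updates:

```python
from collections import OrderedDict

class SelectiveCache:
    """LRU cache that inserts a block only on its second miss."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # block -> data, in LRU order
        self.seen = set()            # blocks that missed once before
        self.updates = 0             # expensive cache (flash) writes done

    def get(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)       # refresh LRU position
            return True                         # hit
        if block in self.seen:                  # second miss: insert now
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict LRU block
            self.cache[block] = None
            self.updates += 1
        else:
            self.seen.add(block)                # first miss: skip insertion
        return False

c = SelectiveCache(capacity=2)
for b in ["a", "b", "a", "c", "a"]:
    c.get(b)
print(c.updates)  # only "a" recurred, so only 1 cache update
```

One-hit-wonder blocks ("b" and "c" above) never enter the cache, so they cost no flash writes; a datapath cache would have paid an insertion and eventual eviction for each.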