BigData’17 Paper: Light Curve Anomaly Detection

Real-Time Anomaly Detection of Short Time-Scale GWAC Survey Light Curves, Tianzhi Feng, Zhihui Du, Yankui Sun, Jianyan Wei, Jing Bi, and Jason Liu. In Proceedings of the 6th IEEE International Congress on Big Data, June 2017. [paper]

The Ground-based Wide-Angle Camera array (GWAC) is a short time-scale survey telescope that can image a field of view of over 5,000 square degrees every 15 seconds or less. One of the scientific missions of GWAC is to detect anomalous astronomical events accurately and quickly. For that, a huge amount of data must be handled in real time. In this paper, we propose a new time series analysis model, called DARIMA (Dynamic Auto-Regressive Integrated Moving Average), to identify anomalous events in light curves obtained from GWAC as early as possible with a high degree of confidence. A major advantage of DARIMA is that it can dynamically adjust its model parameters during the real-time processing of the time series data. To improve accuracy, we identify anomaly points based on the weighted prediction results of different time windows. Experimental results using real survey data show that the DARIMA model can identify the first anomaly point for all light curves. We also evaluate our model with simulated anomaly events of various types embedded in the real time series data. The DARIMA model is able to generate early warning triggers for all of them. The experimental results demonstrate that the proposed DARIMA model is a promising method for real-time anomaly detection of short time-scale GWAC light curves.
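
The abstract names two ideas: refitting the model as new points arrive, and combining predictions from several time windows with weights. The Python sketch below illustrates only those two ideas and is not the DARIMA model from the paper. As a stand-in it refits an AR(1) model on first differences (roughly ARIMA(1,1,0), with no moving-average term) at every step. The function names, window lengths, weights, threshold k, and warm-up length are all assumptions chosen for illustration.

```python
import math

def ar1_forecast(window):
    """One-step forecast from an AR(1) fit on first differences (assumed stand-in model)."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    x, y = diffs[:-1], diffs[1:]
    denom = sum(v * v for v in x)
    # Least-squares AR(1) coefficient without intercept; fall back to 0 on a flat window.
    phi = sum(a * b for a, b in zip(x, y)) / denom if denom > 0 else 0.0
    return window[-1] + phi * diffs[-1]

def detect_anomalies(series, windows=(10, 20, 40), weights=(0.5, 0.3, 0.2),
                     k=5.0, warmup=10):
    """Return indices whose weighted multi-window prediction error exceeds k * sigma."""
    flagged, errors = [], []
    for t in range(max(windows), len(series)):
        # Refit on each trailing window at every step, then blend the forecasts.
        pred = sum(w * ar1_forecast(series[t - n:t]) for n, w in zip(windows, weights))
        err = series[t] - pred
        if len(errors) >= warmup:
            sigma = math.sqrt(sum(e * e for e in errors) / len(errors))
            if abs(err) > k * sigma:
                flagged.append(t)
                continue  # keep flagged points out of the error estimate
        errors.append(err)
    return flagged

# Synthetic light curve (hypothetical data): slow sinusoid with a step at t = 80.
series = [10 + 0.05 * math.sin(0.3 * t) + (2.0 if t >= 80 else 0.0) for t in range(100)]
flagged = detect_anomalies(series)
```

Because every forecast is recomputed from the most recent samples, the coefficients follow slow changes in the light curve. The points after an anomaly are predicted from windows that already contain it, so a practical detector would need a policy for that case.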

ICBDA’17 Paper: MOOC Learning Zipf Law

Zipf’s Law in MOOC Learning Behavior, Chang Men, Xiu Li, Zhihui Du, Jason Liu, Manli Li, and Xiaolei Zhang. In Proceedings of the 2nd IEEE International Conference on Big Data Analysis (ICBDA 2017), March 2017. [paper]

Learners participating in Massive Open Online Courses (MOOCs) have a wide range of backgrounds and motivations. Many MOOC learners sign up for courses just to take a brief look; only a few go through the entire content, and even fewer eventually obtain a certificate. We discovered this phenomenon after examining 76 courses on the xuetangX platform. More specifically, we found that in many courses the learning coverage, one of the metrics used to estimate learners’ active engagement with online courses, follows a Zipf distribution. We apply maximum likelihood estimation to fit Zipf’s law and test our hypothesis using a chi-square test. The results of our study are expected to bring insight into the unique learning behavior on MOOCs and thus help improve the effectiveness of MOOC learning platforms and the design of courses.
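
The fitting procedure described above can be sketched in Python as follows. This is a minimal illustration and not the paper's code. It assumes a finite Zipf distribution over ranks 1..N with P(k) proportional to k^(-s), and it assumes the data arrive as one count per rank. The function names are hypothetical. The maximum likelihood estimate of s is the value at which the model's expected log-rank equals the observed mean log-rank, and it is found here by bisection.

```python
import math

def zipf_pmf(s, n_ranks):
    """Probabilities P(k) = k^(-s) / H(N, s) for ranks k = 1..N."""
    weights = [k ** -s for k in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def fit_zipf_mle(counts, lo=1e-6, hi=20.0, tol=1e-10):
    """Maximum likelihood exponent s, given counts[k-1] observations at rank k."""
    n = sum(counts)
    logs = [math.log(k) for k in range(1, len(counts) + 1)]
    mean_log = sum(c * l for c, l in zip(counts, logs)) / n

    def score(s):
        # Derivative of the per-observation log-likelihood: E_s[ln k] - observed mean ln k.
        pmf = zipf_pmf(s, len(counts))
        return sum(p * l for p, l in zip(pmf, logs)) - mean_log

    # E_s[ln k] decreases as s grows, so the score has a single sign change.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chi_square_statistic(counts, s):
    """Pearson chi-square statistic and degrees of freedom for the fitted model."""
    n = sum(counts)
    pmf = zipf_pmf(s, len(counts))
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, pmf))
    dof = len(counts) - 2  # N cells, minus 1 for the total, minus 1 fitted parameter
    return stat, dof
```

The statistic is then compared with a chi-square critical value at the chosen significance level for the returned degrees of freedom. Cells with very small expected counts are usually merged before the test, a step this sketch omits.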

TOMACS’15 Paper: Cluster-Based Spatiotemporal Background Traffic

Cluster-Based Spatiotemporal Background Traffic Generation for Network Simulation, Ting Li and Jason Liu. ACM Transactions on Modeling and Computer Simulation (TOMACS), 25(1), Article No. 4, January 2015. [paper]

To reduce the computational complexity of large-scale network simulation, one needs to distinguish foreground traffic, generated by the target applications one intends to study, from background traffic, which represents the bulk of the network traffic generated by other applications. Background traffic competes with foreground traffic for network resources and consequently plays an important role in determining the behavior of network applications. Existing background traffic models either operate only at coarse time granularity or focus only on individual links. There is little insight into how to meaningfully apply realistic background traffic over the entire network. In this article, we propose a method for generating background traffic with spatial and temporal characteristics observed from real traffic traces. We apply data clustering techniques to describe the behavior of end hosts as a function of multidimensional attributes and group them into distinct classes, and then map the classes to simulated routers so that we can generate traffic in accordance with the cluster-level statistics. The proposed traffic generator makes no assumption about the target network topology. It can also scale the generated traffic, so that the traffic intensity can be varied to test applications under different yet realistic network conditions. Experiments show that our method is able to generate traffic that maintains the same spatial and temporal characteristics as the observed traffic traces.
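
The pipeline described above (cluster the end hosts, map the classes to simulated routers, generate traffic from cluster-level statistics) can be sketched in Python. The abstract does not say which clustering technique or mapping rule the article uses, so this sketch makes its own choices: plain k-means with a deterministic start, and routers allotted to classes in proportion to class size. The function names and the two host attributes are hypothetical.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means. Initial centers are k evenly spaced input points (an assumption)."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every host to its nearest center, then move each center to its members' mean.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

def assign_routers(labels, k, n_routers):
    """Give each simulated router a class, in proportion to class sizes."""
    sizes = [labels.count(c) for c in range(k)]
    quotas = [n_routers * s / len(labels) for s in sizes]
    alloc = [int(q) for q in quotas]
    # Hand leftover routers to the classes with the largest fractional remainder.
    by_remainder = sorted(range(k), key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[: n_routers - sum(alloc)]:
        alloc[c] += 1
    return [c for c in range(k) for _ in range(alloc[c])]

# Hypothetical host attributes: (mean send rate in Mb/s, mean concurrent flows).
hosts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 0.9),
         (10.0, 9.0), (11.0, 10.0), (0.8, 1.2), (1.0, 0.9)]
labels, centers = kmeans(hosts, k=2)
router_classes = assign_routers(labels, k=2, n_routers=4)
scale = 2.0  # intensity multiplier for testing under heavier load
router_rates = [scale * centers[c][0] for c in router_classes]
```

Because the routers are assigned classes and not individual hosts, the same cluster statistics can drive a simulated topology of any size. Changing `scale` varies the traffic intensity without changing the class mix.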
author = {Li, Ting and Liu, Jason},
title = {Cluster-Based Spatiotemporal Background Traffic Generation for Network Simulation},
journal = {ACM Trans. Model. Comput. Simul.},
issue_date = {January 2015},
volume = {25},
number = {1},
month = nov,
year = {2014},
issn = {1049-3301},
pages = {4:1--4:25},
articleno = {4},
numpages = {25},
url = {},
doi = {10.1145/2667222},
acmid = {2667222},
publisher = {ACM},
address = {New York, NY, USA},