Data Analytics Archives

DSS-2018 Paper: Analysis of MOOC Learning Rhythms

Analysis of MOOC Learning Rhythms, Jingjing He, Chang Men, Senbiao Fang, Zhihui Du, Jason Liu, and Manli Li. In Proceedings of the 4th IEEE International Conference on Data Science and Systems (DSS-2018), June 2018. [to appear]

Abstract

With the increasing popularity of Massive Open Online Courses (MOOCs), a large amount of data has been collected by MOOC platforms about the users and their interactions with the platforms. Many studies analyze the data to understand the online learning behavior of the students in order to improve the courses and services. In this paper, we propose the concept of learning rhythms. We divide the students into three groups corresponding to their level of engagement with the course. We capture the learning behavior on different learning units by observing the delay time and the study time of the students, and use them to infer the eagerness and intensity applied to studying the materials. We use the frequent tree mining technique to extract frequent patterns. The most frequently occurring subtrees are identified as typical learning rhythms. To evaluate our method, we analyze the data provided by XuetangX, an online learning platform in China, and study the learning rhythms using one of its most popular courses.

Bibtex

@inproceedings{dss18-mooc,
title = {Analysis of MOOC Learning Rhythms},
author = {Jingjing He and Chang Men and Senbiao Fang and Zhihui Du and Jason Liu and Manli Li},
booktitle = {Proceedings of the 4th IEEE International Conference on Data Science and Systems (DSS-2018)},
month = {June},
year = {2018}
}

Information’17 Paper: Investigating the Statistical Distribution of Learning Coverage in MOOCs

Investigating the Statistical Distribution of Learning Coverage in MOOCs, Xiu Li, Chang Men, Zhihui Du, Jason Liu, Manli Li, and Xiaolei Zhang. Information 2017, 8(4), 150; doi:10.3390/info8040150 – 20 November 2017. [paper]

Abstract

Learners participating in Massive Open Online Courses (MOOCs) have a wide range of backgrounds and motivations. Many MOOC learners enroll in the courses to take a brief look; only a few go through the entire content, and even fewer are able to eventually obtain a certificate. We discovered this phenomenon after having examined 92 courses on both the xuetangX and edX platforms. More specifically, we found that in many courses the learning coverage, one of the metrics used to estimate the learners’ active engagement with the online courses, follows a Zipf distribution. We apply the maximum likelihood estimation method to fit Zipf’s law and test our hypothesis using a chi-square test. In the xuetangX dataset, the learning coverage in 53 of 76 courses fits Zipf’s law, but for all 16 courses on the edX platform, the hypothesis that the learning coverage follows Zipf’s law is rejected. The result from our study is expected to bring insight into the unique learning behavior on MOOCs.
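
As a rough illustration of the two steps named in the abstract (a maximum likelihood fit of Zipf’s law followed by a chi-square goodness-of-fit statistic), here is a minimal Python sketch. It is not the authors’ code. The function names, the search bounds, and the convention that `counts[0]` holds the count for rank 1 are assumptions made for this example, and the paper’s actual binning of learning coverage is not reproduced here.

```python
import math


def zipf_pmf(s, n):
    """Zipf probabilities for ranks 1..n with exponent s."""
    weights = [k ** -s for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def fit_zipf_mle(counts, lo=0.01, hi=10.0, iters=200):
    """Estimate the Zipf exponent by maximizing the log-likelihood.

    counts[k-1] is the number of observations at rank k (assumed layout).
    The log-likelihood is concave in s, so a ternary search over the
    assumed bracket [lo, hi] converges to the maximizer.
    """
    n = len(counts)
    total = sum(counts)
    weighted_log_rank = sum(c * math.log(k) for k, c in enumerate(counts, start=1))

    def log_likelihood(s):
        harmonic = sum(k ** -s for k in range(1, n + 1))
        return -s * weighted_log_rank - total * math.log(harmonic)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1) < log_likelihood(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def chi_square_statistic(counts, s):
    """Pearson chi-square statistic of counts against a Zipf(s) model.

    Returns (statistic, degrees_of_freedom). The degrees of freedom are
    n - 2: n bins, minus one for the fixed total, minus one for the
    fitted exponent. A p-value can then be obtained from a chi-square
    table or, for example, scipy.stats.chi2.sf(statistic, dof).
    """
    n = len(counts)
    total = sum(counts)
    expected = [total * p for p in zipf_pmf(s, n)]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return stat, n - 2


if __name__ == "__main__":
    # Hypothetical rank-ordered counts, for illustration only.
    observed = [120, 55, 38, 27, 20]
    s_hat = fit_zipf_mle(observed)
    stat, dof = chi_square_statistic(observed, s_hat)
    print(f"s = {s_hat:.3f}, chi-square = {stat:.3f}, dof = {dof}")
```

A small statistic relative to the chi-square critical value for the given degrees of freedom means the Zipf hypothesis is not rejected; a large one means it is rejected.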

Bibtex

@Article{info8040150,
AUTHOR = {Li, Xiu and Men, Chang and Du, Zhihui and Liu, Jason and Li, Manli and Zhang, Xiaolei},
TITLE = {Investigating the Statistical Distribution of Learning Coverage in MOOCs},
JOURNAL = {Information},
VOLUME = {8},
YEAR = {2017},
NUMBER = {4},
ARTICLE-NUMBER = {150},
URL = {http://www.mdpi.com/2078-2489/8/4/150},
ISSN = {2078-2489},
DOI = {10.3390/info8040150}
}

BigData’17 Paper: Light Curve Anomaly Detection

Real-Time Anomaly Detection of Short Time-Scale GWAC Survey Light Curves, Tianzhi Feng, Zhihui Du, Yankui Sun, Jianyan Wei, Jing Bi, and Jason Liu. In Proceedings of the 6th IEEE International Congress on Big Data, June 2017. [paper]

Abstract

The Ground-based Wide-Angle Camera array (GWAC) is a short time-scale survey telescope that can take images covering a field of view of over 5,000 square degrees every 15 seconds or less. One scientific mission of GWAC is to accurately and quickly detect anomalous astronomical events. For that, a huge amount of data must be handled in real time. In this paper, we propose a new time series analysis model, called DARIMA (or Dynamic Auto-Regressive Integrated Moving Average), to identify the anomalous events that occur in light curves obtained from GWAC as early as possible with a high degree of confidence. A major advantage of DARIMA is that it can dynamically adjust its model parameters during the real-time processing of the time series data. We identify the anomaly points based on the weighted prediction results of different time windows to improve accuracy. Experimental results using real survey data show that the DARIMA model can identify the first anomaly point for all light curves. We also evaluate our model with simulated anomalous events of various types embedded in the real time series data. The DARIMA model is able to generate early warning triggers for all of them. The results from the experiments demonstrate that the proposed DARIMA model is a promising method for real-time anomaly detection of short time-scale GWAC light curves.

Bibtex

@INPROCEEDINGS{bd17-lightcurve, 
author={Tianzhi Feng and Zhihui Du and Yankui Sun and Jianyan Wei and Jing Bi and Jason Liu},
booktitle={2017 IEEE International Congress on Big Data (BigData Congress)}, 
title={Real-Time Anomaly Detection of Short-Time-Scale GWAC Survey Light Curves}, 
pages={224--231},
month={June},
year={2017}
}

ICBDA’17 Paper: MOOC Learning Zipf Law

Zipf’s Law in MOOC Learning Behavior, Chang Men, Xiu Li, Zhihui Du, Jason Liu, Manli Li, and Xiaolei Zhang. In Proceedings of the 2nd IEEE International Conference on Big Data Analysis (ICBDA 2017), March 2017. [paper]

Abstract

Learners participating in Massive Open Online Courses (MOOCs) have a wide range of backgrounds and motivations. Many MOOC learners sign up for the courses to take a brief look; only a few go through the entire content, and even fewer are able to eventually obtain a certificate. We discovered this phenomenon after having examined 76 courses on the xuetangX platform. More specifically, we found that in many courses the learning coverage, one of the metrics used to estimate the learners’ active engagement with the online courses, follows a Zipf distribution. We apply the maximum likelihood estimation method to fit Zipf’s law and test our hypothesis using a chi-square test. The result from our study is expected to bring insight into the unique learning behavior on MOOCs and thus help improve the effectiveness of MOOC learning platforms and the design of courses.

Bibtex

@inproceedings{mooczipf,
title = {Zipf's Law in MOOC Learning Behavior},
author = {Chang Men and Xiu Li and Zhihui Du and Jason Liu and Manli Li and Xiaolei Zhang},
booktitle = {Proceedings of the 2nd IEEE International Conference on Big Data Analysis (ICBDA 2017)},
month = {March},
year = {2017}
}

TOMACS’15 Paper: Cluster-Based Spatiotemporal Background Traffic

Cluster-Based Spatiotemporal Background Traffic Generation for Network Simulation, Ting Li and Jason Liu. ACM Transactions on Modeling and Computer Simulation (TOMACS), 25(1), Article No. 4, January 2015. [paper]

Abstract

To reduce the computational complexity of large-scale network simulation, one needs to distinguish foreground traffic generated by the target applications one intends to study from background traffic that represents the bulk of the network traffic generated by other applications. Background traffic competes with foreground traffic for network resources and consequently plays an important role in determining the behavior of network applications. Existing background traffic models either operate only at coarse time granularity or focus only on individual links. There is little insight on how to meaningfully apply realistic background traffic over the entire network. In this article, we propose a method for generating background traffic with spatial and temporal characteristics observed from real traffic traces. We apply data clustering techniques to describe the behavior of end hosts as a function of multidimensional attributes and group them into distinct classes, and then map the classes to simulated routers so that we can generate traffic in accordance with the cluster-level statistics. The proposed traffic generator makes no assumption on the target network topology. It is also capable of scaling the generated traffic so that the traffic intensity can be varied accordingly in order to test applications under different and yet realistic network conditions. Experiments show that our method is able to generate traffic that maintains the same spatial and temporal characteristics as in the observed traffic traces.

Bibtex

@article{Li2014:bgtraffic,
author = {Li, Ting and Liu, Jason},
title = {Cluster-Based Spatiotemporal Background Traffic Generation for Network Simulation},
journal = {ACM Trans. Model. Comput. Simul.},
issue_date = {January 2015},
volume = {25},
number = {1},
month = nov,
year = {2014},
issn = {1049-3301},
pages = {4:1--4:25},
articleno = {4},
numpages = {25},
url = {http://doi.acm.org/10.1145/2667222},
doi = {10.1145/2667222},
acmid = {2667222},
publisher = {ACM},
address = {New York, NY, USA},
}