Big Data & Information Analytics

October 2016, Volume 1, Issue 4

ADERSIM-IBM partnership in big data
Ali Asgary and Jianhong Wu
2016, 1(4): 277-278 doi: 10.3934/bdia.2016010

This short note announces the recent development of the Advanced Disaster, Emergency and Rapid Response Simulation Initiative, in collaboration with IBM-Canada. The focus is on Big Data analytics techniques and IBM's Intelligent Operations Centre for Emergency Management platform.

Analyzing opinion dynamics in online social networks
Robin Cohen, Alan Tsang, Krishna Vaidyanathan and Haotian Zhang
2016, 1(4): 279-298 doi: 10.3934/bdia.2016011

In this paper, we examine the challenge of performing analyses of opinion dynamics in online social networks. We present a model for studying the influence exerted by peers within the network, emphasizing the role that skepticism can play with respect to establishing consensus of opinion. From here, we focus on some key extensions to the model, with respect to the nature of peers (their familiarity relationships, their empathy) and the presence of peers with particular profiles, as well as with specific clustering of peer relationships. Specifically, we show that the influence of trusted confidants on individuals behaves in a predictable fashion; moreover, we show that the underlying model is robust to individual variations in empathy within the population. These empirical results provide important insights to those seeking to examine and analyze patterns of influence within social networks.

Modeling daily guest count prediction
Fok Ricky, Lasek Agnieszka, Li Jiye and An Aijun
2016, 1(4): 299-308 doi: 10.3934/bdia.2016012

We present a novel method for analyzing data with temporal variations. In particular, we consider the problem of forecasting daily guest counts for a restaurant chain with more than 60 stores. We study the transaction data collected from each store and perform data preprocessing and feature construction. We then discuss different forecasting techniques based on data mining and machine learning. A new modeling algorithm, SW-LAR-LASSO, is proposed. We compare a multiple regression model, a Poisson regression model, and the proposed SW-LAR-LASSO model for prediction. Experimental results show that the approach of combining sliding windows and LAR-LASSO produces the best results with the highest precision. This approach can also be applied to other areas where temporal variations exist in the data.
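The abstract does not describe SW-LAR-LASSO beyond saying that it combines sliding windows with LAR-LASSO. As a rough illustration of that general idea only, the sketch below refits an L1-penalized linear model on the most recent window of days before each forecast. It is not the authors' algorithm: the lasso is fitted here by coordinate descent rather than least-angle regression, and the function names, the synthetic guest counts, the features, the window length and the penalty are all assumptions made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(rows, targets, penalty, sweeps=50):
    """Minimise (1/2n) * sum of squared errors + penalty * sum |w_j|.

    Uses cyclic coordinate descent on centered data (an assumption of this
    sketch; the paper uses LAR-LASSO). Returns (intercept, weights).
    """
    n, p = len(rows), len(rows[0])
    col_means = [sum(row[j] for row in rows) / n for j in range(p)]
    y_mean = sum(targets) / n
    centered = [[row[j] - col_means[j] for j in range(p)] for row in rows]
    residual = [y - y_mean for y in targets]  # residual while all weights are 0
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in centered]
            scale = sum(x * x for x in col) / n
            if scale == 0.0:
                continue  # a column that is constant in this window carries no signal
            # Covariance of column j with the residual that excludes column j.
            rho = sum(x * (r + weights[j] * x) for x, r in zip(col, residual)) / n
            new_weight = soft_threshold(rho, penalty) / scale
            residual = [r - (new_weight - weights[j]) * x
                        for x, r in zip(col, residual)]
            weights[j] = new_weight
    intercept = y_mean - sum(w * m for w, m in zip(weights, col_means))
    return intercept, weights


def sliding_window_forecasts(rows, targets, window, penalty):
    """For each day t >= window, fit on the previous `window` days, then predict day t."""
    forecasts = []
    for t in range(window, len(rows)):
        intercept, weights = fit_lasso(rows[t - window:t], targets[t - window:t], penalty)
        forecasts.append(intercept + sum(w * x for w, x in zip(weights, rows[t])))
    return forecasts


def make_day(t):
    """Hypothetical synthetic day: two informative features and one irrelevant one."""
    day_of_week = t % 7
    is_weekend = 1.0 if day_of_week >= 5 else 0.0
    is_friday = 1.0 if day_of_week == 4 else 0.0
    odd_day = float(t % 2)  # irrelevant to the guest count by construction
    guests = 100.0 + 40.0 * is_weekend + 15.0 * is_friday
    return [is_weekend, is_friday, odd_day], guests


days = [make_day(t) for t in range(56)]
rows = [features for features, _ in days]
targets = [guests for _, guests in days]
forecasts = sliding_window_forecasts(rows, targets, window=28, penalty=0.1)

for actual, predicted in list(zip(targets[28:], forecasts))[:7]:
    print(f"actual={actual:6.1f}  forecast={predicted:6.1f}")
```

Refitting on a moving window lets the coefficients follow slow changes in the data, while the L1 penalty drops features that do not help within the current window. In this synthetic setup the irrelevant `odd_day` feature should receive a zero weight.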

A testbed to enable comparisons between competing approaches for computational social choice
John A. Doucette and Robin Cohen
2016, 1(4): 309-340 doi: 10.3934/bdia.2016013

Within artificial intelligence, the field of computational social choice studies the application of AI techniques to the problem of group decision making, especially through systems where each agent submits a vote taking the form of a total ordering over the alternatives (a preference). Reaching a reasonable decision becomes more difficult when some agents are unwilling or unable to rank all the alternatives, and appropriate voting systems must be devised to handle the resulting incomplete preference information. In this paper, we present a detailed testbed which can be used to perform information analytics in this domain. We illustrate the testbed in action in the context of determining a winner or putting candidates into ranked order, using data from real-world elections, and demonstrate how to use the results of the testbed to produce effective comparisons between competing algorithms.

Increase statistical reliability without losing predictive power by merging classes and adding variables
Wenxue Huang, Xiaofeng Li and Yuanyi Pan
2016, 1(4): 341-347 doi: 10.3934/bdia.2016014

It is usually true that adding explanatory variables to a probability model increases the degree of association yet risks losing statistical reliability. In this article, we propose an approach that merges classes within the categorical explanatory variables before the addition, so as to keep the statistical reliability while increasing the predictive power step by step.

MatrixMap: Programming abstraction and implementation of matrix computation for big data analytics
Yaguang Huangfu, Guanqing Liang and Jiannong Cao
2016, 1(4): 349-376 doi: 10.3934/bdia.2016015

The computational core of many big data applications can be expressed as general matrix computations, including linear algebra operations and irregular matrix operations. However, existing parallel programming systems such as Spark lack a programming abstraction and an efficient implementation for general matrix computations. In this paper, we present MatrixMap, a unified and efficient data-parallel programming framework for general matrix computations. MatrixMap provides a powerful yet simple abstraction, consisting of a distributed in-memory data structure called bulk key matrix and a programming interface defined by matrix patterns. Users can easily load data into bulk key matrices and program algorithms into parallel matrix patterns. MatrixMap outperforms current state-of-the-art systems by employing three key techniques: matrix patterns with lambda functions for irregular and linear algebra matrix operations, an asynchronous computation pipeline with context-aware data shuffling strategies for specific matrix patterns, and an in-memory data structure that reuses data across iterations. Moreover, it can automatically handle the parallelization and distributed execution of programs on a large cluster. The experimental results show that MatrixMap is 12 times faster than Spark.

Disentangling data, information and knowledge
Subrata Dasgupta
2016, 1(4): 377-389 doi: 10.3934/bdia.2016016

Information, data and knowledge constitute the fundamental 'stuff' of computing, and one might assume that, in the seven decades since the advent of the modern computer, theorists and practitioners of computing can differentiate between the concepts they denote. And, of course, computer scientists do not have exclusive claims over these terms or concepts: sociologists, cultural scholars, economists, historians, natural scientists, philosophers, and the managerial class have them as part of their vocabularies. The surprising fact is that these terms and the concepts they denote are far from distinct. They form a tangled web. In this essay I address the question: what is the relationship between data, information and knowledge? I attempt to disentangle, and clarify, how these terms are in fact interpreted by practitioners in such diverse disciplines as information science, historical research, empirical sciences, cognitive science, data mining and computer programming, and to identify what appears to be a common thread.

On identifiability of 3-tensors of multilinear rank $(1,\ L_{r},\ L_{r})$
Ming Yang, Dunren Che, Wen Liu, Zhao Kang, Chong Peng, Mingqing Xiao and Qiang Cheng
2016, 1(4): 391-401 doi: 10.3934/bdia.2016017

In this paper, we study a specific big data model via multilinear rank tensor decompositions. The model approximates a given tensor by a sum of multilinear rank $(1, \ L_{r}, \ L_{r})$ terms. We characterize the identifiability property of this model from a geometric point of view. Our main results consist of exact identifiability and generic identifiability. The argument for generic identifiability relies on the exact identifiability, which is in particular closely related to the well-known "trisecant lemma" in the context of algebraic geometry (see Proposition 2.6 in [1]). The connection discussed in this paper gives a clear geometric picture of this model.
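
For orientation, a decomposition into multilinear rank $(1, \ L_{r}, \ L_{r})$ terms can be written out as follows. The notation is assumed here for illustration and may differ from the paper's: $\mathcal{T}$ is an $I \times J \times K$ tensor over a field $\mathbb{F}$ (real or complex), $R$ is the number of terms, each $a_{r}$ is a vector and each $M_{r}$ is a matrix.

$$
\mathcal{T} \approx \sum_{r=1}^{R} a_{r} \otimes M_{r},
\qquad a_{r} \in \mathbb{F}^{I},
\quad M_{r} \in \mathbb{F}^{J \times K},
\quad \operatorname{rank}(M_{r}) = L_{r}.
$$

Each term has entries $(a_{r} \otimes M_{r})_{ijk} = (a_{r})_{i} (M_{r})_{jk}$. Provided $a_{r} \neq 0$, its mode-1 unfolding has rank 1 and its mode-2 and mode-3 unfoldings both have rank $L_{r}$, which is what the multilinear rank $(1, \ L_{r}, \ L_{r})$ records.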



