Big Data & Information Analytics
January 2017, Volume 2, Issue 1
Special issue on computational intelligence methods for big data and information analytics
Incremental learning has been investigated by many researchers. However, only a few works have considered the situation where class imbalance occurs. In this paper, class-imbalanced incremental learning is investigated and an ensemble-based method named Selective Further Learning (SFL) is proposed. SFL employs a hybrid ensemble of Naive Bayes (NB) and Multilayer Perceptrons (MLPs). Within the ensemble of MLPs, a subset of the MLPs is selected to learn from each new data set. Negative Correlation Learning (NCL) with Dynamic Sampling (DyS) for handling class imbalance is used as the basic training method. In addition, Naive Bayes, as an additive model, is employed as one individual of the ensemble and learns the data sets incrementally. A vector of weights, whose length equals the number of classes, is updated for every individual of the ensemble to indicate the individual's 'confidence' in what it has learned about each class. The ensemble combines all of the individuals by a weighted average according to these weights. Experiments on 3 synthetic data sets and 10 real-world data sets showed that SFL was able to handle class-imbalanced incremental learning and to outperform a recent related approach.
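The weighted combination described in the abstract above can be illustrated with a minimal Python sketch. The per-class normalisation and the names `combine_weighted` and `predict` are assumptions made for illustration; the abstract does not give the exact formula used by SFL.

```python
from typing import List, Sequence


def combine_weighted(probs: Sequence[Sequence[float]],
                     weights: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-individual class scores with per-individual class weights.

    probs[i][c]   : score that individual i assigns to class c
    weights[i][c] : 'confidence' of individual i about class c (assumed >= 0)
    """
    n_classes = len(probs[0])
    combined = []
    for c in range(n_classes):
        total_weight = sum(w[c] for w in weights)
        if total_weight == 0:
            # No individual is confident about this class.
            combined.append(0.0)
        else:
            weighted_sum = sum(w[c] * p[c] for p, w in zip(probs, weights))
            combined.append(weighted_sum / total_weight)
    return combined


def predict(probs: Sequence[Sequence[float]],
            weights: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest combined score."""
    scores = combine_weighted(probs, weights)
    return max(range(len(scores)), key=scores.__getitem__)


# Two individuals, two classes. The first individual has no confidence
# in class 1, so only the second individual contributes to that class.
example_probs = [[0.9, 0.1], [0.4, 0.6]]
example_weights = [[1.0, 0.0], [1.0, 1.0]]
example_scores = combine_weighted(example_probs, example_weights)
```

In this example the combined scores are roughly 0.65 for class 0 and 0.6 for class 1, so `predict` returns class 0.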
This paper addresses the problem of finding the low-rank and sparse components of a given matrix. The problem involves two conflicting objectives: making one component low-rank and the other sparse at the same time. Previous methods combine the two objectives into a single penalty function, which is then solved with traditional numerical optimization approaches. The main contribution of this paper is a multiobjective method that decomposes the given matrix into a low-rank component and a sparse component. The two objective functions are optimized with the evolutionary multiobjective algorithm MOEA/D. A second contribution is a modified low-rank and sparse matrix model, which simplifies the decision variables of the objective functions and improves the efficiency of the multiobjective optimization. The proposed method obtains a set of solutions with different trade-offs between the low-rank and sparse objectives, so decision makers can directly choose one or more satisfactory decompositions according to their requirements. Experiments on artificial data sets and natural images show that the proposed method consistently obtains satisfactory results, and that its convergence, stability and robustness are acceptable.
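As a rough illustration of the two conflicting objectives in the abstract above, the Python sketch below scores a candidate low-rank matrix L for a given matrix X by taking the sparse part as S = X - L, so that only L has to be varied. Treating S as X - L, using the exact rank and a nonzero count as the two objectives, and the function names are all assumptions; the abstract does not state the paper's modified model or its objective definitions.

```python
def matrix_rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting (pure Python)."""
    rows = [[float(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # column is numerically zero below the current rank
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def objectives(x, low_rank, tol=1e-9):
    """Return (rank of L, number of nonzeros in S = X - L)."""
    sparse = [[a - b for a, b in zip(row_x, row_l)]
              for row_x, row_l in zip(x, low_rank)]
    nonzeros = sum(1 for row in sparse for v in row if abs(v) > tol)
    return matrix_rank(low_rank, tol), nonzeros


X = [[1, 2, 3], [2, 4, 6], [3, 6, 10]]
L_candidate = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

trade_off_a = objectives(X, X)            # keep everything in L
trade_off_b = objectives(X, L_candidate)  # move one entry into S
```

Here the first candidate has a higher rank but an empty sparse part, while the second has rank 1 at the cost of one nonzero entry in S. Neither dominates the other, which is the kind of trade-off a multiobjective method returns as a set of solutions.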
Inspired by a representation designed for floorplanning problems, this paper proposes a new representation, namely the moving block sequence (MBS), for resource investment project scheduling problems (RIPSPs). Since each activity of a project in RIPSPs has a fixed duration and resource demand, an activity is treated as a rectangular block whose width equals the duration of the activity and whose height equals the resource it requires. Four move modes are designed, with which an activity can be moved to an appropriate position. The new representation of an RIPSP project therefore consists of two parts: an activity list and a move mode list. By initializing the move mode of each activity randomly and moving the activity accordingly, the activity list can be decoded into a valid solution of the RIPSP. The decoding method of MBS guarantees that, after being moved, each activity is placed in the left-most and bottom-most position within the coordinate system, which means that each activity in the corresponding project is scheduled as early as possible while the precedence constraints and resource demands are satisfied. In addition, a multiagent evolutionary algorithm (MAEA) is combined with the newly designed MBS representation to solve RIPSPs. With the intrinsic properties of MBS in mind, four behaviors, namely the crossover, mutation, competition, and self-learning operators, are designed for the agents in MAEA. To test the performance of the algorithm, 450 problem instances are used, and the experimental results demonstrate the good performance of the proposed representation.
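The block view of activities in the abstract above can be sketched in Python as follows. This is not the MBS decoder: the four move modes are not specified in the abstract and are left out. The sketch only treats each activity as a block (width = duration, height = resource demand), starts it as early as its predecessors allow, and reports the peak resource level the resulting schedule would need. All names and the example data are hypothetical.

```python
def earliest_starts(durations, predecessors, order):
    """Start each activity as early as its predecessors allow.

    `order` is assumed to be precedence-feasible: every activity
    appears after all of its predecessors.
    """
    start = {}
    for activity in order:
        start[activity] = max(
            (start[p] + durations[p] for p in predecessors.get(activity, ())),
            default=0,
        )
    return start


def peak_demand(durations, demands, start):
    """Highest total resource demand at any time in the schedule."""
    events = []
    for activity, begin in start.items():
        events.append((begin, demands[activity]))
        events.append((begin + durations[activity], -demands[activity]))
    # At equal times, releases (negative deltas) sort before acquisitions,
    # so a block ending at t does not overlap one starting at t.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


durations = {"A": 2, "B": 3, "C": 2}      # block widths
demands = {"A": 2, "B": 1, "C": 3}        # block heights
predecessors = {"C": ["A"]}               # C must wait for A

schedule = earliest_starts(durations, predecessors, ["A", "B", "C"])
required_resource = peak_demand(durations, demands, schedule)
```

With this data, A and B start at time 0, C starts at time 2, and the peak demand is 4 (B and C overlapping between times 2 and 3).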
Vein recognition is a new identity authentication technology that has attracted many researchers' attention because of its good security and reliability. This paper proposes a wrist vein recognition system, which identifies people according to the characteristics of their wrist veins. A camera is specially modified to capture wrist vein images, and an image data set is established. Principal component analysis (PCA) is adopted to eliminate redundant information in the images and extract their global features. The global features are classified by a Two-hidden-layer Extreme Learning Machine (TELM). TELM is compared with the original Extreme Learning Machine (ELM) and two other algorithms, Support Vector Machine (SVM) and Naive Bayes (NB). Experimental results show that the accuracy of the proposed system is higher than that of the other three algorithms. Although TELM is not the fastest, it is able to recognize images within a satisfactory time.
Text categorization is a fundamental building block of other related research in NLP. Researchers have proposed many effective text categorization methods that achieve good performance. However, these methods are generally based on raw or low-level features, e.g., tf or tf-idf, and neglect the semantic structures between words. Complex semantic information can influence the precision of text categorization. In this paper, we propose a new method that handles the semantic correlations between different words and text features in both the representation and the learning scheme. We represent each document as multiple instances based on word2vec. Experiments validate the effectiveness of the proposed method compared with state-of-the-art text categorization methods.
Mate selection plays a key role in the natural evolution process. Although a variety of mating strategies have been proposed in the evolutionary computation community, the importance of mate selection has been ignored. In this paper, we propose a clustering-based mate selection (CMS) strategy for evolutionary algorithms (EAs). In CMS, the population is partitioned into clusters, and only solutions in the same cluster are chosen for offspring reproduction. Instead of running a whole new clustering process in each EA generation, the clustering iterations are combined with the evolution iterations. Combining the clustering and evolving processes benefits EAs by reducing the cost of discovering the population structure. To demonstrate this idea, a CMS that uses the k-means clustering method is proposed and applied to a state-of-the-art EA. The experimental results show that the CMS strategy is promising for improving the performance of the EA.
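One way to read the idea in the abstract above is that each EA generation performs a single k-means iteration, rather than clustering to convergence, and then restricts mating to solutions in the same cluster. The Python sketch below shows that reading. The function names, the choice of one k-means step per generation, and the uniform random choice of a mate are assumptions, not details taken from the paper.

```python
import random


def kmeans_step(points, centroids):
    """One k-means iteration: assign points, then update centroids."""
    labels = [
        min(
            range(len(centroids)),
            key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])),
        )
        for p in points
    ]
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            # Mean of the members, coordinate by coordinate.
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid
    return labels, new_centroids


def pick_mate(index, labels, rng):
    """Pick a mate from the same cluster, or None if the cluster has no other member."""
    same_cluster = [
        j for j, label in enumerate(labels)
        if label == labels[index] and j != index
    ]
    return rng.choice(same_cluster) if same_cluster else None


population = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

# In an EA loop, this call would run once per generation, carrying the
# centroids over so the clustering is refined as the population evolves.
labels, centroids = kmeans_step(population, centroids)
mate_of_first = pick_mate(0, labels, random.Random(0))
```

In this toy population the first two solutions share a cluster, so the only possible mate for solution 0 is solution 1.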
Network robustness is the capability of a network to resist failures or attacks. Many robustness measures have been proposed to evaluate the robustness of various types of networks, such as small-world and scale-free networks. However, the robustness of biological networks is different because of their special structures, which are related to their unique functionality. Cancer signaling networks, which show the information transfer of cancers at the molecular level, typically have robust, complex structures, meaning that information exchange in the networks does not depend on a few pathways. This results in a low cure rate, a high recurrence rate and, especially, a short survival time caused by the continual damage that the cancer inflicts. It is therefore meaningful to find a network metric that changes significantly when a single node is removed, and to correlate that metric with the survival probabilities of patients who underwent cancer chemotherapy. In this paper, the relationship between the robustness of 14 typical cancer signaling networks and the survivability of patients with those cancers is studied. Several widely used robustness measures are included, and we find that the natural connectivity, in which redundant cycles satisfy the need for information exchange in cancer signaling networks, is negatively correlated with cancer patient survivability. Furthermore, the top three affected nodes as measured by natural connectivity are identified, and their degree, closeness centrality and betweenness centrality are analyzed. The results show that the nodes found are important, so we believe that natural connectivity will be of great help to cancer treatment.
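Natural connectivity is commonly defined as ln((1/N) * sum_i exp(lambda_i)), where lambda_i are the eigenvalues of the adjacency matrix of an N-node graph. Because sum_i exp(lambda_i) equals the trace of exp(A), it can be approximated without an eigenvalue solver by the series sum_k trace(A^k) / k!. The Python sketch below uses that series and ranks nodes by how much the measure drops when each node is removed. The function names, the truncation at 40 terms, and the toy graph are assumptions; the abstract does not say how the measure was computed in the paper.

```python
import math


def natural_connectivity(adj, terms=40):
    """Approximate ln(trace(exp(A)) / N) with a truncated power series.

    `adj` is a square adjacency matrix as a list of lists with at least
    one node. The fixed number of terms is enough for small graphs; graphs
    with large eigenvalues would need more terms or an eigenvalue solver.
    """
    n = len(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    trace_sum = 0.0
    factorial = 1.0
    for k in range(terms):
        if k > 0:
            # power becomes A^k; factorial becomes k!
            power = [
                [sum(power[i][m] * adj[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)
            ]
            factorial *= k
        trace_sum += sum(power[i][i] for i in range(n)) / factorial
    return math.log(trace_sum / n)


def remove_node(adj, node):
    """Adjacency matrix with one node's row and column deleted."""
    return [
        [v for j, v in enumerate(row) if j != node]
        for i, row in enumerate(adj) if i != node
    ]


def rank_nodes_by_drop(adj):
    """Nodes sorted by the decrease in natural connectivity their removal causes."""
    base = natural_connectivity(adj)
    drops = {
        v: base - natural_connectivity(remove_node(adj, v))
        for v in range(len(adj))
    }
    return sorted(drops, key=drops.get, reverse=True)


# Path graph 0 - 1 - 2: removing the middle node disconnects the graph.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
ranking = rank_nodes_by_drop(path_graph)
```

For the three-node path, removing the middle node leaves two isolated nodes, so it causes the largest drop and appears first in `ranking`.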