Classification and adaptive novel class detection of feature-evolving data streams

Mohammad M. Masud, Qing Chen, Latifur Khan, Charu C. Aggarwal, Jing Gao, Jiawei Han, Ashok Srivastava, Nikunj C. Oza

    Research output: Contribution to journal › Article › peer-review

    82 Citations (Scopus)

    Abstract

    Data stream classification poses many challenges to the data mining community. In this paper, we address four such major challenges, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Since a data stream is theoretically infinite in length, it is impractical to store and use all the historical data for training. Concept-drift is a common phenomenon in data streams, which occurs as a result of changes in the underlying concepts. Concept-evolution occurs as a result of new classes evolving in the stream. Feature-evolution is a frequently occurring process in many streams, such as text streams, in which new features (i.e., words or phrases) appear as the stream progresses. Most existing data stream classification techniques address only the first two challenges, and ignore the latter two. In this paper, we propose an ensemble classification framework, where each classifier is equipped with a novel class detector, to address concept-drift and concept-evolution. To address feature-evolution, we propose a feature set homogenization technique. We also enhance the novel class detection module by making it more adaptive to the evolving stream, and enabling it to detect more than one novel class at a time. Comparison with state-of-the-art data stream classification techniques establishes the effectiveness of the proposed approach.

    Original language: English
    Article number: 6205751
    Pages (from-to): 1484-1497
    Number of pages: 14
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 25
    Issue number: 7
    Publication status: Published - Jun 3 2013

    Keywords

    • Data stream
    • concept-evolution
    • novel class
    • outlier

    ASJC Scopus subject areas

    • Information Systems
    • Computer Science Applications
    • Computational Theory and Mathematics
