Volume: 8, Issue: 2, Page: 92-107
Abstract: The Internet is a rapidly growing information gallery that contains rich textual information. This rapid growth makes it difficult for users to locate relevant information quickly on the web. Document retrieval, categorization, routing and filtering systems are often based on text classification. Text classification means assigning a document to one or more categories or classes. The ability to perform a classification task accurately depends on the representations of the documents to be classified. In this research work, new ensemble classification methods are proposed, namely homogeneous ensemble classifiers using bagging and heterogeneous ensemble classifiers using an arcing classifier, and their performance is analyzed in terms of accuracy. A classifier ensemble is designed using Naive Bayes (NB), Support Vector Machine (SVM) and Genetic Algorithm (GA) as base classifiers. The feasibility and benefits of the proposed approaches are demonstrated on the newsgroups dataset, which is widely used in the field of sentiment classification. The main originality of the proposed approach lies in five main parts: a preprocessing phase, a document indexing phase, a feature reduction phase, a classification phase and a combining phase that aggregates the best classification results. A wide range of comparative experiments is conducted on the newsgroups dataset, and the accuracy of the base classifiers is compared with that of the homogeneous and heterogeneous models. The proposed ensemble methods provide a significant improvement in accuracy over the individual classifiers, and the heterogeneous models exhibit better results than the homogeneous models on this dataset.
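To make the idea of a homogeneous ensemble built by bagging more concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a simple multinomial Naive Bayes base classifier with add-one smoothing, whitespace tokenization in place of the preprocessing, indexing and feature reduction phases, and majority voting as the combining step. The class names (`NaiveBayes`, `BaggedNaiveBayes`), the parameters, and the six toy documents are hypothetical and chosen only for illustration; the SVM and GA base classifiers and the arcing-based heterogeneous ensemble are not shown.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            n_words = sum(self.word_counts[label].values())
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.class_counts[label] / total_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # skip words never seen in training
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class BaggedNaiveBayes:
    """Homogeneous ensemble: each member is trained on a bootstrap sample."""

    def __init__(self, n_estimators=15, seed=0):
        self.n_estimators = n_estimators
        self.rng = random.Random(seed)

    def fit(self, docs, labels):
        n = len(docs)
        self.models = []
        for _ in range(self.n_estimators):
            # Bootstrap: draw n training indices with replacement.
            idx = [self.rng.randrange(n) for _ in range(n)]
            sample_docs = [docs[i] for i in idx]
            sample_labels = [labels[i] for i in idx]
            self.models.append(NaiveBayes().fit(sample_docs, sample_labels))
        return self

    def predict(self, doc):
        # Combining phase (assumed here): simple majority vote.
        votes = Counter(model.predict(doc) for model in self.models)
        return votes.most_common(1)[0][0]


# Toy documents invented for illustration; not taken from the newsgroups dataset.
TOY_DOCS = [
    "hockey game score team",
    "team wins hockey match",
    "goal scored in hockey game",
    "rocket launch orbit space",
    "nasa space shuttle orbit",
    "space station orbit mission",
]
TOY_LABELS = ["sport", "sport", "sport", "space", "space", "space"]

ensemble = BaggedNaiveBayes(n_estimators=15, seed=0).fit(TOY_DOCS, TOY_LABELS)
print(ensemble.predict("hockey team game"))
print(ensemble.predict("orbit space mission"))
```

A heterogeneous ensemble would follow the same voting pattern but combine different kinds of base classifiers rather than resampled copies of one classifier; how the arcing procedure reweights or resamples the training data in this work is not specified in the abstract and is therefore left out of the sketch.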