A Parallel Learning Algorithm for Text Categorization on PIRUN Beowulf Cluster
Canasai Kruengkrai and Chuleerat Jaruskulchai
Abstract
Text categorization is the process of classifying documents into predefined categories or classes based on their content. Because the volume of text data on the Internet is growing rapidly, categorization algorithms must scale to handle such massive data. In this paper, we propose a parallel learning algorithm for text categorization that combines the Expectation-Maximization (EM) algorithm with the naive Bayes classifier. Our experiments were performed on a 72-node Beowulf cluster called PIRUN. Preliminary experimental results show that our parallel implementation has reasonable speedup characteristics.
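The abstract does not spell out how EM and naive Bayes are combined or how the work is divided across nodes, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes a semi-supervised setting (a few labeled documents plus unlabeled ones), a multinomial naive Bayes model with Laplace smoothing, and a data-parallel split in which each shard of documents yields partial counts that are then summed. The plain loop over shards stands in for cluster nodes, and all function names are hypothetical.

```python
import math


def one_hot(label, n_classes):
    """Hard class assignment for a labeled document."""
    return [1.0 if c == label else 0.0 for c in range(n_classes)]


def partial_counts(docs, resp, n_classes, vocab_size):
    """Per-shard statistics: expected class counts and class-word counts."""
    class_counts = [0.0] * n_classes
    word_counts = [[0.0] * vocab_size for _ in range(n_classes)]
    for doc, r in zip(docs, resp):
        for c in range(n_classes):
            class_counts[c] += r[c]
            for w in doc:
                word_counts[c][w] += r[c]
    return class_counts, word_counts


def merge(stats):
    """Reduction: sum the partial counts produced by all shards."""
    n_classes = len(stats[0][0])
    vocab_size = len(stats[0][1][0])
    class_counts = [0.0] * n_classes
    word_counts = [[0.0] * vocab_size for _ in range(n_classes)]
    for cc, wc in stats:
        for c in range(n_classes):
            class_counts[c] += cc[c]
            for w in range(vocab_size):
                word_counts[c][w] += wc[c][w]
    return class_counts, word_counts


def estimate(class_counts, word_counts):
    """M-step: Laplace-smoothed log priors and log word likelihoods."""
    n_classes = len(class_counts)
    vocab_size = len(word_counts[0])
    total = sum(class_counts)
    log_prior = [math.log((class_counts[c] + 1.0) / (total + n_classes))
                 for c in range(n_classes)]
    log_like = []
    for c in range(n_classes):
        denom = sum(word_counts[c]) + vocab_size
        log_like.append([math.log((word_counts[c][w] + 1.0) / denom)
                         for w in range(vocab_size)])
    return log_prior, log_like


def posteriors(docs, log_prior, log_like):
    """E-step for one shard: class posterior of each document."""
    out = []
    for doc in docs:
        scores = [log_prior[c] + sum(log_like[c][w] for w in doc)
                  for c in range(len(log_prior))]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def em_naive_bayes(labeled_shards, unlabeled_shards, n_classes, vocab_size, n_iter=10):
    """Train on labeled shards, then refine with EM over unlabeled shards."""
    base = merge([
        partial_counts(docs, [one_hot(y, n_classes) for y in labels],
                       n_classes, vocab_size)
        for docs, labels in labeled_shards
    ])
    model = estimate(*base)
    for _ in range(n_iter):
        stats = [base]
        # Each shard is independent here, so in a cluster setting each
        # could be handled by a separate node before the merge.
        for docs in unlabeled_shards:
            resp = posteriors(docs, *model)
            stats.append(partial_counts(docs, resp, n_classes, vocab_size))
        model = estimate(*merge(stats))
    return model


# Toy data: documents are lists of word ids; class 0 favors words 0 and 1,
# class 1 favors words 2 and 3.
labeled = [([[0, 1, 0]], [0]), ([[2, 3, 3]], [1])]
unlabeled = [[[0, 0, 1], [2, 2]], [[1, 1], [3, 2, 3]]]
model = em_naive_bayes(labeled, unlabeled, n_classes=2, vocab_size=4)
predictions = posteriors([[0, 1], [3]], *model)
```

Because the counts are additive, only small per-class tables need to be exchanged between workers in each iteration rather than the documents themselves, which is one common reason this style of decomposition is chosen for cluster settings.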