An improved k-means algorithm for text clustering
-
Graphical Abstract
-
Abstract
Random selection of initial cluster centroid in k-means algorithm for text clustering resulted in local optimization of clustering results, and isolated points and indeterminate cluster number led to low accuracy and slow convergence speed of k-means algorithm. So an improved k-means algorithm for text clustering was proposed. In the proposed algorithm, fp-growth algorithm was used for mining frequent item sets of text, and frequent item sets of text were filtered to obtain the core frequent item sets, and then the core frequent item sets were adopted to generate initial cluster centroid and the number of clustering. Finally k-means algorithm was applied for text clustering with the generated initial cluster centroid and the number of clustering. The results of text clustering experiment on Sina microblog dataset show that the improved k-means algorithm can effectively improve the accuracy of text clustering and accelerate the convergence speed, and has strong robustness.
-
-