Implementation of Text Mining for Grouping Thesis Documents Using the K-Means Clustering Method

Authors

  • Dezty Adhe Chajannah Rachman, Laboratorium Statistika Komputasi, FMIPA, Universitas Mulawarman
  • Rito Goejantoro, Laboratorium Statistika Komputasi, FMIPA, Universitas Mulawarman
  • Fidia Deny Tisna Amijaya, Laboratorium Matematika Komputasi, FMIPA, Universitas Mulawarman

DOI:

https://doi.org/10.30872/eksponensial.v11i2.660

Keywords:

documents, thesis, text mining, k-means clustering, silhouette coefficient

Abstract

Text mining is text analysis that automatically discovers high-quality information from a collection of texts summarized in documents. The K-Means clustering method is widely used because it can group large amounts of data with relatively fast and efficient computation. The purpose of this study is to determine the optimal number of groups formed from thesis documents and to describe the resulting groups. The study uses the Nazief and Adriani algorithm for stemming, Euclidean distance to measure the distance between documents, and the silhouette coefficient to assess cluster validity. The sample consists of 119 thesis documents by 2016-2018 graduates of the Statistics Study Program, Mathematics Department, Faculty of Mathematics and Natural Sciences. The analysis shows that the optimal number of groups is two clusters, with a silhouette coefficient of 0.12. The first cluster contains 85 documents and the second contains 34. The first cluster is dominated by studies on data mining (especially classification), time series analysis, regression analysis, survival analysis, spatial analysis, and operations research; the second is dominated by studies on multivariate analysis, quality control, and insurance mathematics.
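The clustering and validation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the six toy term-weight vectors, the function names, and the choice to seed centroids with the first k documents are all assumptions made for illustration, and the preprocessing stage (including Nazief and Adriani stemming and term weighting) is omitted. The sketch groups the vectors with K-Means under Euclidean distance and then compares candidate cluster counts by their mean silhouette coefficient.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100):
    """Plain K-Means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical term-weight vectors standing in for six documents.
docs = [
    [1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9], [0.8, 1.0, 0.0], [0.0, 0.2, 1.0],
]

labels_k2 = kmeans(docs, 2)
scores_by_k = {k: silhouette(docs, kmeans(docs, k)) for k in range(2, 5)}
best_k = max(scores_by_k, key=scores_by_k.get)
print(labels_k2, best_k, scores_by_k)
```

Choosing the candidate with the highest mean silhouette coefficient mirrors the selection criterion described in the abstract. On real thesis documents the vectors would be high-dimensional and sparse, so much lower scores than on this toy data, such as the reported 0.12, are plausible.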

Published

2021-01-19

Section

Articles