The Clustering of the Aquaculture Fisheries Companies in Indonesia Using the K-Prototypes and Two Step Cluster (TSC) Algorithm

Sri Sulastri; Budi Susetyo; I Made Sumertajaya

Authors

Sri Sulastri Department of Statistics, IPB University, Bogor, 16680, Indonesia
Budi Susetyo Department of Statistics, IPB University, Bogor, 16680, Indonesia
I Made Sumertajaya

Keywords:

Cluster, Fisheries, K-Prototypes, Mixed Data, Two Step Cluster

Abstract

Background: The fisheries subsector plays an important role in the Indonesian economy, particularly through aquaculture fisheries companies. Each aquaculture fisheries company has its own characteristics in terms of technical, financial, staffing, and input and output structures. Clustering the 258 aquaculture fisheries companies makes it easier to identify the characteristics of these companies from the characteristics of their clusters. Cluster analysis is one method that can be used to group objects. In this study, clustering was carried out with the K-Prototypes and Two Step Cluster (TSC) algorithms because the data were of mixed type (13 numerical and 8 categorical variables). The better algorithm was then chosen as the one with the smallest ratio between the within-cluster standard deviation (S_W) and the between-cluster standard deviation (S_B). A small ratio means that the objects within each cluster are relatively homogeneous while the clusters themselves are heterogeneous. Based on the comparison of the S_W to S_B ratio for the k-prototypes and TSC algorithms, the k-prototypes algorithm with 6 clusters was the best algorithm for clustering the aquaculture fisheries companies in Indonesia. The results showed that cluster 5 was the best cluster and cluster 6 was the worst cluster with respect to the condition of the aquaculture fisheries companies in Indonesia. Cluster 5 is characterized by mostly central companies in the form of PT that carry out the enlargement of sea water fish in fishponds, and it has high values on the numerical variables. Cluster 6 is characterized by mostly central companies in the form of PT and CV that carry out the hatchery of land water fish in water tubs, and it has the lowest values on the numerical variables compared to the other clusters.
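The selection rule described in the abstract (prefer the partition with the smallest S_W to S_B ratio) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions here are assumptions: S_W is taken as the average of the per-cluster sample standard deviations of one numerical variable, and S_B as the sample standard deviation of the cluster means. The function name `sw_sb_ratio` and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def sw_sb_ratio(values, labels):
    """Return S_W / S_B for one numerical variable.

    Assumed definitions (not stated in the abstract):
      S_W = average of the per-cluster sample standard deviations
      S_B = sample standard deviation of the cluster means
    """
    # Group the observations by their cluster label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(float(value))

    if len(groups) < 2:
        raise ValueError("need at least two clusters")
    if any(len(members) < 2 for members in groups.values()):
        raise ValueError("each cluster needs at least two members")

    s_w = mean([stdev(members) for members in groups.values()])
    s_b = stdev([mean(members) for members in groups.values()])
    return s_w / s_b


# Hypothetical toy data: one numerical variable, two candidate partitions.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
compact_labels = [0, 0, 0, 1, 1, 1]   # separates small from large values
mixed_labels = [0, 1, 0, 1, 0, 1]     # mixes small and large values

ratio_compact = sw_sb_ratio(values, compact_labels)
ratio_mixed = sw_sb_ratio(values, mixed_labels)

# The partition with the smaller ratio is preferred.
candidates = {"compact": ratio_compact, "mixed": ratio_mixed}
best = min(candidates, key=candidates.get)
print(best, round(ratio_compact, 4), round(ratio_mixed, 4))
```

With several numerical variables, the ratio would have to be computed per variable and then combined in some way (for example, averaged). The abstract does not say how the study combined them, so that step is left out of the sketch.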

