OVERVIEW OF COMPANY TRAFFIC ANALYSIS METHODS BASED ON ENSEMBLE CLUSTERING
DOI:
https://doi.org/10.32689/maup.it.2024.4.18
Keywords:
network traffic, big data, data analysis methods, clustering, collective solutions, ensemble models
Abstract
The article reviews existing research on the analysis of company traffic using data clustering methods. The publications reviewed here underline the importance of developing new cybernetic systems for traffic analysis in large organizations. Such systems should be aimed, first of all, at optimizing routing, reducing costs and increasing delivery speed. The article considers the possibility of creating a cluster platform of this kind that provides distributed data processing and integrates various types of data storage. For effective collection and storage of information, the suggested sources are server logs, data from traffic sensors, geolocation information and user movement routes; other data may be used depending on the specifics of the company's business processes. The purpose of the study is to systematize existing methods and approaches to company traffic analysis that rely on data clustering, including collective (ensemble) solutions. The article uses an analytical research method, which includes a review of existing literature, an analysis of previous works and a systematization of knowledge in the field of traffic cluster analysis. The research focuses on evaluating the different approaches and technologies used to process big data. The scientific novelty lies in the consideration of ensemble clustering methods for network traffic analysis, which provide scalability, processing speed and flexibility of systems. The integration of various data sources is proposed as a way to optimize business processes and decision-making under the modern challenges of Big Data. Conclusions. It is shown that cluster solutions, including those based on collective (ensemble) algorithms, provide scalability, high processing speed, availability of data and services, and flexibility in the choice of tools and technologies.
However, implementing such systems involves technical challenges that require deep knowledge of Big Data, machine learning and cluster technologies, as well as significant spending on hardware, software and skilled professionals. Despite these difficulties, applying collective cluster solutions to big data analysis can give companies significant competitive advantages by optimizing business processes, improving the quality of decision-making and increasing the overall efficiency of operations.
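To make the notion of a collective (ensemble) clustering solution concrete, the following is a minimal sketch of one common scheme, co-association (evidence accumulation) consensus. It is an illustration under stated assumptions, not the method of any reviewed publication: the flow features, the sample values, the function names and the 0.5 threshold are all hypothetical. Several k-means runs with different seeds and cluster counts are combined by counting how often each pair of flows lands in the same cluster.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def co_association(labelings, n):
    """Fraction of base clusterings in which each pair shares a cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]

def consensus(co, threshold=0.5):
    """Link pairs with co-association >= threshold; each connected
    component of the resulting graph is one consensus cluster."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical flows: (mean packet size in bytes, flow duration in seconds).
# Features are left unscaled for brevity; real data would be normalized.
flows = [(60, 0.2), (64, 0.3), (70, 0.25), (58, 0.22),
         (1400, 30.0), (1450, 28.0), (1380, 35.0), (1420, 31.0)]

# Base clusterings: five seeds for each of k = 2 and k = 3.
base = [kmeans(flows, k, random.Random(seed)) for k in (2, 3) for seed in range(5)]
co = co_association(base, len(flows))
consensus_labels = consensus(co)
print(consensus_labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy data the runs with k = 3 split one of the two groups differently from seed to seed, while the co-association matrix still recovers the two stable groups. This is the kind of robustness that motivates ensemble approaches.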