Proceedings of the XXth Conference of Open Innovations Association FRUCT | Volume: 28
Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix
Irina Krasnova [1], Vladimir Deart [2], Vladimir Mankov [3]
[1] MTUCI, Russia
[2] Moscow Technical University of Communications and Informatics, Russia
[3] Nokia Training Center, Russia
Keywords: traffic classification; agglomerative clustering; distance matrix; random forest; extremely randomized trees; random trees embedding; Euclidean distance; Manhattan distance; machine learning
DOI: 10.23919/FRUCT50888.2021.9347616
Source: DOAJ
【 Abstract 】
We present a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous work, we achieved high accuracy (90-95%) on flows already known to the model using supervised Machine Learning methods, but in a dynamic SDN new network applications and flows appear more frequently than usual. To detect new flows, we propose the Agglomerative clustering method, which has not previously been applied to network flow classification because early approaches to traffic clustering gave insufficient results and ran too slowly. This paper combines different Machine Learning methods so that Agglomerative clustering is responsible only for updating the class database, while Supervised Learning methods quickly classify known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling the cluster construction process: the distances between flows are determined using the Random Forest and Extra Trees methods. In the experimental part of the study, three other promising ways of determining distances are included for comparison: Random Trees Embedding, Euclidean distance, and Manhattan distance. Clustering results for TCP and UDP applications are presented for different numbers of clusters and different sizes of the initial sample. The experiments confirm that hierarchical clustering is effective for traffic clustering tasks, provided that cluster construction is controlled.
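As a rough illustration of the idea of clustering flows from a precomputed distance matrix, the following minimal sketch runs agglomerative clustering with the two simplest distances named in the abstract, Euclidean and Manhattan. It is not the authors' implementation: the flow features, their values, the choice of average linkage, and all function names are assumptions made for this example, and the Random Forest and Extra Trees distances that the paper relies on are not shown.

```python
from itertools import combinations

def euclidean(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(points, metric):
    """Full symmetric matrix of pairwise distances."""
    n = len(points)
    return [[metric(points[i], points[j]) for j in range(n)] for i in range(n)]

def agglomerative(dist, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed matrix.

    Starts with one cluster per flow and repeatedly merges the two
    clusters with the smallest mean pairwise distance.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical flows: (mean packet size in bytes, mean inter-arrival time in ms).
# Features are left unscaled here to keep the sketch short.
flows = [(60.0, 1.0), (64.0, 1.2), (1400.0, 20.0),
         (1450.0, 22.0), (700.0, 5.0), (720.0, 5.5)]

labels_euclid = agglomerative(distance_matrix(flows, euclidean), 3)
labels_manh = agglomerative(distance_matrix(flows, manhattan), 3)
print(labels_euclid, labels_manh)
```

Because the clustering step only consumes a distance matrix, any other way of measuring dissimilarity between flows, such as one derived from tree ensembles, could in principle be substituted for `euclidean` or `manhattan` without changing `agglomerative`.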
【 License 】
Unknown