Abstract
Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting
methods such as NetFlow are, however, considered inadequate for classification, which consequently requires additional packet-level information, host
behaviour analysis, and specialized hardware, limiting its practical adoption. This paper aims to overcome these challenges by
proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived
per application through 𝑘-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial
unsupervised phase used flow records of fifteen popular Internet applications, which were collected and independently subjected to
𝑘-means clustering to determine the unique flow classes generated per application. The derived flow classes were afterwards used to train
and test a supervised C5.0-based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content-specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications to achieve highly granular real-time traffic classification.
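The two-phase pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the numeric flow features, the per-application values of 𝑘, and the synthetic data are hypothetical. The supervised phase also swaps in a plain information-gain decision tree as a stand-in for C5.0, which additionally uses gain ratio, pruning, and the adaptive boosting mentioned above.

```python
import math
import random
from collections import Counter

# Hypothetical numeric features per flow (e.g. derived from NetFlow byte,
# packet and duration fields); any fixed-length tuple of floats works here.
Flow = tuple[float, ...]

def kmeans(points: list[Flow], k: int, iters: int = 50, seed: int = 0) -> list[int]:
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign

def derive_flow_classes(flows_by_app: dict[str, list[Flow]],
                        k_by_app: dict[str, int]) -> tuple[list[Flow], list[str]]:
    """Phase 1: cluster each application's flows separately; label them 'app/cN'."""
    X: list[Flow] = []
    y: list[str] = []
    for app, flows in flows_by_app.items():
        for flow, c in zip(flows, kmeans(flows, k_by_app[app])):
            X.append(flow)
            y.append(f"{app}/c{c}")
    return X, y

def entropy(labels: list[str]) -> float:
    n = len(labels)
    return -sum(cnt / n * math.log2(cnt / n) for cnt in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=8):
    """Phase 2: greedy information-gain tree (a simplified stand-in for C5.0)."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority  # leaf: class label
    base, best = entropy(y), None  # best = (gain, feature, threshold)
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    if best is None:  # identical feature vectors: no valid split
        return majority
    _, f, t = best
    li = [i for i, x in enumerate(X) if x[f] <= t]
    ri = [i for i, x in enumerate(X) if x[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def predict(tree, x: Flow) -> str:
    # Internal nodes are (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

if __name__ == "__main__":
    rng = random.Random(1)

    def blob(cx: float, cy: float, n: int) -> list[Flow]:
        # Synthetic stand-in for one kind of flow an application produces.
        return [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for _ in range(n)]

    flows = {"video": blob(0, 0, 30) + blob(20, 0, 30),
             "voip": blob(0, 20, 30) + blob(20, 20, 30)}
    X, y = derive_flow_classes(flows, {"video": 2, "voip": 2})
    tree = build_tree(X, y)
    acc = sum(predict(tree, x) == lab for x, lab in zip(X, y)) / len(y)
    print(f"{len(set(y))} flow classes, training accuracy {acc:.2f}")
```

Because each derived label has the form `app/cN`, a single prediction names both the application and the flow class within it. Scoring on the training flows, as the demo does, only checks that the two phases fit together; the figures reported above come from held-out test cases, and a closer reproduction would replace the stand-in tree with an actual C5.0 implementation with boosting.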
Original language | English |
---|---|
Article number | 2048302 |
Number of pages | 22 |
Journal | Journal of Computer Networks and Communications |
Volume | 2016 |
DOIs | |
Publication status | Published - 6 Jun 2016 |
Externally published | Yes |