On Internet Traffic Classification: A Two-Phased Machine Learning Approach

Taimur Bakhshi, Bogdan Ghita

Research output: Contribution to journal › Article › peer-review


Abstract

Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification, as they require additional packet-level information, host behaviour analysis, and specialized hardware, which limits their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through 𝑘-means and are then used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications, which were collected and independently subjected to 𝑘-means clustering to determine the unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0-based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content-specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.
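
The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the flow records are synthetic, the two feature names (mean packet size, flow duration), the application names, and all helper functions are hypothetical, and a plain entropy-based decision tree without boosting stands in for C5.0. Phase 1 clusters each application's flows independently with 𝑘-means to derive per-application flow classes; phase 2 trains a tree on those derived classes and checks it on held-out flows.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assign


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, depth=0, max_depth=6):
    """Greedy entropy-based tree on numeric thresholds (stand-in for C5.0)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li],
                       depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                       depth + 1, max_depth))


def predict(tree, row):
    # Internal nodes are tuples; leaves are class-label strings.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def synthetic_flows(rng, centres, n=60):
    """Hypothetical flow records: (mean packet size in bytes, duration in s)."""
    return [(rng.gauss(cx, 20.0), rng.gauss(cy, 2.0))
            for cx, cy in centres for _ in range(n)]


rng = random.Random(42)
apps = {  # assumed per-application flow types, for illustration only
    "video": [(1400.0, 120.0), (200.0, 2.0)],
    "voip": [(160.0, 60.0), (600.0, 30.0)],
}

rows, labels = [], []
for app, centres in apps.items():
    flows = synthetic_flows(rng, centres)
    # Phase 1: cluster each application's flows independently.
    for flow, cluster in zip(flows, kmeans(flows, k=len(centres))):
        rows.append(flow)
        labels.append(f"{app}_{cluster}")

# Phase 2: train a supervised tree on the derived flow classes.
order = list(range(len(rows)))
rng.shuffle(order)
split = int(0.7 * len(order))
train, test = order[:split], order[split:]
tree = build_tree([rows[i] for i in train], [labels[i] for i in train])
hits = sum(predict(tree, rows[i]) == labels[i] for i in test)
accuracy = hits / len(test)
print(f"flow classes: {sorted(set(labels))}, held-out accuracy: {accuracy:.2%}")
```

In this sketch the number of clusters per application is fixed by hand and the features are left unscaled; in practice both choices would need to be made from the data, and real NetFlow records carry more attributes than the two assumed here.
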
Original language: English
Article number: 2048302
Number of pages: 22
Journal: Journal of Computer Networks and Communications
Volume: 2016
Publication status: Published - 6 Jun 2016
Externally published: Yes
