An Alternative Approach for Statistical Single-label Document Classification of Newspaper Articles

Andrew Ware, Georgios Mamakis, Athanasios Malamos

Research output: Contribution to journal › Article › peer-review

Abstract

Text classification is one of the most important areas of machine learning. It enables a range of tasks, among them email spam filtering and context identification. Classification theory proposes a number of techniques based on different technologies and tools. Classification systems are typically divided into single-label and multi-label categorization systems, according to the number of categories they assign to each classified document. In this paper, we present work in the area of single-label classification that resulted in a statistical classifier based on the Naive Bayes assumption that word occurrences within a document are statistically independent. Our algorithm takes cross-category word occurrence into account when deciding the class of a given document. Moreover, instead of estimating word co-occurrence when assigning a class, we estimate the contribution of each word to a document's membership of a class. As our results show, this approach outperforms other statistical classifiers such as the Naive Bayes classifier and language models.
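
For orientation, the Naive Bayes baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of a standard multinomial Naive Bayes classifier with add-one smoothing, not the algorithm proposed in the paper; the function names `train` and `classify`, the tokenized input format, and the choice of smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Collect counts from (label, tokens) pairs. Hypothetical helper."""
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for label, tokens in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def classify(model, tokens):
    """Return the single most probable class for a tokenized document."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior of the class.
        score = math.log(class_docs[label] / total_docs)
        class_total = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Independence assumption: per-word log likelihoods are summed.
            # Add-one (Laplace) smoothing is an assumption of this sketch.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (class_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A call such as `classify(train(labelled_docs), ["goal", "team"])` returns exactly one label, which is what makes the classifier single-label. The paper's classifier differs in how each word's evidence is weighted across categories, a detail the abstract does not specify.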
Original language: English
Pages (from-to): 293-303
Number of pages: 10
Journal: Journal of Information Science
Volume: 37
Issue number: 3
Status: Published - 18 Apr 2011
