Institutional Research Information Service
Publication Detail
Natural language text classification and filtering with trigrams and evolutionary nearest neighbour classifiers
  • Publication Type:
  • Authors:
    Langdon WB
  • Publication date:
  • Place of publication:
    Kruislaan 413, NL-1098 SJ Amsterdam, The Netherlands
  • Report number:
  • Print ISSN:
  • Notes:
    email: W.Langdon@cs.ucl.ac.uk
    keywords: genetic algorithms, ngrams, trigrams, natural language processing, NLP
    notes: Also available as GECCO’2000 Late Breaking paper langdon:2000:ngramLB
    size: 10 pages
Ngrams offer fast, language-independent multi-class text categorization. Text is reduced in a single pass to ngram vectors. These are assigned to one of several classes by (a) a nearest neighbour (KNN) classifier and (b) a genetic algorithm operating on the weights in a nearest neighbour classifier. An accuracy of 91 percent is found for binary classification of short multi-author technical English documents. Accuracy falls as more categories are used, but 69 percent is still obtained with 8 classes. Zipf's law is found not to apply to trigrams.
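The pipeline in the abstract (single-pass reduction to trigram vectors, then nearest neighbour assignment) can be illustrated with a minimal sketch. This is not the report's implementation: the use of character trigrams, the case and whitespace normalisation, the cosine similarity measure, and the names `trigram_vector`, `cosine` and `knn_classify` are all assumptions made here for illustration. The toy training texts are invented.

```python
from collections import Counter
from math import sqrt


def trigram_vector(text):
    """Reduce text to a sparse vector of character-trigram counts in one pass."""
    # Assumption: lower-case and collapse whitespace before counting.
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b, weights=None):
    """Weighted cosine similarity between two sparse count vectors.

    `weights` maps trigram -> non-negative weight (default 1.0 for every trigram).
    """
    w = (lambda g: weights.get(g, 1.0)) if weights else (lambda g: 1.0)
    dot = sum(w(g) * a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = sqrt(sum(w(g) * c * c for g, c in a.items()))
    norm_b = sqrt(sum(w(g) * c * c for g, c in b.items()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, k=1, weights=None):
    """Label text by majority vote among its k most similar training documents."""
    query = trigram_vector(text)
    ranked = sorted(training,
                    key=lambda item: cosine(query, item[0], weights),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy data: (trigram vector, class label) pairs.
training = [
    (trigram_vector("genetic algorithms evolve a population of candidate solutions"), "ga"),
    (trigram_vector("the parser builds a syntax tree from the token stream"), "parsing"),
]
print(knn_classify("an evolved population of solutions", training))
```

The optional `weights` argument marks the place where per-trigram weights tuned by a genetic algorithm, as in variant (b) of the abstract, could be supplied. The genetic algorithm itself is omitted from this sketch.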
UCL Researchers
Dept of Computer Science