Institutional Research Information Service
Publication Detail
Evolving Regular Expressions for GeneChip Probe Performance Prediction
  • Publication Type:
  • Authors:
    Langdon WB, Harrison AP
  • Publication date:
  • Place of publication:
    University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
  • Report number:
  • Print ISSN:
  • Notes:
    keywords: genetic algorithms, genetic programming, Bioinformatics, Affymetrix GeneChip, strongly typed genetic programming, STGP, grammar, regular expression, egrep, gawk
    size: 18 pages
Commercial GeneChips provide highly redundant but noisy data. Rapid identification and subsequent rejection of bad data effectively increases the quality of the remaining data at little cost, whilst serving as a basis for better understanding the bio-physics of short surface-mounted DNA sequences. Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. Regular expressions can be evolved from a Backus-Naur form (BNF) context-free grammar using tree-based strongly typed genetic programming written in gawk. Fitness is given by egrep. The quality of an individual HG-U133A probe is indicated by its correlation, across 6685 human tissue samples from NCBI’s GEO database, with other measurements for the same gene. Low concordance indicates a poor probe. The evolved data-mined motif is better at predicting poor DNA sequences than an existing human-generated RE, suggesting that runs of Cytosine, runs of Guanine, and mixtures of the two should all be avoided. Section 4.6 gives more RE GP gawk implementation details. Code is available at ftp://cs.ucl.ac.uk/genetic/gp-code/RE_gp.tar
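As a rough illustration of the idea in the abstract, the sketch below derives regular expressions from a small BNF-style grammar and scores each one by how well "the RE matches the probe" agrees with a "poor probe" label. It is written in Python, not gawk, and it is not the authors' implementation: the grammar, the `derive`, `fitness` and `random_search` helpers, the toy sequences and their labels are all hypothetical. Plain random search stands in for the genetic programming loop, and Python's `re` module stands in for egrep.

```python
import random
import re

# Hypothetical toy grammar in BNF style; NOT the grammar used in the report.
GRAMMAR = {
    "<re>": [["<elem>"], ["<elem>", "<re>"]],
    "<elem>": [["<base>"], ["<set>"], ["<base>", "<rep>"], ["<set>", "<rep>"]],
    "<set>": [["[", "<base>", "<base>", "]"]],
    "<rep>": [["+"], ["{", "<num>", ",}"]],
    "<base>": [["A"], ["C"], ["G"], ["T"]],
    "<num>": [["2"], ["3"], ["4"]],
}

# Invented sequences with invented labels (True = "poor probe"); illustration only.
TOY = [
    ("ACGGGGTCA", True),
    ("TTGGGGGAC", True),
    ("CCCCATGCA", True),
    ("ACGTACGTA", False),
    ("TGCATGCAT", False),
    ("ATATCGATA", False),
]


def derive(symbol, rng, depth=0, max_depth=6):
    """Expand a grammar symbol into a string by random derivation."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    options = GRAMMAR[symbol]
    # Past the depth limit, always take the first (non-recursive) production
    # so that the derivation is guaranteed to terminate.
    production = options[0] if depth >= max_depth else rng.choice(options)
    return "".join(derive(s, rng, depth + 1, max_depth) for s in production)


def fitness(pattern, labelled):
    """Count sequences where 'pattern matches' agrees with the 'poor' label."""
    regex = re.compile(pattern)
    return sum((regex.search(seq) is not None) == poor for seq, poor in labelled)


def random_search(labelled, trials=200, seed=0):
    """Random search over grammar derivations (a stand-in for the GP loop)."""
    rng = random.Random(seed)
    candidates = [derive("<re>", rng) for _ in range(trials)]
    best = max(candidates, key=lambda p: fitness(p, labelled))
    return best, fitness(best, labelled)


if __name__ == "__main__":
    pattern, score = random_search(TOY)
    print(pattern, score, "of", len(TOY))
```

Because every candidate is derived from the grammar, each one is a syntactically valid regular expression, which is the main attraction of a grammar-based representation. A real run would replace the random search with crossover and mutation on derivation trees and would score candidates against measured probe correlations, not hand-made labels.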
UCL Researchers
Dept of Computer Science