UCL  IRIS
Institutional Research Information Service
UCL Logo
Please report any queries concerning the funding data grouped in the sections named "Externally Awarded" or "Internally Disbursed" (shown on the profile page) to your Research Finance Administrator. Your can find your Research Finance Administrator at https://www.ucl.ac.uk/finance/research/rs-contacts.php by entering your department
Please report any queries concerning the student data shown on the profile page to:

Email: portico-services@ucl.ac.uk

Help Desk: http://www.ucl.ac.uk/ras/portico/helpdesk
Publication Detail
Learning a Functional Grammar of Protein Domains using Natural Language Word Embedding Techniques.
Abstract
In this paper, using word2vec, a widely-used natural language processing method, we demonstrate that proteins domains may have a learnable implicit semantic "meaning" in the context of their functional contributions to multi-domain proteins in which they are found. Word2vec is a group of models which can be used to produce semantically meaningful embeddings of words or tokens in a fixed-dimension vector space. In this work, we treat multi-domain proteins as "sentences" where domain identifiers are tokens which may be considered as "words". Using all InterPro [1] pfam domain assignments we observe that the embedding could be used to suggest putative GO assignments for Pfam [2] Domains of Unknown Function. This article is protected by copyright. All rights reserved.
Publication data is maintained in RPS. Visit https://rps.ucl.ac.uk
 More search options
UCL Researchers
Author
Dept of Computer Science
Author
Dept of Computer Science
University College London - Gower Street - London - WC1E 6BT Tel:+44 (0)20 7679 2000

© UCL 1999–2011

Search by