UCL IRIS
Institutional Research Information Service
Publication Detail
POSIT: Simultaneously Tagging Natural and Programming Languages
  • Publication Type:
    Conference
  • Authors:
    Partachi P-P, Treude C, Dash SK, Barr ET
  • Publisher:
    ACM
  • Publication date:
    23/05/2020
  • Published proceedings:
    42nd International Conference on Software Engineering (ICSE '20)
  • Name of conference:
    ICSE
  • Conference place:
    Seoul, Republic of Korea
  • Conference start date:
    23/05/2020
  • Conference finish date:
    29/05/2020
  • Language:
    English
  • Keywords:
    part-of-speech tagging, mixed-code, code-switching, language identification
Abstract
Software developers use a mix of source code and natural language text to communicate with each other: Stack Overflow and developer mailing lists abound with this mixed text. Tagging this mixed text is essential for making progress on two seminal software engineering problems: traceability, and reuse via precise extraction of code snippets from mixed text. In this paper, we borrow code-switching techniques from Natural Language Processing and adapt them to mixed text to solve two problems: language identification and token tagging. Our technique, POSIT, simultaneously provides abstract syntax tree tags for source code tokens and part-of-speech tags for natural language words, and predicts the source language of each token in mixed text. To realize POSIT, we trained a biLSTM network with a Conditional Random Field output layer using abstract syntax tree tags from the CLANG compiler and part-of-speech tags from the standard Stanford part-of-speech tagger. POSIT improves on the state of the art in language identification by 10.6% and in PoS/AST tagging by 23.7% in accuracy.
UCL Researchers
Author
Dept of Computer Science