Publication Detail
POSIT: Simultaneously Tagging Natural and Programming Languages
Publication Type: Conference
Authors: Partachi P-P, Treude C, Dash SK, Barr ET
Publisher: ACM
Publication date: 06/2020
Published proceedings: 42nd International Conference on Software Engineering (ICSE '20)
Name of conference: ICSE
Conference place: Seoul, Republic of Korea
Conference start date: 23/05/2020
Conference finish date: 29/05/2020
Language: English
Keywords: part-of-speech tagging, mixed-code, code-switching, language identification
Abstract
Software developers use a mix of source code and natural language
text to communicate with each other: Stack Overflow and Developer
mailing lists abound with this mixed text. Tagging this mixed text is
essential for making progress on two seminal software engineering
problems — traceability, and reuse via precise extraction of code
snippets from mixed text. In this paper, we borrow code-switching
techniques from Natural Language Processing and adapt them to
apply to mixed text to solve two problems: language identification
and token tagging. Our technique, POSIT, simultaneously provides
abstract syntax tree tags for source code tokens, part-of-speech tags
for natural language words, and predicts the source language of a
token in mixed text. To realize POSIT, we trained a biLSTM network
with a Conditional Random Field output layer using abstract syntax
tree tags from the CLANG compiler and part-of-speech tags from
the Standard Stanford part-of-speech tagger. POSIT improves the
state-of-the-art on language identification by 10.6% and PoS/AST
tagging by 23.7% in accuracy.