UCL  IRIS
Institutional Research Information Service
UCL Logo
Please report any queries concerning the funding data grouped in the sections named "Externally Awarded" or "Internally Disbursed" (shown on the profile page) to your Research Finance Administrator. Your can find your Research Finance Administrator at https://www.ucl.ac.uk/finance/research/rs-contacts.php by entering your department
Please report any queries concerning the student data shown on the profile page to:

Email: portico-services@ucl.ac.uk

Help Desk: http://www.ucl.ac.uk/ras/portico/helpdesk
Publication Detail
Gesture Recognition in Robotic Surgery with Multimodal Attention
  • Publication Type:
    Journal article
  • Publication Sub Type:
    Article
  • Authors:
    Van Amsterdam B, Funke I, Edwards E, Speidel S, Collins J, Sridhar A, Kelly J, Clarkson MJ, Stoyanov D
  • Publication date:
    02/02/2022
  • Journal:
    IEEE Transactions on Medical Imaging
  • Status:
    Published online
  • Print ISSN:
    0278-0062
Abstract
Automatically recognising surgical gestures from surgical data is an important building block of automated activity recognition and analytics, technical skill assessment, intra-operative assistance and eventually robotic automation. The complexity of articulated instrument trajectories and the inherent variability due to surgical style and patient anatomy make analysis and fine-grained segmentation of surgical motion patterns from robot kinematics alone very difficult. Surgical video provides crucial information from the surgical site with context for the kinematic data and the interaction between the instruments and tissue. Yet sensor fusion between the robot data and surgical video stream is non-trivial because the data have different frequency, dimensions and discriminative capability. In this paper, we integrate multimodal attention mechanisms in a two-stream temporal convolutional network to compute relevance scores and weight kinematic and visual feature representations dynamically in time, aiming to aid multimodal network training and achieve effective sensor fusion. We report the results of our system on the JIGSAWS benchmark dataset and on a new in vivo dataset of suturing segments from robotic prostatectomy procedures. Our results are promising and obtain multimodal prediction sequences with higher accuracy and better temporal structure than corresponding unimodal solutions. Visualization of attention scores also gives physically interpretable insights on network understanding of strengths and weaknesses of each sensor.
Publication data is maintained in RPS. Visit https://rps.ucl.ac.uk
 More search options
UCL Researchers
Author
Dept of Med Phys & Biomedical Eng
Author
Dept of Computer Science
Author
Dept of Computer Science
University College London - Gower Street - London - WC1E 6BT Tel:+44 (0)20 7679 2000

© UCL 1999–2011

Search by