A Simple Multi-Modality Transfer Learning System for End-to-End Sign Language Summarization

Authors

  • Mihir Kulshreshtha, Department of Computer Science, George Mason University, Fairfax, VA
  • Ziyu Yao, Department of Computer Science, George Mason University, Fairfax, VA
  • Parth Pathak, Department of Computer Science, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2023.3942

Abstract

Sign languages are full-fledged visual languages used by over 466 million deaf and hard-of-hearing people worldwide. With their own grammar and lexicon conveyed through manual and non-manual markers, sign languages are not understood by most hearing people and are poorly supported by communication technologies. Recently, promising progress in sign language recognition and translation has helped reduce this communication barrier. However, little work has been done on downstream sign language processing. Current systems perform downstream tasks through a cascade of models; summarizing the meaning of a long sign language video, for example, would be achieved via a cascade of sign language recognition and text summarization models. Such cascaded models allow errors to propagate from one stage to the next and are computationally inefficient. Instead, we propose to build an end-to-end model that directly generates a summary from a sign language video. The model will be trained on the How2 and How2Sign datasets. With its simplicity, this model can serve as a solid baseline for future research in downstream sign language processing.
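The abstract does not specify an architecture. Purely as an illustrative sketch of what "directly generates a summary given the sign language video" could mean, the PyTorch code below assumes a common design for end-to-end video-to-text generation: per-frame video features feed a Transformer encoder, and a Transformer decoder emits the summary tokens with no intermediate recognition or translation stage. The class name, feature dimensions, and hyperparameters are hypothetical assumptions, not the authors' implementation.

# Hypothetical sketch of an end-to-end sign-video-to-summary model.
# Positional encodings are omitted for brevity; all sizes are assumptions.
import torch
import torch.nn as nn

class EndToEndSignSummarizer(nn.Module):
    def __init__(self, feat_dim=1024, d_model=512, vocab_size=30000,
                 nhead=8, num_layers=4):
        super().__init__()
        # Project precomputed per-frame video features (e.g., CNN embeddings)
        # into the model dimension.
        self.frame_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.token_emb = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, frame_feats, summary_tokens):
        # frame_feats: (batch, num_frames, feat_dim)
        # summary_tokens: (batch, summary_len) token ids, teacher-forced
        memory = self.encoder(self.frame_proj(frame_feats))
        tgt = self.token_emb(summary_tokens)
        # Causal mask so each summary position attends only to earlier ones.
        sz = summary_tokens.size(1)
        causal_mask = torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal_mask)
        return self.lm_head(out)  # (batch, summary_len, vocab_size)

# One training step on dummy data: ordinary cross-entropy between the
# predicted and reference summary tokens supervises the whole pipeline.
model = EndToEndSignSummarizer()
feats = torch.randn(2, 128, 1024)          # 2 videos, 128 frames each
tokens = torch.randint(0, 30000, (2, 32))  # reference summary token ids
logits = model(feats, tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))

Because a single loss on the final summary supervises the entire model, recognition errors cannot accumulate across separately trained stages, which is the advantage the abstract claims over cascaded systems.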

Published

2023-10-27

Section

College of Engineering and Computing: Department of Computer Science
