Robust Neural Machine Translation for Abugidas by Glyph Perturbation
Hour Kaing, Chenchen Ding, Hideki Tanaka, Masao Utiyama
Main: Machine Translation Oral Paper
Session 9: Machine Translation (Oral)
Conference Room: Marie Louise 1
Conference Time: March 20, 09:00-10:30 (CET) (Europe/Malta)
Abstract:
Neural machine translation (NMT) systems are vulnerable when trained on limited data, which is a common scenario in real-world low-resource tasks. One way to increase robustness is to intentionally add realistic noise during training. Noise simulation using text perturbation has been shown to be effective for writing systems that use Latin letters. In this study, we further explore perturbation techniques for the more complex abugida writing systems, where we consider the visual similarity of complex glyphs to capture the essential nature of these writing systems. In addition to the generated noise, we propose a training strategy to improve robustness. We conducted experiments on six languages: Bengali, Hindi, Myanmar, Khmer, Lao, and Thai. By overcoming the introduced noise, we obtained non-degenerate NMT systems with improved robustness for low-resource tasks in abugida scripts.
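The abstract does not say how look-alike glyphs are selected or how often they are substituted. The following is a minimal sketch of the general idea of visual-similarity noise injection, not the authors' method. It assumes a small hand-written confusion table of Thai consonant pairs that differ mainly by one stroke; the table, the names `SIMILAR_GLYPHS`, `perturb`, and `noisy_training_pairs`, the substitution probability `p`, and the choice to perturb only the source side are all hypothetical.

```python
import random

# Hypothetical confusion table: Thai consonant pairs that differ mainly by a
# single stroke. Illustrative only; not taken from the paper.
SIMILAR_GLYPHS = {
    "บ": ["ป"], "ป": ["บ"],
    "ผ": ["ฝ"], "ฝ": ["ผ"],
    "พ": ["ฟ"], "ฟ": ["พ"],
}


def perturb(text: str, p: float, rng: random.Random) -> str:
    """Replace each character that has a look-alike with probability p.

    Characters without an entry in SIMILAR_GLYPHS (vowel signs, tone marks,
    Latin letters, ...) are always copied unchanged.
    """
    out = []
    for ch in text:
        candidates = SIMILAR_GLYPHS.get(ch)
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(ch)
    return "".join(out)


def noisy_training_pairs(pairs, p, seed=0):
    """Yield (noisy_source, target) pairs.

    Assumption: only the source sentence is perturbed, so the model learns
    to map noisy input to clean output.
    """
    rng = random.Random(seed)
    for src, tgt in pairs:
        yield perturb(src, p, rng), tgt
```

For example, `perturb("พบ", 1.0, random.Random(0))` returns `"ฟป"`, while `p=0.0` leaves the text unchanged. A real system would need a much larger, script-specific table (or a learned or rendering-based similarity measure) for each of the six languages studied.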