How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?

Danni Liu; Jan Niehues

Main: Machine Translation Oral Paper

Session 9: Machine Translation (Oral)

Conference Room: Marie Louise 1

Conference Time: March 20, 09:00-10:30 (CET) (Europe/Malta)

Abstract: Customizing machine translation models to comply with desired attributes (e.g., formality or grammatical gender) is a well-studied topic. However, most current approaches rely on (semi-)supervised data with attribute annotations. This data scarcity is a bottleneck to democratizing such customization for a wider range of languages, particularly lower-resource ones. This gap is at odds with recent progress in pretrained massively multilingual translation models. In response, we transfer attribute-controlling capabilities to languages without attribute-annotated data, using an NLLB-200 model as the foundation. Inspired by techniques from controllable generation, we employ a gradient-based inference-time controller to steer the pretrained model. The controller transfers well to zero-shot conditions because it operates on pretrained multilingual representations and is attribute-specific rather than language-specific. Through a comprehensive comparison with finetuning-based control, we demonstrate that, despite finetuning’s clear dominance in supervised settings, the gap to inference-time control closes when moving to zero-shot conditions, especially with new and distant target languages. Inference-time control also shows stronger domain robustness. We further show that our inference-time control complements finetuning. Moreover, a human evaluation on a real low-resource language, Bengali, confirms our findings. Our code is in the supplementary material.
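To make the idea of a gradient-based inference-time controller more concrete, here is a minimal Python sketch. It assumes a PPLM-style update, a technique from controllable generation, applied to a single hidden-state vector. A toy logistic attribute classifier stands in for a real one, and an L2 term pulls the state back toward its original value. The function names (`attribute_prob`, `steer`), the classifier, the step size, and the anchoring term are all hypothetical illustrations and do not reproduce the authors' implementation on NLLB-200.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def attribute_prob(h: Vector, w: Vector, b: float) -> float:
    """Hypothetical toy attribute classifier: p(a | h) = sigmoid(w.h + b)."""
    return sigmoid(dot(w, h) + b)


def steer(h0: Vector, w: Vector, b: float, step: float = 0.5,
          n_steps: int = 10, anchor: float = 0.1) -> Vector:
    """Gradient ascent on log p(a | h) with an L2 pull back toward h0.

    Objective: log sigmoid(w.h + b) - (anchor / 2) * ||h - h0||^2
    Gradient:  (1 - sigmoid(w.h + b)) * w - anchor * (h - h0)
    The anchoring term is an assumption, meant to keep the steered state
    close to the pretrained model's original representation.
    """
    h = list(h0)
    for _ in range(n_steps):
        p = attribute_prob(h, w, b)
        grad = [(1.0 - p) * wi - anchor * (hi - h0i)
                for wi, hi, h0i in zip(w, h, h0)]
        h = [hi + step * gi for hi, gi in zip(h, grad)]
    return h


if __name__ == "__main__":
    h0 = [0.2, -0.4, 0.1]   # illustrative hidden state
    w = [1.0, -0.5, 0.3]    # illustrative classifier weights
    b = -0.2
    h = steer(h0, w, b)
    print(attribute_prob(h0, w, b), attribute_prob(h, w, b))
```

Because the toy objective is concave and the step is small, each update raises the objective, so the steered state receives a higher attribute probability than the original. In this sketch, the classifier sees only the representation and never a language identifier, which loosely mirrors the intuition that an attribute-specific controller on shared multilingual representations could transfer across target languages.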