ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

Abstract : This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of the IWSLT Evaluation 2019 for the English→Portuguese language pair. The ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encoder-decoder architecture with an attention mechanism was used for two primary submissions corresponding to the two EN-PT evaluation sets: (1) TED (MuST-C) and (2) How2. In this paper, we notably investigate the impact of pooling heterogeneous corpora for training, the impact of target tokenization (characters or BPEs), and the impact of speech input segmentation, and we also compare our best end-to-end model (BLEU of 26.91 on MuST-C and 43.82 on How2 validation sets) to a pipeline (ASR+MT) approach.
Document type :
Conference papers

Cited literature : 24 references

https://hal.archives-ouvertes.fr/hal-02352949
Contributor : Yannick Estève
Submitted on : Thursday, November 7, 2019 - 10:02:50 AM
Last modification on : Wednesday, November 13, 2019 - 1:41:32 AM

File

IWSLT_2019.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02352949, version 1

Citation

Manh Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubrière, Fethi Bougares, et al. ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task. 16th International Workshop on Spoken Language Translation 2019, Nov 2019, Hong Kong, China. ⟨hal-02352949⟩
