DISRPT: A Multilingual, Multi-domain, Cross-Framework Benchmark For Discourse Processing
Conference Paper, Year: 2024

Abstract

This paper presents DISRPT, a multilingual, multi-domain, and cross-framework benchmark dataset for discourse processing, covering the tasks of discourse unit segmentation, connective identification, and relation classification. DISRPT includes 13 languages, with data from 24 corpora covering about 4 million tokens and around 250,000 discourse relation instances from 4 discourse frameworks: RST, SDRT, PDTB, and Discourse Dependencies. We present an overview of the data, its development across three NLP shared tasks on discourse processing carried out in the past five years, and the latest modifications and added extensions. We also carry out an evaluation of state-of-the-art multilingual systems trained on the data for each task, showing plateau performance on segmentation, but substantial room for improvement for connective identification and relation classification. The DISRPT benchmark employs a unified format that we make available on GitHub and HuggingFace in order to encourage future work on discourse processing across languages, domains, and frameworks.
Main file
2024.lrec-main.447.pdf (260.6 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04598164, version 1 (03-06-2024)

Identifiers

  • HAL Id: hal-04598164, version 1

Cite

Chloé Braud, Amir Zeldes, Laura Rivière, Yang Janet Liu, Philippe Muller, et al. DISRPT: A Multilingual, Multi-domain, Cross-Framework Benchmark For Discourse Processing. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA; ICCL, May 2024, Torino, Italy. pp. 4990-5005. ⟨hal-04598164⟩
