
PhD Student (M/F)


Application deadline: Friday 28 March 2025, 23:59 Paris time

Make sure your candidate profile is filled in correctly before applying.

General information

Job title: PhD Student (M/F)
Reference: UMR7503-CLAGAR-006
Number of positions: 1
Workplace: VANDOEUVRE LES NANCY
Publication date: Friday 7 March 2025
Contract type: Fixed-term PhD contract (CDD Doctorant)
Contract duration: 36 months
Thesis start date: 16 April 2025
Working time: Full time
Remuneration: 2200 gross monthly
CN Section(s): 07 - Information sciences: processing, integrated hardware-software systems, robots, control, images, content, interaction, signals and languages

Thesis subject description

Text generation has recently attracted significant attention from the NLP community, as large pretrained language models have demonstrated an excellent ability to generate long, grammatically correct, and fluent text. One important drawback of these models, however, is that adapting them to a new task or a new language often requires labeled data, which might not be readily available and would be costly and difficult to create. This is particularly acute for data-to-text generation [NG24, SNSM+25] but also holds for domain-specific (e.g., health, finance) text-to-text tasks such as summarisation or simplification.
Recently, preference learning techniques such as DPO (Direct Preference Optimisation [RSM+24, IL24]), Group Relative Policy Optimization (GRPO) or ORPO (Odds Ratio Preference Optimization [HLT24]) have been used to improve a base model by first training it on silver data (which is often easier to create, particularly for multilingual processing, typically by using machine translation) and then further improving it using preference data.
Preference learning has gained traction in NLG tasks, particularly through methods like Reinforcement Learning from Human Feedback (RLHF) [LYW23]. This approach has been successfully applied in various domains, including machine translation and summarization, to improve text quality by aligning model outputs with human preferences [LNN+23]. However, the application of preference learning to KG-to-Text generation and to multilingual text-to-text generation tasks has not been explored. A significant challenge is the scarcity of reliable preference data, which is crucial for training models to generate text that faithfully represents the input. Addressing this challenge requires both generating multiple outputs from the input and being able to rank these outputs (so as to create preference pairs).
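
For concreteness, the core of the DPO objective mentioned above can be written as a loss over (chosen, rejected) pairs. The following is a minimal, self-contained PyTorch sketch, not the project's implementation; it assumes sequence-level log-probabilities under the policy and under a frozen reference model have already been computed, and all names are illustrative placeholders.

    # Minimal sketch of the DPO loss [RSM+24] over a batch of preference pairs,
    # assuming precomputed sequence-level log-probabilities.
    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """Direct Preference Optimisation loss for a batch of (chosen, rejected) pairs."""
        # Implicit reward: scaled log-ratio between the policy and the reference model.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Push the preferred response to receive the higher implicit reward.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage with random log-probabilities for a batch of 4 pairs.
    batch = [torch.randn(4) for _ in range(4)]
    print(dpo_loss(*batch))

In practice, libraries such as TRL bundle this objective with the reference-model bookkeeping and batching; the sketch only isolates the loss that the preference pairs feed into.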
Topic and Program of Work
The goal of this PhD thesis is to investigate how these novel preference learning methods can be exploited to facilitate multi-task, multilingual text generation when gold training data is unavailable.
Specifically, the thesis will build on previous work by the candidate [SG25] and focus on multilingual Knowledge Graph-to-Text generation, possibly extending to text-to-text summarisation and/or simplification if time allows.
The PhD project faces several challenges.
First, methods for creating silver training data must be identified and compared. Depending on the generation task, techniques such as distant supervision, machine translation (for the multilingual aspect) and LLM prompting are natural candidates. These various methods will be explored and their effectiveness compared using standard evaluation metrics for each of the target generation tasks.
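
As one concrete illustration of the machine-translation route just mentioned, the sketch below turns English reference verbalisations into French silver targets; the specific model name and the use of the Hugging Face translation pipeline are example assumptions, not a prescribed setup for the project.

    # Illustrative sketch of silver-data creation via machine translation:
    # English references are translated to obtain silver targets in another language.
    from transformers import pipeline

    english_references = [
        "Alan Turing was born in London.",
        "The Eiffel Tower is located in Paris.",
    ]

    # Any English-to-French MT model could be substituted here.
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

    silver_french = [out["translation_text"] for out in translator(english_references)]
    for en, fr in zip(english_references, silver_french):
        print(en, "->", fr)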
Second, preference data must be created. As mentioned above, this requires being able to rank alternative outputs. To this end, we will either use existing metrics when they are available (e.g., SARI for text simplification or METEOR for translation) or devise new metrics when they are not (e.g., metrics that capture the degree of meaning preservation between a knowledge graph and a text). We explored this latter point in [SG25] and plan both to improve on these first results, e.g., by training a ranker rather than relying on representation learning, and to extend the approach to other types of input such as tabular data or meaning representation graphs.
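
To make the pairing step concrete, the following sketch (illustrative only) shows how several candidate generations per input could be ranked with whichever metric is chosen and turned into (chosen, rejected) pairs; score_fn is a placeholder for SARI, METEOR or a learned faithfulness ranker, not a specific library call.

    # Sketch: rank candidate outputs with a pluggable scoring function and keep
    # the best and worst of each set as a (chosen, rejected) preference pair.
    import string
    from typing import Callable, List, Tuple

    def build_preference_pairs(
        sources: List[str],
        candidates: List[List[str]],
        score_fn: Callable[[str, str], float],
    ) -> List[Tuple[str, str, str]]:
        """Return (source, chosen, rejected) triples from ranked candidates."""
        pairs = []
        for src, cands in zip(sources, candidates):
            ranked = sorted(cands, key=lambda c: score_fn(src, c), reverse=True)
            if len(ranked) >= 2 and ranked[0] != ranked[-1]:
                pairs.append((src, ranked[0], ranked[-1]))
        return pairs

    # Toy stand-in for a real metric: word overlap with the linearised input.
    def toy_score(source: str, candidate: str) -> float:
        strip = str.maketrans("", "", string.punctuation)
        src = set(source.translate(strip).lower().split())
        cand = set(candidate.translate(strip).lower().split())
        return len(src & cand)

    print(build_preference_pairs(
        ["Alan Turing | birthPlace | London"],
        [["Alan Turing was born in London.", "Alan Turing lives in Paris."]],
        toy_score,
    ))

The resulting triples are exactly the kind of data that the DPO- and ORPO-style objectives discussed above consume.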
Third, we will compare and evaluate various preference learning methods
on the targeted generation tasks.

References
[HLT24] Jiwoo Hong, Noah Lee, and James Thorne. ORPO: Monolithic preference optimization without reference model. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11170–11189, Miami, Florida, USA, November 2024. Association for Computational Linguistics.
[IL24] Shawn Im and Yixuan Li. On the generalization of preference learning with DPO, 2024.
[LNN+23] Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan A. Rossi, and Thien Huu Nguyen. Okapi: Instruction-tuned large language models in multiple languages with reinforcement learning from human feedback, 2023.
[LYW23] Zihao Li, Zhuoran Yang, and Mengdi Wang. Reinforcement learning with human feedback: Learning dynamic choices via pessimism, 2023.
[NG24] Anna Nikiforovskaya and Claire Gardent. Evaluating RDF-to-text generation models for English and Russian on out-of-domain data. In Saad Mahamood, Nguyen Le Minh, and Daphne Ippolito, editors, Proceedings of the 17th International Natural Language Generation Conference, pages 134–144, Tokyo, Japan, September 2024. Association for Computational Linguistics.
[RSM+24] Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model, 2024.
[SG25] Yifei Song and Claire Gardent. MuCAL: Contrastive alignment for preference-driven KG-to-text generation. Technical report, CNRS/LORIA and EPFL Lausanne, 2025. In submission.
[SNSM+25] Yifei Song, Anna Nikiforovskaya, William Soto-Martinez, Evan Chapple, and Claire Gardent. Multilingual verbalisation of knowledge graphs. Technical report, LORIA/CNRS, 2025. In submission.

Work context

The PhD candidate will be part of the MOSAIC group at LORIA. Funded by the ENACT AI Cluster, they will participate in the project's events while benefiting from the rich research environment provided by both the large MOSAIC group and the ENACT project. Publication, participation and presentation in high-ranking conferences and journals are expected and will be fully supported. The candidate will also be given the opportunity to attend a summer school and, if interested, to teach university courses (up to a maximum of 60 hours per year).