PhD Student (M/F) - Communicative acts and interactions modeling

Application deadline: Monday, 31 March 2025, 23:59:00 Paris time

Make sure your candidate profile is filled in correctly before applying

General information

Offer title: PhD Student (M/F) - Communicative acts and interactions modeling
Reference: UMR9015-CAMGUI-001
Number of positions: 1
Workplace: GIF SUR YVETTE
Publication date: Monday, 10 March 2025
Contract type: Fixed-term PhD contract (CDD Doctorant)
Contract duration: 36 months
Thesis start date: 1 October 2025
Working time: Full time
Remuneration: 2200 € gross per month
CN Section(s): 01 - Interactions, particles, nuclei, from the laboratory to the cosmos

Description of the thesis topic

Most human interactions occur through spoken conversations. Although this mode of interaction seems natural and easy for humans, it remains a challenge for spoken language processing models, as conversational speech raises critical issues:
• Non-verbal information can be essential to understanding a message. For example, a smiling face and a joyful voice can help detect irony or humor in a message.
• Visual grounding between participants is often needed during a conversation to integrate posture and body gestures as well as references to the surrounding world. For example, a speaker can talk about an object on a table and refer to it as "this object" while pointing at it with her hand.
• Semantic grounding between participants, which establishes mutual knowledge, is essential for communication. It includes world knowledge (general and commonsense knowledge), domain knowledge, and context knowledge (which can evolve during a conversation). For example, conversations between a child and an adult won't have the same characteristics as those between adults.
Although Large Language Models (LLMs) have transformed the field of Natural Language Processing (NLP) by allowing almost any applicative task to be handled by simply prompting them, handling conversations remains challenging for them. Processing conversations is often limited to including the raw transcription in the context of the prompt. Simply taking the transcription of a conversation as the textual input of a generative model for downstream tasks, such as summarization or question answering, can provide helpful results but remains limiting, as the summary will only convey the spoken content of the conversation, regardless of the other dimensions listed above.
To tackle these issues, the ambition of the MINERAL project is to generate enriched transcriptions in the form of a script. The script will comprehensively describe the conversation, including linguistic, paralinguistic, discursive, and pragmatic dimensions expressed in natural language. Such a representation holds value in scenarios where the dynamics and structure of interactions are paramount, when the flow and exchange patterns of a conversation are as informative as the verbal content itself. We argue that the three issues mentioned above must be handled to reach a level of understanding of conversational speech high enough to allow the development of practical applications.
We believe that generating scripts of a conversation, which would ideally be self-sufficient to play the conversation back, has intrinsic value for accessibility from an inclusion perspective. We also believe that a comprehensive conversation script could, on the one hand, significantly impact a wide range of applicative downstream tasks and, on the other hand, contribute to a better understanding of human conversations from a cognitive science perspective. In that domain, it could be rich material leading to novel quantitative methods for studying behavior in a rich conversational context.
The overall objective of the MINERAL project is to train a multimodal conversation representation model for communicative acts (i.e., the smallest communication units, verbal or non-verbal, with a consistent communicative intention), to study communicative structures (how and why communicative acts are linked), and then to exploit a panel of ecological conversational datasets from various domains to evaluate the quality of the generated scripts and their impact on relevant use cases.

In this context, we propose a PhD subject that focuses on communicative acts and interactions modeling. The goal is first to propose a unified definition of communicative acts (CAs) and of interactions among modalities, and then to compute representations of communicative acts.
First, the PhD candidate will work on producing representations of independent communicative acts from multimodal low-level descriptors. Communicative acts are envisioned as an extension of the concept of dialog act to multimodal communication, including non-verbal interactions and capturing the notion of a participant's intent at a given time in the conversation. The PhD candidate will first formally define the concept of communicative acts for the project. They will then create systems able to segment and represent communicative acts, implicitly and explicitly, from raw inputs and low-level descriptors. Finally, they will evaluate the representations with probing methods, based on existing benchmark datasets.
In the second part of the PhD, the objective is to enable comprehensive processing of a conversation, as required for script generation. This involves the detection, characterization, and representation of the relations that connect communicative acts. After defining the relations of interest, the PhD candidate will focus on predicting them and will study which representations are the most effective for downstream tasks, with script generation in view.

Work context

The work will take place at the Laboratoire Interdisciplinaire des Sciences du Numérique (LISN) on the "Belvédère" site. The selected candidate will join the LIPS team (Language, Interaction, Speech, and Sign) within the STL department (Science and Technology of Language). This team, composed of researchers and research professors in linguistics and natural language processing, conducts interdisciplinary research on spoken and signed languages in a multimodal context. It collaborates extensively with other teams in the STL department as well as with other departments within the laboratory.

The position falls within a sector subject to the protection of scientific and technical potential (PPST). Therefore, in accordance with regulations, your arrival must be authorized by the competent authority of the Ministry of Higher Education and Research (MESR).
