General information
Job title: Interdisciplinary PhD in Computer Science / Linguistics / Speech Processing (M/F/X)
Reference: UMR5800-JEAROU-002
Number of positions: 1
Workplace: TALENCE
Publication date: Friday, April 25, 2025
Contract type: Fixed-term doctoral contract (CDD Doctorant)
Contract duration: 36 months
Thesis start date: October 1, 2025
Working time: Full-time
Remuneration: €2,200 gross per month
CN section(s): 07 - Information sciences: processing, integrated hardware-software systems, robots, control, images, content, interactions, signals and languages
Description of the thesis topic
Title:
Speech, Empathy, and Artificial Intelligence: Linguistic Analysis and Automatic Modeling of Caregiver Communication in Nursing Homes
Abstract:
This interdisciplinary PhD project delves into the heart of caregiver communication to uncover the linguistic and vocal mechanisms of empathy. Based on the first speech database capturing professional caregivers in action, the candidate will develop innovative tools to analyze professional speech, document expressed affects, and train intelligent systems capable of evaluating how well caregiver speech aligns with the principles of the “Humanitude” approach. By combining corpus phonetics and artificial intelligence, the project aims to advance scientific understanding of caregiver-patient interaction while offering concrete applications: personalized feedback for professionals, training support tools, and empathetic technologies in care environments.
Thesis Topic:
In a pilot study, we examined the differences between informal and professional speech using recordings of trained caregivers. These recordings constitute the first speech database documenting caregivers engaged in professional tasks. Our analyses revealed that caregivers adjust their vocal characteristics when performing their duties.
The PhD project is structured around the following axes:
Expanding the speech database with recordings from both trained and untrained caregivers. The methodological framework, to be developed by the candidate, will support both fine-grained linguistic and phonetic analysis of caregiver speech production and the training of automatic systems to assess its compliance with the “Humanitude” communication framework.
Documenting the database, including metadata collection, transcription of speech (leveraging state-of-the-art speech recognition tools supplemented by human transcribers), affect annotation during interactions, and perceptual validation of selected segments.
Analyzing data across various speech styles, with special attention to spontaneous speech, which is often unpredictable and marked by incomplete sentences, hesitations, and fillers. This analysis will have two components: (1) evaluating end-to-end (E2E) speech recognition systems for their handling of expressivity, and (2) analyzing acoustic features that capture tonal, pitch, and affective variations, independent of speaker identity. This approach will combine linguistic analysis with automatic modeling to better understand the dynamics of empathy expression.
Assessing the relevance of the proposed analyses through regular feedback from “Humanitude” trainers to ensure their practical usefulness.
Developing a practical guide for caregivers and trainers that gives concrete form to the vocal style specific to “Humanitude”, illustrated with selected examples.
This PhD lies at the intersection of language sciences and computer science. Applications are welcome from candidates in either of these fields. Interdisciplinarity is essential, both in the use of computational tools and in the development of novel resources combining linguistic annotations with interpretable, automatically extractable vocal descriptors.
Beyond the collection and documentation of naturalistic data, the main expected contributions are:
The characterization of the varied strategies caregivers use to express empathy (through language, prosody, and voice quality),
The development of automatic systems for evaluating caregiver speech, providing targeted feedback on linguistic and expressive dimensions that can be improved.
Work context
The PhD candidate will be hosted at LaBRI (Bordeaux Computer Science Research Laboratory) in the Image and Sound (I&S) department, with frequent visits to the Laboratory of Phonetics and Phonology (LPP), where they will interact with researchers in linguistics.
The I&S department conducts research on the acquisition, processing, analysis, modeling, and synthesis of audiovisual media, and on interaction with them. It covers the entire acquisition chain, from data collection to information extraction or digital data rendering, with the user placed at the center of the process. The types of data handled are highly diverse: 2D and 3D images, video, speech, music, 3D data, EEG signals, physiological data, etc. The various stages of the processing pipeline include modeling phases for both analysis and synthesis. The targeted application domains include health, medicine, education, gaming, and more.
The Laboratory of Phonetics and Phonology (LPP) is a joint research unit (UMR 7018) under the supervision of CNRS and Université Sorbonne Nouvelle. As of September 1, 2024, the LPP hosts 60 members (including permanent staff, PhD students, and contract researchers). The lab has demonstrated strong momentum in recent years, with several new hires, numerous national and European research projects, a wide network of national and international collaborations, and high visibility through strong publication output and international presence. It holds a coherent and unique thematic position within the French academic landscape and is the only laboratory in France that has continuously supported a PhD track in phonetics, with a steady flow of new doctoral students (on average, four per year) and excellent career outcomes for its PhD graduates and postdoctoral researchers.
The LPP specializes in research and teaching in phonetics, phonology, and automatic speech processing. Its research—whether experimental, theoretical, or applied—benefits from a rich interdisciplinary synergy among faculty members of Université Sorbonne Nouvelle, CNRS researchers and engineers, and students from Master's to PhD level. It also relies on a comprehensive experimental platform, open to all national and international researchers working on speech.
Constraints and risks
The position falls within a sector related to the protection of scientific and technical potential (PPST) and therefore, in accordance with regulations, your appointment must be authorized by the competent authority of the Ministry of Higher Education and Research (MESR).