General information
Job title: PhD: Provenance and explainability of LLMs (M/F)
Reference: UMR5217-SILMAN-005
Number of positions: 1
Workplace: Saint-Martin-d'Hères
Publication date: Monday 12 May 2025
Contract type: Fixed-term doctoral contract (CDD doctorant)
Contract duration: 36 months
Thesis start date: 1 October 2025
Working hours: Full-time
Remuneration: €2,200 gross per month
CN section(s): 06 - Information sciences: foundations of computer science, computation, algorithms, representations, applications
Thesis topic description
Conversational AI systems are large-scale language models built on transformer neural networks. These models are trained on large amounts of text data collected from the web, using supercomputers over several days. For example, PaLM, a Google LLM, has 540 billion parameters and required over a month of training on a specialized computing cluster. The rapid adoption of LLMs has outstripped the development of techniques for assessing the quality of their results. Such assessment is crucial, as LLMs have been shown to be prone to producing so-called "hallucinations": plausible but factually incorrect answers, or answers incompatible with the user's intention. Consequently, relying on LLMs without proper evaluation can have serious consequences. Ensuring the quality of LLM results is essential to harness the transformative power of these models while limiting potential risks. By developing robust validation methodologies and integrating quality-control measures, organizations can benefit from LLMs while safeguarding their decision-making.
Another limitation of LLMs is that they cannot fully track their own reasoning, especially in long conversational threads or in complex queries over primary sources.
This MSCA doctoral thesis aims to contribute to better explainability of LLMs by targeting the following objectives:
- Establish a formalism for the explainability and provenance of the information used by LLMs, linking generated content to its origin in the sources and to how the primary data was used to derive it, by extending known data-provenance approaches.
- Link the explainability formalism to knowledge-graph-based approaches.
- Implement the explainability framework either at a high level (one-shot or few-shot prompting) or by modifying or fine-tuning the architecture of the LLM.
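To give a flavor of the "known data-provenance approaches" the first objective builds on, the sketch below illustrates classic why-provenance from database theory: each source fact carries an identifier, and derived answers accumulate the set of source facts that justify them. The facts and identifiers here are hypothetical examples, not part of the project.

```python
def select(facts, predicate):
    """Keep annotated facts matching the predicate, preserving provenance."""
    return [(value, why) for value, why in facts if predicate(value)]

def join(left, right, on):
    """Join two annotated relations; provenance sets are unioned."""
    return [((lv, rv), lw | rw)
            for lv, lw in left
            for rv, rw in right
            if on(lv, rv)]

# Source facts annotated with singleton provenance sets (hypothetical data).
papers = [(("p1", "provenance"), {"s1"}), (("p2", "llms"), {"s2"})]
cites  = [(("p3", "p1"), {"s3"}), (("p4", "p2"), {"s4"})]

# "Which papers cite a paper about provenance?"
result = join(select(papers, lambda p: p[1] == "provenance"),
              cites,
              on=lambda p, c: c[1] == p[0])
# Each answer carries exactly the source facts that justify it:
# result == [((("p1", "provenance"), ("p3", "p1")), {"s1", "s3"})]
```

The thesis would extend this kind of annotation from database tuples to content generated by an LLM, so that an answer can be traced back to the primary sources it was derived from.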
A successful candidate needs to be familiar with abstract reasoning (logic, databases, linear and non-linear algebra) as well as with implementation (programming in C/C++ and Python).
Work context
The position is part of the MSCA Doctoral Network ARMADA project at the Laboratoire d'Informatique de Grenoble (LIG). Remuneration may be higher depending on the MSCA funding criteria.
The LIG is a 500-member laboratory made up of academic staff, permanent researchers, PhD students, and administrative and technical staff. Its mission is to contribute to the development of fundamental aspects of computer science (models, languages, methodologies, algorithms) and to meet conceptual, technological and societal challenges. The LIG's 24 research teams aim to increase the diversity and dynamism of data, services, interaction devices and use cases, in order to influence the evolution of software and systems and guarantee essential properties such as reliability, performance, autonomy and adaptability. Research at LIG focuses on five areas: intelligent systems for linking data, knowledge and people; software and information systems engineering; formal methods, models and languages; interactive and cognitive systems; and distributed systems, parallel computing and networks.
ARMADA is a doctoral network aiming to train 15 versatile and interconnected young researchers specializing in conversational artificial intelligence (AI) and the challenges associated with recent advances in large language models (LLMs), such as ChatGPT and Bard. These specialists will acquire unique knowledge and skills in artificial intelligence, natural language processing, machine learning, data management and algorithm design to improve the reliability of LLMs. A reliable LLM will produce fast, consistent and verifiable answers, and guide the user. Thanks to its highly interdisciplinary nature, the proposed program offers numerous training activities designed to hone trainees' skills. The network offers research training with summer and winter schools on multidisciplinary aspects of the subject, as well as workshops and courses aimed at developing non-technical social and interpersonal skills, such as scientific writing, innovation, supervision and management. This program responds to the EU's critical need for AI regulation by proposing to train conversational AI experts who can advise European bodies on technical issues related to the adoption of these technologies in key disciplines such as medicine, education and business intelligence. The network's eight organizations in seven countries form an interoperability platform for sharing knowledge and skills.
The position is located in a sector covered by the French Protection of Scientific and Technical Potential (PPST), and therefore requires, in accordance with regulations, that your arrival be authorized by the competent Ministry of Higher Education and Research (MESR) authority.