PhD (M/F): vulnerabilities in LLM-generated code (TAP project)

Application deadline: Monday 8 September 2025, 23:59:00 Paris time

Make sure your candidate profile is correctly filled in before applying

General information

Job title: PhD (M/F): vulnerabilities in LLM-generated code (TAP project)
Reference: UMR6074-OLIZEN-005
Number of positions: 1
Workplace: RENNES
Publication date: Monday 18 August 2025
Contract type: Fixed-term doctoral contract (CDD Doctorant)
Contract duration: 36 months
Thesis start date: 3 November 2025
Working time: Full-time
Remuneration: €2,200 gross per month
CN section(s): 06 - Information sciences: foundations of computer science, computation, algorithms, representations, uses

Description of the thesis topic

For the last 60 to 70 years, programming has been the dominant activity in computer science, covering both the capture of intent and the production of code. Advances in system modelling and design have made formal specifications more important, because they allow objectives to be captured more precisely. Even so, software engineers remain reluctant to write formal specifications, so large software systems usually lack a formal statement of intent, which makes debugging and error correction difficult. In the absence of such a statement, testing and analysis have been used to build reliable code bases. Testing research aims for greater behavioural coverage and relies on test oracles, and fuzzing approaches have gained importance over the last decade. However, achieving functional correctness of software without extensive formal requirements remains a difficult goal.

Recent advances in automatic code generation with large language models (LLMs) offer a new perspective. LLM-based code generation makes it possible to program from natural-language specifications, which suggests that automatic programming is feasible. This raises the question of how correct and secure LLM-generated code is, and under which conditions it can be trusted.

The TAP (Trustworthy Automatic Programming) project focuses specifically on these aspects. Its objectives are to identify vulnerabilities in LLM-generated code, to analyse and classify them, and to determine whether certain types of vulnerability are more frequent in LLM-generated code than in human-written code. The project also aims to correct vulnerabilities in LLM-generated code automatically and to harden LLMs so that the code they generate contains fewer vulnerabilities.

The main objective of the DiverSE team in this project is to carry out research on identifying vulnerabilities in LLM-generated code. To achieve this, we will set up a system capable of automatically generating vulnerability datasets. The system will draw on the vulnerability catalogues available on the web and model these vulnerabilities so that they can be integrated seamlessly into a test tool, enabling us to analyse the code and libraries generated by LLMs. The target languages will primarily be C and Java, chosen for their widespread use in order to maximise the impact of our work.
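
As an illustration only, the idea of modelling catalogue entries as checks that a test tool applies to generated code could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's design: the `VulnerabilityModel` class, the `scan` function, the sample input and the regex-based detection are all hypothetical, and a real tool would likely rely on static or dynamic analysis rather than pattern matching. The two CWE identifiers come from the public CWE catalogue.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnerabilityModel:
    """Hypothetical model of one catalogue entry, reduced to a regex check."""
    cwe_id: str
    description: str
    language: str
    pattern: str  # illustrative regex (assumption), not a real detector


# Two entries from the public CWE catalogue, hand-modelled for this sketch.
CATALOGUE = [
    VulnerabilityModel("CWE-242", "Use of Inherently Dangerous Function",
                       "c", r"\bgets\s*\("),
    VulnerabilityModel("CWE-676", "Use of Potentially Dangerous Function",
                       "c", r"\bstrcpy\s*\("),
]


def scan(code, language, catalogue=CATALOGUE):
    """Return (line_number, cwe_id) pairs for every modelled pattern found."""
    findings = []
    for line_number, line in enumerate(code.splitlines(), start=1):
        for model in catalogue:
            if model.language == language and re.search(model.pattern, line):
                findings.append((line_number, model.cwe_id))
    return findings


# Stand-in for a snippet returned by an LLM (hypothetical example input).
GENERATED_C = """\
#include <stdio.h>
#include <string.h>
int main(void) {
    char name[16], copy[16];
    gets(name);
    strcpy(copy, name);
    return 0;
}
"""

if __name__ == "__main__":
    for line_number, cwe_id in scan(GENERATED_C, "c"):
        print(f"line {line_number}: {cwe_id}")
```

Keeping each catalogue entry as plain data means that new entries can be added without touching the scanning loop, which is one plausible way to read "integrate them seamlessly into a test tool".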

In this context, the DiverSE team (in close collaboration with the IPAL laboratory and the DGA) is recruiting a PhD student for a period of 36 months, under the scientific and technical responsibility of permanent members of the team involved in the project. This person will be responsible for research and design work related to the DiverSE objectives indicated above, with the aim of analysing the state of the art, and designing techniques and methods that will then be implemented in prototypes and demonstrators.
Synergies with other work carried out in the team will also be explored and exploited.
The results of our work will be used by NUS partners in Singapore.

The rapid growth in the use of LLMs for all kinds of tasks, including the assisted production of source code, means that the results of the project could have a considerable impact. Research on the security of LLM-produced code is still in its infancy, and a system that identifies vulnerabilities in such code automatically would meet a significant global need. The resulting cybersecurity challenges are therefore considerable in practice.

Work context

IRISA (Institut de Recherche en Informatique et Systèmes Aléatoires) is one of France's largest research laboratories (over 850 staff) in the field of computing and new information technologies.
Organised into seven scientific departments, the laboratory is a centre of research excellence focusing on priority areas such as bioinformatics, systems security, new software architectures, virtual reality, massive data analysis and artificial intelligence.
IRISA is at the centre of a dynamic regional research and innovation ecosystem, standing out in France and abroad thanks to its recognised expertise, particularly through international scientific collaborations.
Focused on the future of computing and with an international outlook, IRISA is at the heart of the digital transition and innovation for cybersecurity, health, the environment, transport, robotics, energy, culture and artificial intelligence.

The DiverSE research team studies software engineering techniques for the reliable and efficient construction of applications, with a particular focus on cybersecurity and LLMs.
With around 15 permanent staff (Inria and CNRS researchers, faculty members from INSA and Université de Rennes, including 3 members of the Institut Universitaire de France (IUF)), around 15 PhD students, several engineers and a DGA associate engineer, the team is recognised worldwide in its areas of expertise. It also makes a point of ensuring that its research is applicable, and indeed applied, through very strong links with international, national and local industry.
It is also renowned locally for its on-site atmosphere, coffee breaks and memorable team seminars.

As part of the TAP project, there will be frequent contact with our partners at NUS (National University of Singapore) and IPAL (Nantes). One or more stays in Singapore may be arranged, depending on your wishes. More generally, contacts within and outside the DiverSE team will give you an opportunity to look beyond your own work, offering a broad and varied context through the team's many research, innovation and industrial transfer projects.
After the project, you'll be one of the (many) alumni of the DiverSE team, most of whom are still in touch.

Benefits
Possibility of teleworking up to 2 days a week
Partial reimbursement of public transport costs to and from work, or sustainable mobility allowance (FMD, forfait mobilités durables)
Partial reimbursement of health insurance costs
Subsidised on-site catering
Free on-site car and bicycle parking; bus 5 minutes' walk; metro 10 minutes' walk.

Location
Campus de Beaulieu Irisa/Inria Rennes
Building 12
263 avenue du Général Leclerc
35 042 RENNES cedex

Presentation of CNRS as employer: https://www.cnrs.fr/fr/le-cnrs
Presentation of IRISA as the laboratory of assignment: https://www.irisa.fr/umr-6074

The post is located in a sector covered by the protection of scientific and technical potential (PPST), and therefore requires, in accordance with the regulations, that your arrival be authorised by the competent authority of the MESR.

Constraints and risks

This work may involve travel in France and abroad, including by air.