PhD Candidate in Computer Science (M/F) – Extraction, Structuring, and Integration of Heterogeneous Data for Geopolymer Formulation

Application deadline: Wednesday, 25 February 2026, 23:59:00 Paris time

Make sure your candidate profile is correctly filled in before applying

General information

Offer title: PhD Candidate in Computer Science (M/F) – Extraction, Structuring, and Integration of Heterogeneous Data for Geopolymer Formulation
Reference: UMR5205-ANDMAU-002
Number of positions: 1
Workplace: VILLEURBANNE
Publication date: Wednesday, 4 February 2026
Contract type: Fixed-term PhD contract (CDD Doctorant)
Contract duration: 36 months
Thesis start date: 1 June 2026
Working time: Full-time
Remuneration: €2,300 gross per month
CN section(s): 02 - Computer science: foundations of computer science, computation, algorithms, representations, uses

Description of the thesis topic

The formulation of low-carbon geopolymers from construction waste requires the combined exploitation of a wide range of information related to raw material properties, mixture compositions, and manufacturing parameters. This information is currently scattered across heterogeneous data sources, including scientific literature, technical reports in PDF format, graphs, and tables produced by industrial partners and research laboratories, which limits its systematic use.

This PhD project aims to develop methods for the extraction, structuring, and integration of such data in order to build a unified knowledge base that can guide geopolymer synthesis and help avoid poorly performing formulations. The work will rely on OCR techniques, table parsing, and approaches based on Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to extract and structure information from heterogeneous documents while ensuring traceability to the original sources.

The extracted data will be organized within a property graph by applying a well-defined schema and strict integrity constraints (PG-keys, PG-schema), as well as normalization and semantic analysis techniques. The thesis will also address data integration and reconciliation challenges by investigating Global-as-View and Local-as-View strategies, together with view maintenance mechanisms for property graphs.

The proposed approaches will follow a human-in-the-loop paradigm, combining automated processing with expert validation, in close collaboration with the academic and industrial partners of the GEOLIANT project.

Work context

The PhD project is conducted within the framework of the GEOLIANT project, supported by BPI France under the France 2030 program. GEOLIANT aims to develop and industrialize low-carbon geopolymer binders derived from construction waste, providing a sustainable alternative to traditional cement, which is highly CO₂-intensive. The project relies on the development of innovative formulations and on the deployment of AI-based digital and predictive tools to accelerate research, performance evaluation, and validation processes.

The PhD candidate will be affiliated with the LIRIS laboratory and will work within an academic–industrial consortium involving stakeholders from civil engineering, materials science, and environmental domains. The research will be carried out in a multidisciplinary environment, at the intersection of computer science (data extraction and integration, graphs, artificial intelligence) and materials engineering.

The PhD will be conducted in close collaboration with the project partners, within a framework that promotes co-design, expert validation, and the transfer of research outcomes to operational applications, particularly in the context of pilot construction sites. It offers a stimulating research environment combining methodological contributions with significant industrial and environmental impact.

The position is in a sector covered by the protection of scientific and technical potential (PPST) and therefore requires, in accordance with the regulations, that your arrival be authorized by the competent authority of the MESR.

Constraints and risks

The PhD will be carried out within a multi-partner collaborative project, involving coordination, scheduling, and dependencies on partner contributions. The work depends on the availability, quality, and heterogeneity of data from industrial and scientific sources, which may affect the pace of the research. The complexity of unstructured sources (PDFs, graphs, tables) requires evolving methodological and technical choices. An appropriate balance between AI-based automation and human validation is necessary and may lengthen certain experimental phases. The project is subject to contractual milestones and deliverables associated with funding from BPI France / France 2030. The interdisciplinary nature of the topic requires a period of familiarization with concepts in materials science and civil engineering. Occasional travel may be required for meetings, workshops, or validation activities with project partners.