General information
Job title: Research Position (PhD Student) in Artificial Intelligence for Science (M/F)
Reference: UMR6072-FREJUR0-013
Number of positions: 1
Workplace: CAEN
Publication date: Tuesday, May 20, 2025
Contract type: Fixed-term doctoral contract (CDD Doctorant)
Contract duration: 36 months
Thesis start date: October 1, 2025
Working time: Full-time
Remuneration: 2200 € gross per month
CN section(s): 07 - Information sciences: processing, integrated hardware-software systems, robots, control, images, content, interactions, signals and languages
Description of the thesis topic
The fields of Artificial Intelligence (AI) and Machine Learning (ML) are poised to revolutionize scientific discovery. Foundation models, large neural networks pre-trained on vast datasets, have shown immense promise in natural language processing and are now being explored for scientific applications across chemistry, physics, materials science, and biology. A key challenge in building effective foundation models for science lies in the inherent multimodality of scientific data: scientific progress relies on integrating information not just from text, but also from complex data structures such as atomic-cluster and molecular graphs, 3D crystallographic structures, experimental measurements (XRD, Raman spectroscopy, NMR, XAS, etc.), synthesis protocols, and simulation outputs.
This project tackles a critical bottleneck in developing such models: how to effectively represent ("tokenize") these complex, non-linguistic scientific data structures for seamless integration into unified, multimodal foundation models, often based on transformer architectures. Standard sequence-based tokenization fails to capture rich topological, geometric (including crucial symmetries), or continuous spectral information.
This research aims to systematically investigate, develop, and evaluate novel representation learning strategies for graphs, 3D coordinates, and spectra. The goal is to create representations that are information-rich, computationally scalable, robust to noise, and interpretable, and that facilitate effective cross-modal reasoning when combined with textual or other scientific data. Success in this area is crucial for unlocking the potential of AI to understand complex scientific phenomena and accelerate discovery.
Work context
The project is situated within the research environment of the Computer Science Department and the Physics Department at the University of Caen, focusing on the intersection of deep learning and scientific applications. The new techniques developed as part of this project will be applied to real-world data obtained at the Material Chemistry Department and the Physics Department at the University of Caen.
The position is located in a sector covered by the protection of scientific and technical potential (PPST) and therefore requires, in accordance with the regulations, that your arrival be authorized by the competent authority of the MESR.
Constraints and risks
--