
Research Position (PhD Student) in Artificial Intelligence for Science (M/F)


Application deadline: Tuesday, June 10, 2025, 23:59:00 Paris time

Make sure your candidate profile is correctly filled in before applying

General information

Offer title: Research Position (PhD Student) in Artificial Intelligence for Science (M/F)
Reference: UMR6072-FREJUR0-013
Number of positions: 1
Workplace: CAEN
Publication date: Tuesday, May 20, 2025
Contract type: Fixed-term PhD contract (CDD Doctorant)
Contract duration: 36 months
Thesis start date: October 1, 2025
Working time: Full-time
Remuneration: 2200 gross monthly
CN section(s): 07 - Information sciences: processing, integrated hardware-software systems, robots, control, images, content, interactions, signals and languages

Description of the thesis topic

The fields of Artificial Intelligence (AI) and Machine Learning (ML) are poised to revolutionize scientific discovery. Foundation models, large neural networks pre-trained on vast datasets, have shown immense promise in natural language processing and are now being explored for scientific applications across chemistry, physics, materials science, and biology. A key challenge in building effective foundation models for science lies in the inherent multimodality of scientific data: scientific progress relies on integrating information not just from text, but also from complex data structures such as atomic cluster and molecular graphs, 3D crystallographic structures, experimental measurements (XRD, Raman spectroscopy, NMR, XAS, etc.), synthesis protocols, and simulation outputs.

This project tackles a critical bottleneck in developing such models: how to effectively represent ("tokenize") these complex, non-linguistic scientific data structures for seamless integration into unified, multimodal foundation models, often based on transformer architectures. Standard sequence-based tokenization fails to capture rich topological, geometric (including crucial symmetries), or continuous spectral information.

This research aims to systematically investigate, develop, and evaluate novel representation learning strategies for graphs, 3D coordinates, and spectra. The goal is to create representations that are information-rich, computationally scalable, robust to noise, and interpretable, and that facilitate effective cross-modal reasoning when combined with textual or other scientific data. Success in this area is crucial for unlocking the potential of AI to understand complex scientific phenomena and accelerate discovery.

Work context

The project is situated within the research environment of the Computer Science Department and the Physics Department at the University of Caen, focusing on the intersection of deep learning and scientific applications. The new techniques developed as part of this project will be applied to real-world data obtained at the Material Chemistry Department and the Physics Department at the University of Caen.

The position is located in a sector covered by the protection of scientific and technical potential (PPST) and therefore requires, in accordance with the regulations, that your arrival be authorized by the competent authority of the MESR.

Constraints and risks

--