
M/F PhD within ANR VITE Project


Application Deadline : 03 October 2025 23:59:00 Paris time

Ensure that your candidate profile is correct before applying.

General information

Offer title : M/F PhD within ANR VITE Project
Reference : UMR5149-NATCOL-027
Number of positions : 1
Workplace : MONTPELLIER
Date of publication : 12 September 2025
Type of Contract : FTC PhD student / Offer for thesis
Contract Period : 36 months
Start date of the thesis : 1 November 2025
Proportion of work : Full Time
Remuneration : 2200 € gross monthly
Section(s) CN : 01 - Interactions, particles, nuclei, from laboratory to cosmos

Description of the thesis topic

The doctoral student's primary mission will be to **develop, adapt, and evaluate methods for estimating sample influence** within the specific context of Pl@ntNet, a platform dedicated to automated plant identification. This mission will focus on the following key areas:

**1. Study and Adaptation of Influence Functions**
The doctoral student will delve into **influence functions**, theoretical tools designed to quantify the impact of a sample on a machine learning model. These functions, defined through the derivative of model parameters or the loss function with respect to an infinitesimal perturbation of the dataset, provide a rigorous framework to:
- **Identify the most informative samples** among the predictions of a deep neural network (DNN), with the goal of enhancing Pl@ntNet's user interface. Currently, images presented to users are selected based on $\ell_2$ distance in feature space. The objective is to replace this approach with influence-based selection to facilitate species identification and improve user experience.
- **Detect mislabeled samples** in validated or user-labeled databases by leveraging the self-influence $I_{loss}(z_i, z_i)$, which approximates the error incurred on $z_i$ itself if $z_i$ is removed from the training set.
- **Prioritize the annotation of unlabeled images**, particularly for rare species, by identifying samples whose addition or correction would have the greatest impact on model performance.
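
The self-influence criterion above can be illustrated on a toy problem. The following is a minimal sketch, assuming L2-regularized logistic regression on synthetic two-cluster data rather than Pl@ntNet's network or images, and assuming the common sign convention $I_{loss}(z_i, z_j) = -\nabla L(z_j)^\top H_{\hat{\theta}}^{-1} \nabla L(z_i)$. The helper names (`fit`, `hessian`, `per_sample_grads`, `influence_on_loss`), the data, and the regularization strength `lam` are hypothetical and introduced only for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def hessian(X, theta, lam):
    """Hessian of the mean L2-regularized log-loss at theta."""
    p = sigmoid(X @ theta)
    return (X.T * (p * (1.0 - p))) @ X / len(X) + lam * np.eye(X.shape[1])

def per_sample_grads(X, y, theta):
    """Row i is the gradient of sample i's log-loss with respect to theta."""
    return (sigmoid(X @ theta) - y)[:, None] * X

def fit(X, y, lam, steps=30):
    """Newton iterations for L2-regularized logistic regression, labels in {0, 1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = per_sample_grads(X, y, theta).mean(axis=0) + lam * theta
        theta -= np.linalg.solve(hessian(X, theta, lam), grad)
    return theta

def influence_on_loss(X, y, theta, lam, X_eval, y_eval):
    """Entry (i, j) is -grad L(z_eval_j)^T H^{-1} grad L(z_i) (assumed convention)."""
    g_train = per_sample_grads(X, y, theta)
    g_eval = per_sample_grads(X_eval, y_eval, theta)
    return -g_train @ np.linalg.solve(hessian(X, theta, lam), g_eval.T)

# Synthetic data (an assumption of this sketch): two Gaussian clusters plus a bias column.
rng = np.random.default_rng(0)
n_per_class = 100
pos = rng.normal(loc=3.0, size=(n_per_class, 2))
neg = rng.normal(loc=-3.0, size=(n_per_class, 2))
X = np.hstack([np.vstack([pos, neg]), np.ones((2 * n_per_class, 1))])
y_clean = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])

# Simulate annotation errors by flipping 10 labels.
flipped = rng.choice(len(y_clean), size=10, replace=False)
y = y_clean.copy()
y[flipped] = 1.0 - y[flipped]

lam = 1e-2
theta = fit(X, y, lam)

# Self-influence score: -I_loss(z_i, z_i) = g_i^T H^{-1} g_i, non-negative by construction.
self_influence = -np.diag(influence_on_loss(X, y, theta, lam, X, y))
suspects = np.argsort(-self_influence)[:10]
n_recovered = len(set(suspects.tolist()) & set(flipped.tolist()))
print("flipped labels among the 10 highest scores:", n_recovered)
```

In this toy setting the flipped labels should concentrate among the highest self-influence scores. The explicit Hessian solve used here is only affordable because the model has three parameters.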

The student will also explore **advanced variants** of influence functions, such as the **Proximal Bregman Objective (PBO)**, which avoids the assumption of model optimality and allows influence to be assessed at various stages of optimization.

**2. Addressing Computational Challenges**
Applying influence functions to deep neural networks presents **computational challenges**, primarily due to the prohibitive size of the Hessian matrix $H_{\hat{\theta}}$. To overcome these, the doctoral student will study and implement **approximation methods**, including:
- Using the **Fisher Information Matrix** as a surrogate for the Hessian, building on recent work such as @george2018fast.
- Adopting **Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC)**, an efficient method for estimating the curvature of the loss function, as proposed by @grosse2023studying.
- Exploring **zero-order optimization tools**, such as extensions of Stein's lemma or Stokes' formula-based methods, to estimate the Hessian of a proxy function (smoothed version of the network) in a single forward pass, as suggested by @balasubramanian2022zeroth.
- Combining these approaches with **natural gradient techniques** to improve the accuracy and efficiency of influence estimates.
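
To make the Fisher surrogate mentioned above concrete, here is a minimal sketch on logistic regression, where the Fisher information matrix (expectation of gradient outer products over labels drawn from the model) coincides exactly with the Hessian of the log-loss. This equality is specific to such simple models; for a deep network the Fisher matrix is only an approximation of the Hessian. The synthetic data, the `damping` value, and all variable names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
n, d = 500, 4
X = rng.normal(size=(n, d))
theta = rng.normal(size=d)
p = sigmoid(X @ theta)                      # model probability of label 1
y = (rng.random(n) < p).astype(float)       # observed labels, sampled from the model

# Exact Hessian of the mean log-loss: (1/n) * sum_i p_i (1 - p_i) x_i x_i^T.
H = (X.T * (p * (1.0 - p))) @ X / n

# Fisher information: average over samples of E_{y ~ model}[g g^T],
# where g = (p - y) x is the per-sample gradient for label y.
fisher = np.zeros((d, d))
for label, prob in ((1.0, p), (0.0, 1.0 - p)):
    g = (p - label)[:, None] * X
    fisher += (g.T * prob) @ g / n

# Empirical Fisher: same outer products, but with the observed labels only.
g_obs = (p - y)[:, None] * X
emp_fisher = g_obs.T @ g_obs / n

# Damped inverse-curvature-vector products, the basic operation behind influence estimates.
damping = 1e-3
v = rng.normal(size=d)
ihvp_hessian = np.linalg.solve(H + damping * np.eye(d), v)
ihvp_fisher = np.linalg.solve(fisher + damping * np.eye(d), v)
ihvp_emp = np.linalg.solve(emp_fisher + damping * np.eye(d), v)

print("max |H - Fisher|          :", np.abs(H - fisher).max())
print("max |H - empirical Fisher|:", np.abs(H - emp_fisher).max())
```

The Fisher matrix is positive semi-definite by construction and only needs first-order gradients, which is what makes Kronecker-factored approximations of it practical at scale. The empirical Fisher, which uses observed labels, is cheaper still but generally differs from the Hessian.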

**3. Accounting for Optimization Biases**
Classical influence functions assume that the trained parameters are an exact stationary point (a minimizer) of the training objective. However, SGD-type algorithms often converge to specific stationary points influenced by **implicit bias phenomena** (@chizat2020implicit, @vardi2023implicit). The doctoral student will:
- **Extend influence function concepts** to incorporate the dynamic properties of the optimization algorithm, providing a more accurate reflection of the real-world impact of samples on the final model.
- **Adapt methods** to address Pl@ntNet's specific characteristics, such as the rarity of certain species and the hierarchical structure of the data.
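
A simple way to see why the optimization algorithm matters is the textbook case of overparameterized least squares, sketched below under stated assumptions: the data are synthetic, the model is linear, and plain gradient descent stands in for SGD. Many parameter vectors fit the data exactly, yet gradient descent started at zero converges to the minimum-norm one, while another initialization reaches a different interpolating solution. The function name `gradient_descent` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                          # fewer samples than parameters
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def gradient_descent(X, y, w0, steps=5000):
    """Gradient descent on 0.5 * ||X w - y||^2 with a step size of 1 / sigma_max(X)^2."""
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = w0.copy()
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y))
    return w

w_from_zero = gradient_descent(X, y, np.zeros(d))
w_from_random = gradient_descent(X, y, rng.normal(size=d))
w_min_norm = np.linalg.pinv(X) @ y     # minimum-norm interpolating solution

print("distance of zero-init solution to min-norm solution:",
      np.linalg.norm(w_from_zero - w_min_norm))
print("norms (zero init, random init):",
      np.linalg.norm(w_from_zero), np.linalg.norm(w_from_random))
print("rank of X^T X:", np.linalg.matrix_rank(X.T @ X), "out of", d)
```

Here the Hessian $X^\top X$ is singular, so the inverse used by classical influence functions is not even defined, and the solution actually reached depends on the initialization and the algorithm rather than on the objective alone.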

**4. Integration and Practical Validation**
Finally, the doctoral student will collaborate with the Pl@ntNet team to:
- **Integrate developed methods** into the existing pipeline, particularly through the *ThePlantGame* platform, to enhance annotation quality and model efficiency.
- **Evaluate the performance** of proposed approaches in real-world scenarios, measuring their impact on model accuracy, error detection, and user experience.

Work Context

Main Project Objective:
Identify the most influential samples in a dataset, particularly for Pl@ntNet, a plant identification app.

Applications:
- **User Interface Improvement:** Currently, users see the most likely species (5 candidates) and the 5 most probable samples among them. The goal is to provide the most **informative** samples (not just the most probable) to facilitate identification and enhance user experience.
- **Theoretical Model Improvement:** Understanding a sample's influence helps detect labeling errors or prioritize unlabeled images, optimizing the learning algorithm and service quality.

The doctoral student will carry out their work at IMAG (UMR of Mathematics) and within the IROKO/INRIA team in Montpellier.