
PhD Task-Based Data Analytics Patterns for Data Discovery and Exploration (M/F)

This offer is available in the following languages:
- French
- English

Application Deadline: 26 June 2025, 23:59 (Paris time)

Ensure that your candidate profile is correct before applying.

General information

Offer title: PhD Task-Based Data Analytics Patterns for Data Discovery and Exploration (M/F)
Reference: UMR5217-SILMAN-006
Number of positions: 1
Workplace: ST MARTIN D HERES
Date of publication: 05 June 2025
Type of Contract: FTC PhD student / Offer for thesis
Contract Period: 36 months
Start date of the thesis: 1 October 2025
Proportion of work: Full Time
Remuneration: €2,200 gross per month
Section(s) CN: 06 - Information sciences: bases of information technology, calculations, algorithms, representations, uses

Description of the thesis topic

Dataset discovery is the process of identifying and collating datasets. Its first purpose is to create a new, potentially virtual dataset. This may, for example, be done directly through a search, by navigating from related datasets, or by browsing the datasets with a specific annotation. Dataset exploration is the process of understanding the properties of datasets and relationships between them. This may, for example, be carried out by exploring the relationships of a given dataset, by viewing shared annotations at dataset or attribute level, or by exploring relationships that are shared by several datasets. Data exploration is the process of querying a given dataset step by step. The purpose of this PhD is to bridge the gap between dataset discovery, dataset exploration and data exploration to address a specified data task.
The main objective is to explore different approaches for building query-planning data patterns, using a data model that captures data and metadata together with machine learning and transformation operators, and to apply these patterns to the project's three use cases: higher education, lifelong learning, and weather analysis.
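As a rough illustration of the kind of data/metadata model these concepts rest on, the Python sketch below represents datasets, their attributes, annotations, and relationships in a small in-memory graph, and shows discovery by annotation followed by exploration of a dataset's relationships. All class, field, and dataset names here are hypothetical and chosen for the example only; they do not describe the project's actual design.

```python
# Hypothetical sketch only: a toy in-memory catalog of datasets with
# attribute-level schemas, dataset-level annotations, and relationships,
# illustrating dataset discovery and dataset exploration.
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    attributes: dict[str, str]                          # attribute name -> type
    annotations: set[str] = field(default_factory=set)  # dataset-level tags
    related: set[str] = field(default_factory=set)      # names of linked datasets


class Catalog:
    def __init__(self) -> None:
        self.datasets: dict[str, Dataset] = {}

    def add(self, ds: Dataset) -> None:
        self.datasets[ds.name] = ds

    # Dataset discovery: find datasets carrying a given annotation.
    def discover(self, annotation: str) -> list[Dataset]:
        return [d for d in self.datasets.values() if annotation in d.annotations]

    # Dataset exploration: follow the relationships of a given dataset.
    def explore(self, name: str) -> list[Dataset]:
        ds = self.datasets[name]
        return [self.datasets[r] for r in ds.related if r in self.datasets]


if __name__ == "__main__":
    catalog = Catalog()
    catalog.add(Dataset("course_catalog", {"course_id": "str", "credits": "int"},
                        {"higher-education"}, {"enrolments"}))
    catalog.add(Dataset("enrolments", {"student_id": "str", "course_id": "str"},
                        {"higher-education"}))
    # Discover datasets by annotation, then explore the relationships of the first hit.
    hits = catalog.discover("higher-education")
    neighbours = catalog.explore(hits[0].name)
    print([d.name for d in hits], [d.name for d in neighbours])
```

In the thesis, such a model would additionally carry machine learning and transformation operators so that query-planning patterns can be composed over the discovered datasets.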

Main tasks:
1. Design a task-driven analytical pattern semantics and efficient algorithms
2. Develop planning algorithms for the use cases of the project
3. Implement and evaluate prototypes (performance); disseminate results

Required skills: strong abstraction capabilities, strong programming skills in C/C++ and Python, and familiarity with concepts such as graph data models and sequential learning algorithms. Proficiency in English is required.

Work Context

The work will take place at the Grenoble Informatics Lab (LIG), a 450-member laboratory comprising teaching faculty, full-time researchers, PhD students, and administrative and technical staff. The mission of LIG is to contribute to the development of fundamental aspects of Computer Science (models, languages, methodologies, algorithms) and to address conceptual, technological, and societal challenges. The 22 research teams of LIG study how the increasing diversity and dynamism of data, services, interaction devices, and use cases influence the evolution of software and systems, so as to guarantee essential properties such as reliability, performance, autonomy, and adaptability. Research within LIG is organized into 5 focus areas:
- Intelligent Systems for Bridging Data, Knowledge and Humans
- Software and Information System Engineering
- Formal Methods, Models, and Languages
- Interactive and Cognitive Systems
- Distributed Systems, Parallel Computing, and Networks
The host team, DAISY, is a joint CNRS, Grenoble INP, and UGA research team addressing research challenges at the intersection of AI and data management, with a particular focus on data sourced from interdisciplinary domains such as education and health.
Position in the context of the HORIZON-INFRA-2024 DataGEMS project
Data is an asset that spurs innovation, drives decision making, improves operations and impacts several domains, including science, environment, health, energy, education, industry, and society as a whole. A growing number of open datasets from governments, academic institutions, and companies bring new opportunities for innovation, economic growth, and societal benefits. From real-time to historical data, from structured data in tabular form to unstructured text, images or videos, data is highly heterogeneous. Moreover, its volume and complexity create a “needle-in-the-haystack” problem: it is extremely challenging and time-consuming to discover, leverage and combine data within this expanding sea of data. Data discovery systems, such as Google Dataset Search, and open data portals, such as the EOSC Portal, promise to bring data closer to users, but fall short for the following reasons: (a) limited data discovery capabilities, (b) poor metadata, (c) superficial query answering, and (d) single-table datasets. Existing tools allow searching for spreadsheets or data published in formats such as CSV or JSON, but not complex datasets, e.g., collections of tables, text, or temporal data.
To address the above limitations, the DataGEMS project proposes a data discovery platform with Generalized Exploratory, Management, and Search capabilities. DataGEMS is built on the principles of data FAIRness, openness and re-use. It aims to seamlessly integrate data sharing, discovery and analysis into a system that addresses the whole data lifecycle, i.e., sharing, storing, managing, discovering, analyzing and reusing (data and/or metadata), bridging the gap between the data provider and the data consumer.
DataGEMS is a HORIZON Research and Innovation Action (call HORIZON-INFRA-2024-EOSC-01-05, HORIZON-RIA) whose purpose is to build a fully operational and sustainable ecosystem of free and open-source tools for data FAIRness, together with services covering all phases of the data lifecycle: storage and management, discovery, analysis, description, publication, and reuse. The project brings together 12 partners across 8 European countries, who will collaborate to develop novel tools and services for faster access to FAIR-by-design datasets. These tools and services facilitate the collection and analysis of heterogeneous and/or large-scale datasets, provide automatic production of FAIR data at research instruments (e.g., meteorological stations), and support infrastructures with metadata automation tools and techniques.

The position is located in an area subject to French legislation on the protection of scientific and technical potential (PPST), and therefore requires, in accordance with regulations, that your arrival be authorized by the competent authority of the Ministry of Higher Education and Research (MESR).
