Reference : UMR5217-SIHAME-007
Workplace : GRENOBLE
Date of publication : Friday, October 09, 2020
Type of Contract : FTC Scientist
Contract Period : 12 months
Expected date of employment : 1 December 2020
Proportion of work : Full time
Remuneration : between €2,648.79 and €3,054.06
Desired level of education : PhD
Experience required : Open to all levels
The topic of this post-doc is to develop the formalization, algorithms, and validation needed to help users achieve their goals in data exploration. Exploration is a popular research area in data analysis in which users are directly involved in the analysis process. A user typically has only a partial understanding of their needs and refines them as they extract more information from the data. Each session can contain a sequence of iterations whose transitions are formed by different types of exploration. We propose to provide a declarative layer specifically designed to express these needs, which we call an exploration-centric pipeline. Each pipeline is a sequence of explorations that can be revisited and adapted on the fly as the user's knowledge grows.
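To make the idea concrete, the exploration-centric pipeline described above could be sketched as a revisable sequence of exploration operators. This is a minimal illustration with hypothetical names (`ExplorationStep`, `ExplorationPipeline`), not part of the project's actual codebase:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ExplorationStep:
    name: str                       # e.g. "filter", "drill-down", "by-example"
    operator: Callable[[Any], Any]  # transforms the current result set

@dataclass
class ExplorationPipeline:
    steps: list = field(default_factory=list)

    def add(self, step: ExplorationStep) -> None:
        self.steps.append(step)

    def revise(self, index: int, step: ExplorationStep) -> None:
        # Adapt an earlier exploration on the fly as the user learns more.
        self.steps[index] = step

    def run(self, dataset: Any) -> Any:
        # Replay the whole sequence of explorations on the dataset.
        result = dataset
        for step in self.steps:
            result = step.operator(result)
        return result

# Usage on a toy dataset of (item, price) pairs:
data = [("a", 5), ("b", 12), ("c", 8)]
p = ExplorationPipeline()
p.add(ExplorationStep("filter_cheap", lambda rows: [r for r in rows if r[1] < 10]))
print(p.run(data))  # [('a', 5), ('c', 8)]
```

Because the pipeline stores its steps explicitly rather than executing them eagerly, an earlier exploration can be swapped out with `revise` and the whole session replayed, which is what "revisited and adapted on the fly" suggests.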
Users differ in their knowledge of the data and in the techniques that determine how they explore it. We distinguish three user roles: data scientists, who may or may not know the data and the domain; subject-matter experts, who understand the data well but generally do not know how to formulate queries; and information consumers, who are the least informed about the data and technical details, although they may know the attributes of the domain (for example, on e-commerce sites such as Amazon or eBay). However, as user roles evolve, the boundaries between these categories are blurring, and the proposed approach must take this into account.
Interacting with a dataset presents inherent challenges that arise from: (a) the high volume of data (as the volume increases, so does the gap between the amount of data available and the human ability to understand it), (b) the user's low familiarity with the dataset (most users are not familiar with how its content is organized), and (c) the user's information needs (the user may not know exactly what they are looking for until they find it).
As a result, interacting with one or more datasets is a difficult and evolving journey.
Exploration support services aim to guide the user by providing advice in the form of (a) questions that could be asked and (b) additional results.
This work will be done in collaboration with three use-case providers: (a) research on cancer biomarkers - SIB Swiss Institute of Bioinformatics, Switzerland; (b) research and innovation policy - SIRIS, Spain; and (c) astrophysics - Max Planck Institute for Extraterrestrial Physics, Germany.
The candidate will develop a framework to assist users in data exploration, including the following tasks:
1. Formalization: data-access and exploration operators; an exploration framework.
2. Algorithms: fully and partially guided exploration.
3. Validation: use cases with different objectives.
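The distinction between fully and partially guided exploration (task 2) can be illustrated with a toy next-query recommender. All names and the scoring heuristic here are hypothetical, a sketch of the general idea rather than the project's algorithm:

```python
# Toy sketch of guided exploration: the system scores candidate next
# queries; in fully guided mode it applies the best one automatically,
# while in partially guided mode it only suggests the top-k candidates
# and lets the user choose.

def score(candidate: str, history: list) -> float:
    # Hypothetical "interestingness" heuristic: penalize queries the
    # user has already tried, otherwise prefer more specific (longer) ones.
    return 0.0 if candidate in history else float(len(candidate))

def suggest(candidates: list, history: list, k: int = 3) -> list:
    # Partially guided: rank candidates and return the top-k suggestions.
    ranked = sorted(candidates, key=lambda c: score(c, history), reverse=True)
    return ranked[:k]

def fully_guided(candidates: list, history: list) -> str:
    # Fully guided: the system picks the single best next exploration step.
    return suggest(candidates, history, k=1)[0]

history = ["price < 10"]
candidates = ["price < 10", "category = 'books'", "rating > 4"]
print(suggest(candidates, history))       # top-3 suggestions for the user
print(fully_guided(candidates, history))  # single best next step
```

The same ranking function serves both modes; the difference is only who makes the final choice, which is why the two variants appear together in the task list.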
The candidate must be competent in algorithms and optimization, in big data management and in data science in general.
The candidate must hold a PhD degree in Computer Science.
Laboratoire d'Informatique de Grenoble, SLIDE Team, University Grenoble Alpes
The growth and availability of data have dramatically changed data analysis over the past 10 years. Large volumes of data are continuously collected, and these datasets are heterogeneous, ranging from highly structured tabular data to images and videos.
This post-doc proposal falls within the context of the European project INODE (Intelligent Open Data Exploration). The fundamental principle of INODE is that users should interact with data in a more dialectical and intuitive way, similar to a dialogue with a human. To realize this principle, INODE will offer a suite of agile, sustainable data-exploration services that help users (a) link and leverage multiple datasets, (b) access and search data using natural language, examples, and analytics, (c) receive system guidance to understand the data and formulate the right queries, and (d) explore the data and discover new insights through visualizations.
Constraints and risks