Reference : UMR7357-HYESEO-004
Workplace : STRASBOURG
Date of publication : Thursday, January 6, 2022
Scientific Responsible name : Hyewon Seo
Type of Contract : PhD Student contract / Thesis offer
Contract Period : 36 months
Start date of the thesis : 1 March 2022
Proportion of work : Full time
Remuneration : 2 135,00 € gross monthly
Description of the thesis topic
The proposed PhD study is organized around the following three tasks:
1. A photo-realistic 4D human model will be developed by coupling a geometric human model with color and illumination components, so that any desired shape, pose, and attribute can be generated. We will build our human model on our previous work as well as on other standard geometric human models learned from multiple images of real people. Additional effort will go into consolidating the model by adding a cloth and garment draping module, again based on our previous work.
2. The evolution of the human model over time due to motion will be learned by a deep neural network (DNN) from multiple annotated motion datasets. We will focus on combinations of recurrent neural networks (RNNs) and variational autoencoders, which allow stochastic prediction of shape and pose sequences in a latent space. Different data representations and network hyperparameters will be tested to obtain the best results.
3. An inverse problem will be formulated to reconstruct a 4D human model from an image, a video, or a simple action command. All model components will be developed in a differentiable manner, so that, given observation data, we will be able to build a model instance by optimal model fitting and even to predict its future state.
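To give a concrete feel for the model-fitting idea in task 3, here is a minimal toy sketch in Python. It is not the project's model: the two-segment planar "limb", the segment length `LINK`, the function names, and the plain gradient-descent settings are all assumptions made for illustration. The forward model is differentiable, so the pose parameters can be recovered from observed joint positions by minimizing a squared error.

```python
import math

LINK = 1.0  # assumed length of both limb segments (illustrative value)


def forward(a1, a2):
    """Differentiable toy model: joint angles -> 2D elbow and wrist positions."""
    ex, ey = LINK * math.cos(a1), LINK * math.sin(a1)
    wx = ex + LINK * math.cos(a1 + a2)
    wy = ey + LINK * math.sin(a1 + a2)
    return (ex, ey), (wx, wy)


def loss_and_grad(a1, a2, observed):
    """Squared distance to the observed joints and its analytic gradient."""
    (ex, ey), (wx, wy) = forward(a1, a2)
    (oex, oey), (owx, owy) = observed
    rex, rey, rwx, rwy = ex - oex, ey - oey, wx - owx, wy - owy
    loss = rex**2 + rey**2 + rwx**2 + rwy**2
    # Partial derivatives of the joint positions with respect to each angle.
    dex, dey = -ey, ex                 # d(elbow)/d(a1)
    dwx1, dwy1 = -wy, wx               # d(wrist)/d(a1)
    dwx2, dwy2 = -(wy - ey), wx - ex   # d(wrist)/d(a2)
    g1 = 2.0 * (rex * dex + rey * dey + rwx * dwx1 + rwy * dwy1)
    g2 = 2.0 * (rwx * dwx2 + rwy * dwy2)
    return loss, (g1, g2)


def fit(observed, a1=0.0, a2=0.0, lr=0.05, steps=2000):
    """Recover the pose by gradient descent on the reconstruction error."""
    for _ in range(steps):
        _, (g1, g2) = loss_and_grad(a1, a2, observed)
        a1 -= lr * g1
        a2 -= lr * g2
    final_loss, _ = loss_and_grad(a1, a2, observed)
    return a1, a2, final_loss


if __name__ == "__main__":
    observation = forward(0.6, -0.4)  # synthetic "observed" joints
    print(fit(observation))
```

A realistic system would replace the two angles with the full shape, pose, and appearance parameters, the joint positions with image observations, and the hand-written gradient with automatic differentiation.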
Robot vision for human cognition often fails in real-world situations, despite the disruptive results achieved in Computer Vision and Artificial Intelligence. While most training data have been collected against well-conditioned, easy-to-isolate backgrounds, in-the-wild videos of the real world may involve varied environmental conditions such as lighting, background patterns, and, most notoriously, occlusions. Occlusions are a recurrent source of problems for human cognition by care robots in in-home settings. Large variations in body shapes, motions, and clothes, as well as frequent interactions with objects, also contribute to the difficulty. Finally, the changing viewpoints caused by moving robots and humans are another source of the problem. Learning-based methods that rely on datasets inevitably show limited performance, as it is almost impossible to collect a large, annotated dataset that spans all possible configurations of a real-world scene.
Model-based learning approaches are a good alternative, as confirmed by a considerable body of previous work on human face and body recognition from images based on pre-constructed 3D models. However, many existing models consider only the geometry of the human face or body in isolation, which makes them difficult to apply to real-world situations involving obstacles or environmental objects. In this project, we will also adopt a model-based approach, but with enhanced realism and an additional dimension (i.e. time). Our aim is to push the current limits of robot vision in human cognition by care robots in in-home settings. The specific goal is to make the vision intelligence robust to large variations (in body shapes, motions, ...) and to occlusion (by clothes, furniture, walls, ...), and capable of understanding interactions, by developing a photo-realistic, physics-aware 4D human model.