Reference: UMR7271-VIVROS-016
Workplace: SOPHIA ANTIPOLIS
Date of publication: Thursday, July 30, 2020
Scientific supervisor: ANTONINI MARC
Type of contract: PhD student contract / Thesis offer
Contract period: 36 months
Start date of the thesis: 1 November 2020
Proportion of work: Full time
Remuneration: €2,135.00 gross per month
Description of the thesis topic
Storing digital data is becoming a challenge for humanity because storage devices have relatively short life spans. At the same time, the “digital universe” (all digital data worldwide) is forecast to grow to over 160 zettabytes by 2025. A significant fraction of this data is “cold”, that is, infrequently accessed. Old photographs stored by users on Facebook are one example of cold data; Facebook recently built an entire data center dedicated to storing such photographs. Unfortunately, all current media used for cold data storage (hard disk drives and tape) suffer from two fundamental problems. First, storage density improves by at most 20% per year, far behind the 60% annual growth of cold data. Second, current storage media have a limited lifetime, ranging from five years (HDD) to twenty years (tape). Because data must often be kept much longer (50 years or more) for legal and regulatory compliance, it has to be migrated to new storage devices every few years, which increases the cost of data ownership.
An alternative approach may come from DNA, the carrier of heredity in living organisms. Using DNA to store cold data is attractive because it is extremely dense, with a raw limit of 1 exabyte/mm³, and long-lasting, with an observed half-life of well over 500 years. This possibility stems from recent biotechnological developments that make DNA writing (synthesis) and DNA reading (sequencing) easy and affordable. However, a major problem of DNA storage is that errors are introduced into the stored information during both synthesis and sequencing. These errors take the form of substitutions, insertions and deletions of single nucleotides. Sequencing is the most critical phase: because several sequencing techniques are available, the choice of sequencing machine leads to significant variations in the number of sequencing errors.
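To make the error model concrete, the sketch below passes a DNA strand through a toy channel that independently inserts, deletes or substitutes single nucleotides. The i.i.d. model, the function name `noisy_read` and the error rates are illustrative assumptions only; real synthesis and sequencing errors are neither independent nor uniform, and their rates depend on the sequencing machine.

```python
import random

BASES = "ACGT"


def noisy_read(strand: str, p_sub: float, p_ins: float, p_del: float,
               rng: random.Random) -> str:
    """Pass a strand through a toy i.i.d. substitution/insertion/deletion channel.

    Assumption: per-nucleotide error rates are illustrative, not measured
    values for any real synthesis or sequencing technology.
    Requires p_sub + p_del <= 1.
    """
    out = []
    for base in strand:
        # Insertion: emit a random extra nucleotide before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < p_del:
            continue  # deletion: the current nucleotide is lost
        if r < p_del + p_sub:
            # Substitution: replace by one of the three other nucleotides.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)  # nucleotide read correctly
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    original = "ACGTTGCAACGTAGCTAGCATCGA"
    read = noisy_read(original, p_sub=0.01, p_ins=0.005, p_del=0.005, rng=rng)
    print(original)
    print(read)
```

Insertions and deletions are what make decoding harder than for a classical substitution channel: a single deletion shifts every following nucleotide, so a position-by-position comparison with the original strand no longer works.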
The project is carried out in the context of the EC project OligoArchive (https://oligoarchive.eu), whose goal is to develop a prototype for storing information in synthetic DNA. The aim of this PhD project is to develop the mathematical foundations for encoding and decoding information, so that DNA can replace devices such as hard disks or tapes for archiving images. To this end, this joint PhD project aims to develop a novel, efficient coding/decoding strategy adapted to the nature of the signal to be encoded, that is, to develop signal processing and image compression techniques that enable high-density storage of unstructured images in DNA. The proposed coding solution should respect two main constraints: (i) the DNA code should take biological restrictions into account, and (ii) it should be robust to sequencing noise, i.e., errors introduced by the sequencing technology. It will build on previous work by the MediaCoding research group of the I3S laboratory [1, 2, 3, 4, 5]. The PhD project will also focus on ways to decode synthetic DNA faster, using machine learning models and assumptions about the information stored in the DNA (as opposed to natural DNA, where no such assumptions can be made).
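As a hedged illustration of constraint (i), the sketch below uses a rotating ternary code, a well-known constrained-coding trick similar in spirit to the one described by Goldman et al. (2013), to guarantee that no nucleotide appears twice in a row. Avoiding such homopolymer runs is a common biological restriction because they are error-prone to synthesize and sequence. This is a swapped-in technique, not the encoder developed in [1–5], and it addresses neither image compression nor robustness to sequencing noise (constraint ii). The fixed 6-trits-per-byte mapping, the initial reference base "A" and all function names are illustrative assumptions.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so every byte fits in 6 ternary digits


def bytes_to_trits(data: bytes) -> list[int]:
    """Map each byte to a fixed-length, most-significant-first base-3 word."""
    trits = []
    for byte in data:
        word = []
        for _ in range(TRITS_PER_BYTE):
            word.append(byte % 3)
            byte //= 3
        trits.extend(reversed(word))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Inverse of bytes_to_trits."""
    if len(trits) % TRITS_PER_BYTE != 0:
        raise ValueError("trit count is not a multiple of TRITS_PER_BYTE")
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def encode(data: bytes, prev: str = "A") -> str:
    """Rotating code: each trit selects one of the 3 bases that differ from the
    previous base, so the strand never has two identical adjacent bases."""
    strand = []
    for t in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        strand.append(prev)
    return "".join(strand)


def decode(strand: str, prev: str = "A") -> bytes:
    """Invert encode(); assumes an error-free strand (no error correction)."""
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))  # raises ValueError on a homopolymer
        prev = base
    return trits_to_bytes(trits)


def gc_content(strand: str) -> float:
    """Fraction of G/C bases; this toy code does not control it."""
    return (strand.count("G") + strand.count("C")) / len(strand)


if __name__ == "__main__":
    strand = encode(b"I3S")
    print(strand, f"GC={gc_content(strand):.2f}")
    assert decode(strand) == b"I3S"
```

With the fixed 6-trits-per-byte mapping, this sketch stores 8/6 ≈ 1.33 bits per nucleotide, below the log2(3) ≈ 1.58 bits a rotating ternary code can reach and the 2-bit ceiling of an unconstrained four-letter alphabet. This density cost of satisfying biological constraints is one of the trade-offs a dedicated image coding strategy would need to manage.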
The post holder will work at the I3S laboratory in the SIS/MediaCoding research group (http://mediacoding.i3s.unice.fr). The project allows some flexibility in the applicant profile. Candidates with expertise in the following areas may be a good fit:
- Image coding,
- Machine learning in its broadest sense,
- DNA, synthetic biology.
All applicants should be able to demonstrate the following:
- A strong computing background with solid programming skills,
- An ability to work with third-party software and to liaise constructively with the developers of such software,
- The ability to work independently and to drive both the research and software development agenda.
The successful applicant will have an MSc (or equivalent) in an area pertinent to the subject area, ideally computer science.
• Highly motivated and a good team player.
• Master 2 degree in signal or image processing or in a related discipline.
• Strong development skills in C/C++ (or equivalent), Matlab and Python.
• Experience in the domain of DNA synthesis and sequencing will be appreciated.
• Curiosity, open-mindedness, creativity, persistence, professionalism, responsibility and team spirit are the key personal qualities we are looking for in this position.
1. Appuswamy R., Lebrigand K., Barbry P., Antonini M., Madderson O., Freemont P., MacDonald J., and Heinis T. OligoArchive: Using DNA in the DBMS storage hierarchy. Conference on Innovative Data Systems Research (CIDR), 2019.
2. Dimopoulou M., Antonini M., Barbry P., and Appuswamy R. A biologically constrained encoding solution for long-term storage of images onto synthetic DNA. EUSIPCO, Sep. 2019, A Coruña, Spain.
3. Dimopoulou M., Antonini M., Barbry P., and Appuswamy R. Storing digital data into DNA: A comparative study of quaternary code construction. ICASSP, May 2020, Barcelona, Spain.
4. Dimopoulou M., and Antonini M. Efficient storage of images onto DNA using vector quantization. Data Compression Conference (DCC), Mar. 2020, Utah, United States.
5. Dimopoulou M., and Antonini M. Image storage in DNA using vector quantization. To be published in EUSIPCO, Sep. 2020, Amsterdam, The Netherlands.
The student will work at the I3S laboratory in the context of a collaboration between CNRS and Imperial College London.