
Scientist Digital data storage on synthetic DNA


Application deadline : Thursday, February 3, 2022



General information

Reference : UMR7271-VIVROS-024
Date of publication : Thursday, January 13, 2022
Type of Contract : FTC Scientist
Contract Period : 12 months
Expected date of employment : 1 March 2022
Proportion of work : Full time
Remuneration : Between 2690.42 and 3099.75 € gross monthly according to experience.
Desired level of education : PhD
Experience required : 1 to 4 years


Storing digital data is becoming challenging because of the relatively short lifespan of storage devices. At the same time, the “digital universe” (all digital data worldwide) is forecast to grow to over 175 zettabytes in 2025. A significant fraction of this data is “cold”, that is, infrequently accessed. Old photographs stored by users on Facebook are one example of cold data; Facebook recently built an entire data center dedicated to storing such cold photographs. Unfortunately, all current media used for cold data storage (hard disk drives or tape) suffer from two fundamental problems. First, storage density improves by at best 20% per year, which substantially lags behind the 60% growth rate of cold data. Second, current storage media have a limited lifetime of five (HDD) to twenty (tape) years. As data is often stored for much longer (50 years or more) for legal and regulatory compliance reasons, it must be migrated to new storage devices every few years, which increases the cost of data ownership.
An alternative approach is to use DNA, the carrier of heredity in living organisms. Using DNA to store cold data is attractive because it is extremely dense, with a raw limit of 1 exabyte/mm³, and long-lasting, with an observed half-life of well over 500 years. This approach is made possible by recent biotechnological developments allowing easy and affordable DNA writing (synthesis) and DNA reading (sequencing). However, one major problem of DNA storage is that errors are introduced into the stored information during both synthesis and sequencing. These errors take the form of substitutions, insertions and deletions of single nucleotides. The most critical phase for the introduction of errors is the sequencing of the strands: several sequencing techniques are available, and the choice of sequencing machine leads to significant variation in the number of sequencing errors.
In the context of DNA storage, the biotechnological processes can be pushed to their limits in order to decrease costs and increase throughput. In that case, the number of errors will tend to increase. But as long as the error-correcting codes, combined with post-sequencing data processing, are able to recover the signal, the whole process can be optimized. This is particularly true of the synthesis step, which today is optimized to produce oligonucleotides of very high quality. The drawback is that these oligonucleotides are limited in length. Relaxing the quality requirement would probably allow much longer oligonucleotides, and thus more space for payload information.
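
To make the substitution, insertion and deletion errors described above concrete, here is a minimal Python sketch of a toy error channel. It is an illustration only, not the project's error model: the function name `noisy_channel`, the independent per-nucleotide error assumption and the default probabilities are all hypothetical.

```python
import random

NUCLEOTIDES = "ACGT"


def noisy_channel(strand, p_sub=0.01, p_ins=0.005, p_del=0.005, rng=None):
    """Pass a DNA strand through a toy channel (assumed model, not measured data).

    Each nucleotide independently suffers at most one event: a deletion,
    an insertion of a random nucleotide just before it, or a substitution
    by a different nucleotide. The probabilities are illustrative defaults.
    """
    rng = rng or random.Random()
    out = []
    for base in strand:
        r = rng.random()
        if r < p_del:
            # Deletion: the nucleotide is dropped.
            continue
        if r < p_del + p_ins:
            # Insertion: a random nucleotide appears before the original one.
            out.append(rng.choice(NUCLEOTIDES))
            out.append(base)
        elif r < p_del + p_ins + p_sub:
            # Substitution: replaced by one of the three other nucleotides.
            out.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
        else:
            # No error on this nucleotide.
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    original = "ACGTACGTTGCA" * 5
    received = noisy_channel(original, rng=random.Random(0))
    print(len(original), len(received))
```

Real synthesis and sequencing errors are not independent of position or sequence context, which is exactly what experimentally derived error models are meant to capture.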


The scientist's objective is to quantify experimentally the constraints and the signal degradation introduced by the different biotechnological processes. To that end, we will treat the DNA storage process as a transmission channel (as in digital communication) that introduces errors and produces a noisy signal, and precisely model the errors from each stage (synthesis, storage packaging, long-term DNA degradation, molecule selection, sequencing), in order finally to design suitable error-correcting codes. This work will provide precise error models of the main biotechnological components, so as to supply the upper layers with the best information for numerical pre- and post-processing. It will take into account the various constraints that synthesis and sequencing technologies impose on DNA sequences: insertion or deletion errors (indels) caused by long stretches of identical nucleotides (homopolymers), and the required percentage of GC content in DNA sequences.
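
As an illustration of the two sequence constraints mentioned above (homopolymer length and GC content), a candidate sequence could be screened with a short Python check like the sketch below. The function names and the thresholds (a maximum run of 3 and a GC content between 40% and 60%) are assumptions chosen for the example; the actual limits depend on the synthesis and sequencing technologies used.

```python
def gc_content(seq):
    """Fraction of nucleotides in seq that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def max_homopolymer(seq):
    """Length of the longest run of identical consecutive nucleotides."""
    longest = 0
    run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def satisfies_constraints(seq, max_run=3, gc_min=0.4, gc_max=0.6):
    """Check a sequence against illustrative (assumed) constraint thresholds."""
    return max_homopolymer(seq) <= max_run and gc_min <= gc_content(seq) <= gc_max


if __name__ == "__main__":
    for candidate in ("ACGTACGT", "AAAAACGT", "GCGCGCGC"):
        print(candidate, satisfies_constraints(candidate))
```

A constrained encoder would only emit sequences that pass such a check, at the price of some information density.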


Skills :
• Highly motivated and good team player.
• PhD in signal or image processing or in a related discipline.
• Strong development skills in C/C++ or equivalent, Matlab, Python.
• Experience in DNA synthesis and sequencing will be appreciated.
• Curiosity, open-mindedness, creativity, persistence, professionalism, responsibility and team spirit are the key personal qualities we are looking for in this position.

Work Context

The scientist will work at the I3S laboratory in the context of the European project « OligoArchive » (https://oligoarchive.eu).
