PhD student (M/F) in computer science / algorithms for modeling intrinsically disordered proteins

General information

Reference : UPR8001-JUACOR-005
Workplace : TOULOUSE
Date of publication : Friday, July 31, 2020
Scientific Responsible name : Juan CORTES
Type of Contract : PhD Student contract / Thesis offer
Contract Period : 36 months
Start date of the thesis : 1 November 2020
Proportion of work : Full time
Remuneration : 2 135,00 € gross monthly

Description of the thesis topic

Summary :

Up to now, computational structural biology problems, such as structure prediction and docking, have mostly been formulated assuming that proteins, in their functional form, are static/rigid molecules. Nevertheless, there is an increasing corpus of work showing the importance of proteins that do not adopt a well-defined three-dimensional form [1,2]. These are the so-called Intrinsically Disordered Proteins (IDPs). IDPs are fully functional despite their lack of a permanent secondary or tertiary structure, and they exploit their plasticity to perform highly specialized tasks that are complementary to those of their globular counterparts [3]. Most IDPs are not pure random coils. Very often, IDPs contain short, evolutionarily conserved, partially structured fragments that are responsible for partner recognition and function [4]. Malfunction of disordered proteins due to mutations or the dysregulation of homeostatic or post-translational processes can induce severe diseases, such as cancer or neurodegeneration. The structural properties of IDPs are key to deciphering the bases of these functional or pathological processes. In addition to their importance for health, IDPs are of major interest in biomaterials science, where their polymeric nature, hosting a large diversity of chemical functionalities, offers countless possibilities [5].

Modeling IDPs is extremely challenging and requires a tight coupling of experimental and computational methods [6,7]. In contrast to structured/globular proteins, IDPs cannot be represented by a single conformation, and their models must be based on ensembles, usually involving thousands of conformations representing a distribution of states that the protein adopts in solution [8,9]. In recent years, researchers at LAAS-CNRS (Toulouse) and CBS (CNRS-Inserm-UM, Montpellier) have collaborated on IDP modeling, and they have developed a new approach to generate realistic conformational ensembles that outperforms previously existing methods [10].
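To illustrate what an ensemble-based description means in practice, here is a minimal Python sketch that averages an observable (the radius of gyration) over many conformations. It is only an illustration: the function names are hypothetical, and the conformations are toy freely jointed chains with an assumed 3.8 angstrom bead spacing, not realistic protein models or the method of [10].

```python
import math
import random


def radius_of_gyration(coords):
    """Radius of gyration of one conformation (list of (x, y, z) points, equal masses)."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
                  for p in coords) / n
    return math.sqrt(mean_sq)


def ensemble_average(ensemble, observable):
    """Average an observable over all conformations of an ensemble."""
    values = [observable(conf) for conf in ensemble]
    return sum(values) / len(values)


def random_walk_chain(n_beads, rng, bond_length=3.8):
    """Toy freely jointed chain: a placeholder conformation, NOT a protein model."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        # Draw a direction uniformly on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + bond_length * r * math.cos(angle),
                       y0 + bond_length * r * math.sin(angle),
                       z0 + bond_length * z))
    return coords


rng = random.Random(0)
ensemble = [random_walk_chain(50, rng) for _ in range(1000)]
mean_rg = ensemble_average(ensemble, radius_of_gyration)
print(f"Ensemble-averaged Rg over {len(ensemble)} conformations: {mean_rg:.1f} angstrom")
```

The point of the sketch is that any quantity compared with experiment is a property of the whole ensemble, computed as an average over its members, and not of any single conformation.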

The goal of this thesis, which will be conducted under the co-supervision of Juan Cortés at LAAS-CNRS and Pau Bernadó at CBS, is to gain a deeper understanding of the relationship between polypeptide sequence and local structural propensities, which is essential to understand IDP functions. For this, we will build on computational methods that exploit several types of data extracted from experimental methods (X-ray crystallography, NMR and SAXS). The key point of our current approach is a database of tripeptide conformations extracted from high-resolution, experimentally solved protein structures, which are organized using tools from applied mathematics and computer science (clustering, hierarchical data structures, …), and then used to build computational models. Our first results demonstrated the capacity of our methodology to construct realistic ensemble models of IDPs that are in agreement with experimental NMR and SAXS data [10]. The aim of this thesis is to go further in this direction, improving and extending our molecular modeling methods to enhance their predictive capabilities. To this end, we plan to implement machine learning methods and to exploit several recent curated repositories containing IDP sequences, interacting motifs inserted in IDPs, and NMR residue-specific information.
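As a rough illustration of how a tripeptide database could drive ensemble generation, the Python sketch below samples backbone angles for each residue conditioned on the identity of its two neighbours. This is one plausible reading of the idea, not the implementation described in [10]: the function names, the example sequence, and the database contents (random placeholder angles) are all assumptions made for the example.

```python
import random


def tripeptides(sequence):
    """Yield (index, tripeptide) for every residue that has both neighbours."""
    for i in range(1, len(sequence) - 1):
        yield i, sequence[i - 1:i + 2]


def make_toy_database(sequence, rng, n_entries=5):
    """Placeholder database: tripeptide -> list of (phi, psi) pairs in degrees.

    A real database would hold angles extracted from experimentally solved
    structures; here they are random numbers used only to exercise the code.
    """
    database = {}
    for _, tri in tripeptides(sequence):
        if tri not in database:
            database[tri] = [(rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))
                             for _ in range(n_entries)]
    return database


def sample_conformation(sequence, database, rng):
    """Pick one (phi, psi) pair per interior residue from its tripeptide entry."""
    angles = []
    for i, tri in tripeptides(sequence):
        candidates = database.get(tri)
        if not candidates:
            raise KeyError(f"no database entry for tripeptide {tri!r}")
        phi, psi = rng.choice(candidates)
        angles.append((i, tri[1], phi, psi))
    return angles


def sample_ensemble(sequence, database, size, seed=0):
    """Build an ensemble as a list of independently sampled conformations."""
    rng = random.Random(seed)
    return [sample_conformation(sequence, database, rng) for _ in range(size)]


sequence = "GSAEDLLKA"  # arbitrary example sequence, not taken from the offer
database = make_toy_database(sequence, random.Random(1))
ensemble = sample_ensemble(sequence, database, size=100)
print(len(ensemble), "conformations,", len(ensemble[0]), "sampled residues each")
```

In this toy version each residue is sampled independently; organizing the entries with clustering or hierarchical data structures, as mentioned above, would be a refinement on top of the same lookup.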

The methodological developments will be implemented in a software prototype that will be tested on two IDPs to experimentally assess its predictive capacity:
(1) p53: TP53 is the most frequently mutated gene in cancer [11]. According to cBioPortal, the largest database of sequenced cancer cells, several oncogenic mutations have been found in the N- and C-termini of p53, which are intrinsically disordered regions containing interacting motifs for several partners [12]. Software developed during the thesis will be applied to evaluate the structural changes induced by these mutations. The mutants inducing the strongest structural effects will be produced and structurally characterized by NMR in order to validate the predictions.
(2) TIF2: TIF2 is an IDP that regulates gene transcription in several Nuclear Receptors (NRs) [13]. The collaborators at CBS have studied the structural features of a fragment of TIF2 containing three NR binding motifs LLXXLL (manuscript in preparation). Interestingly, these three motifs show different affinities for NRs, probably due to their different sequences. The new computational tools developed during the thesis will be used to analyze other sequences that, while maintaining the important leucine residues, modify the structure of the motifs. In this way, TIF2 will be engineered to have different local and overall binding affinities for NRs. NMR experiments will be conducted to validate our engineered versions of TIF2.

References :
[1] P.E. Wright, H.J. Dyson (2015) Nat Rev Mol Cell Biol, 16:18-29.
[2] V. Csizmok, A.V. Follis, R.W. Kriwacki, J.D. Forman-Kay (2016) Chem Rev, 116:6424-6462.
[3] H. Xie, et al. (2007) J Proteome Res, 6:1882-1898.
[4] P. Tompa, E. Schad, A. Tantos, L. Kalmar (2015) Curr Opin Struct Biol, 35:49-59.
[5] Y.J. Yang, A.L. Holmberg, B.D. Olsen (2017) Annu Rev Chem Biomol Eng, 8:549-575.
[6] D. Eliezer (2009) Curr Opin Struct Biol, 19(1):23-30.
[7] T.N. Cordeiro, F. Herranz-Trillo, A. Urbanek, A. Estaña, J. Cortés, N. Sibille, P. Bernadó (2017) Curr Opin Struct Biol, 42:15-23.
[8] P. Bernadó, L. Blanchard, P. Timmins, D. Marion, R. Ruigrok, M. Blackledge (2005) Proc Natl Acad Sci USA, 102:17002-17.
[9] P. Bernadó, M. Blackledge (2010) Nature, 468(7327):1046-8.
[10] A. Estaña, N. Sibille, E. Delaforge, M. Vaisset, J. Cortés, P. Bernadó (2019) Structure, 27(2):381-391.e2.
[11] C. Kandoth, et al. (2013) Nature, 502:333-339.
[12] H. Tidow, et al. (2007) Proc Natl Acad Sci USA, 104(30):12324-9.
[13] C. Leo, J.D. Chen (2000) Gene, 245:1-11.

Work Context

This work will be carried out within the "Robotics and InteractionS" (RIS) group of LAAS-CNRS, which is developing an original research theme for the modeling of flexible biomolecules based on algorithms inspired by robotics and AI. The work takes place in the context of a collaboration with the Centre de Biochimie Structurale (CBS) in Montpellier and the Institute of Mathematics of Toulouse (IMT).

Constraints and risks

No specific risks or constraints
