Plant-insect interactions: Genomic approach to study plant adaptations and diversification of Papilionidae (M/W)

Application Deadline : Monday, August 15, 2022

General information

Reference : UMR5554-FABCON-003
Date of publication : Monday, July 25, 2022
Scientific Responsible name : Fabien Condamine
Type of Contract : PhD Student contract / Thesis offer
Contract Period : 36 months
Start date of the thesis : 1 October 2022
Proportion of work : Full time
Remuneration : 2135€ gross (1715€ net monthly, excluding withholding taxes)

Description of the thesis topic

Context of the thesis
Phytophagous insects account for more than 25% of the known biodiversity on Earth (Stork 2018). To explain this extraordinary diversity it has been proposed that their diversification is directly linked to plant diversity. Indeed, plants and insects have diversified in parallel over the last 450 million years (Labandeira & Sepkoski 1993). The rapid diversification of angiosperms, particularly in the Late Cretaceous and early Cenozoic (Bell et al. 2010; Friis et al. 2011; Beaulieu et al. 2015; Magallón et al. 2015), has been a source of ecological opportunities that may explain the diversity of phytophagous insects (Hunt et al. 2007; McKenna et al. 2009; Ahrens et al. 2014; Kawahara et al. 2019).
Nearly sixty years ago, Ehrlich and Raven (1964) conducted a detailed analysis of the relationships between butterflies and their host plants. They argued that plants and phytophagous insects are engaged in an arms race. According to this model ('escape and radiate coevolution model of mutual diversification' (Thompson 1989)), groups of plants that develop new defenses temporarily escape phytophagous insects and diversify rapidly (Ehrlich & Raven 1964). However, these defenses can be circumvented by certain groups of phytophagous insects. The latter then gain a selective advantage (new resources to exploit) and can diversify on plants whose defenses have been neutralized. This arms race thus reciprocally stimulates the diversification of plants and phytophagous insects through the continual development of evolutionary novelties. According to Ehrlich and Raven (1964), these novelties correspond mainly to toxic secondary compounds (for plants) and the associated detoxification mechanisms (for phytophagous insects). This theory could explain why these groups account for a very large proportion of terrestrial biodiversity (Farrell 1998; Janz et al. 2006; Janz 2011; Suchan & Alvarez 2015).
To test the predictions of Ehrlich and Raven, this thesis will conduct a genomic survey of molecular evolution in swallowtail butterflies in order to reveal the genomic consequences of host-plant shifts versus host-plant conservatism, as determined by ancestral state reconstructions (Allio et al. 2021). The questions we aim to answer by the completion of this thesis are: (1) Are there signatures of adaptation in the genomes caused by changes in host plants for the different groups of Papilionidae? If so, we will ask: (2) What genomic changes have evolved in response to the selective constraints imposed by host-plant shifts versus conservatism? (3) Do the same genomic changes underlie shifts to different host-plant families in independent butterfly lineages? (4) How widespread is positive selection in shifting versus non-shifting butterfly lineages?

We will perform a large-scale comparative analysis of the evolution of Papilionidae genomes and focus on their genomic changes following host-plant family changes in deep branches of the phylogeny to (1) provide information on the portions of genomes that have undergone positive or purifying selection, and (2) identify, within the portions that have undergone positive selection, the corresponding genes in order to ultimately study their roles and functions. We will also study whether genomic changes following recent (last 10 million years) host-plant shifts show the same pattern as those involved in deep branches of phylogeny.

Data acquisition and genome assembly: Whole-genome shotgun data have been generated during an ongoing PhD thesis and will be used for the genomic survey. A sequencing coverage of 50x will allow the assembly of genomic contigs containing exons of protein-coding genes, which are our main targets for these objectives. The draft genomes will be assembled using SOAPdenovo2 (Luo et al. 2012) and Platanus (Kajitani et al. 2014). Then, BLAST searches will be carried out on each genome for all Papilio xuthus proteins available on GenBank (P. xuthus will serve as the reference for Papilionidae genomes, Futahashi et al. 2012) using tblastn (Altschul et al. 2010) and Blast2GO (Conesa et al. 2005). The BLAST searches will be performed on the protein-coding genes present in the reconstructed scaffolds from the genome assembly. Only sequences with at least 60% amino acid similarity will be extracted for each of the protein-coding genes of P. xuthus. If our genome assemblies are fragmentary, as expected for genomes assembled without long reads, several scaffolds may match the same protein-coding gene. In this case, the different scaffolds will be aligned using translatorX (Abascal et al. 2010) and AGILE (Hughes & Teeling 2018) to obtain the most complete (consensus) sequence for each protein-coding gene and each species. We will detect and remove cross-species contamination in our data using dedicated tools (Simion et al. 2018).
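As an illustration of the 60% similarity filter described above, here is a minimal Python sketch. It assumes the tblastn results were written in tabular form with the column order listed in COLUMNS, and it assumes that "similarity" is measured by the ppos field (percentage of positive-scoring matches) rather than by pident (percent identity). The function name, the threshold argument and the example rows are hypothetical and are not taken from the project pipeline.

```python
import csv
import io

# Assumed column order, e.g. from
# -outfmt "6 qseqid sseqid pident ppos length evalue bitscore"
COLUMNS = ["qseqid", "sseqid", "pident", "ppos", "length", "evalue", "bitscore"]


def filter_hits(handle, min_similarity=60.0):
    """Group scaffold hits by query protein, keeping hits at or above the threshold."""
    kept = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        # Assumption: ppos is used as the measure of amino acid similarity.
        if float(hit["ppos"]) >= min_similarity:
            kept.setdefault(hit["qseqid"], []).append(hit["sseqid"])
    return kept


# Made-up example rows (query protein, scaffold, pident, ppos, ...).
example = io.StringIO(
    "prot1\tscaffold_12\t71.0\t82.5\t310\t1e-90\t300\n"
    "prot1\tscaffold_40\t45.0\t55.0\t120\t1e-10\t60\n"
    "prot1\tscaffold_77\t64.2\t78.0\t150\t1e-40\t150\n"
    "prot2\tscaffold_03\t58.0\t61.0\t200\t1e-30\t110\n"
)
hits = filter_hits(example)
for protein, scaffolds in hits.items():
    print(protein, scaffolds)
```

A protein that retains several scaffolds (prot1 in this toy input) corresponds to the fragmentary case in which the scaffolds would then be aligned to build a consensus sequence.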
We will use OrthoFinder (Emms & Kelly 2015) to identify orthologous protein groups, and groups made of single-copy orthologs from all genomes will be extracted and aligned. For all protein-coding genes, we will first compare our protein sets with the established protein sets from all published Lepidoptera genomes (lepbase.org). Contigs of 1-to-1 orthologs will be added to the corresponding butterfly protein-coding gene alignments using the MACSE alignment program (Ranwez et al. 2018), which returns both codon and amino acid alignments. Finally, we will remove ambiguous positions from the alignments using HMMCleaner (Philippe et al. 2017) and trimAl (Capella-Gutiérrez et al. 2009). Applying a similar approach, one of our ongoing studies found 6621 orthologous nuclear genes with an average of 40x coverage, even for low-quality DNA libraries (Allio et al. 2021).
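The selection of single-copy orthologs can be sketched as follows. This is only an illustration: it assumes a tab-separated gene-count table with one row per orthogroup, one column per species and a final Total column, similar in shape to the gene-count summary that OrthoFinder writes. The species labels and counts below are invented.

```python
import csv
import io


def single_copy_orthogroups(handle):
    """Return orthogroup IDs with exactly one gene copy in every species."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Every column except the identifier and the row total is a species.
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    return [
        row["Orthogroup"]
        for row in reader
        if all(int(row[s]) == 1 for s in species)
    ]


# Invented table: three species, four orthogroups.
table = io.StringIO(
    "Orthogroup\tPapilio_xuthus\tspecies_B\tspecies_C\tTotal\n"
    "OG0000001\t1\t1\t1\t3\n"
    "OG0000002\t2\t1\t1\t4\n"
    "OG0000003\t1\t0\t1\t2\n"
    "OG0000004\t1\t1\t1\t3\n"
)
single_copy = single_copy_orthogroups(table)
print(single_copy)
```

Requiring exactly one copy in every species is the strictest definition; with fragmentary assemblies, one might instead tolerate a few missing species, which would be a separate design choice.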

Genome-wide estimation of molecular evolution: We will adapt the approach of Jobson et al. (2010), who performed evolutionary genome scans in long-lived mammal species to identify genes presenting an excess (indicating adaptation) or a depletion (indicating purifying selection) of non-synonymous substitutions relative to synonymous substitutions. We will apply it to all branches where a host-plant shift is identified, relative to branches with no host-plant shift. Thus, we will look for positively selected genes using shifting lineages as tests and non-shifting species and ancestral branches as controls. This method has the advantage of detecting genes in which the substitution rate accelerated or slowed along the branches leading to a new host plant, relative to the rate along branches with no host-plant change, thereby identifying genes evolving under positive or purifying selection. To do so, we will estimate the ratio of non-synonymous substitutions (dN) to synonymous substitutions (dS) (Yang & Nielsen 2000). Genes under positive selection are characterised by a high dN/dS (greater than 1), while genes under purifying selection have a low dN/dS (less than 1). Analyses of dN/dS are traditionally performed with 1:1 orthologous sequences, but we will also explore an application of these analyses to gene families.
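To make the dN/dS contrast between test and control branches concrete, here is a small Python sketch. The gene names and the dN and dS values are invented, and the functions are hypothetical helpers rather than part of any published tool. A point estimate above 1 is only suggestive; in practice the signal would be assessed with a statistical test.

```python
def omega(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


def classify(w):
    """Label a dN/dS value using the thresholds stated in the text."""
    if w is None:
        return "undefined"
    if w > 1:
        return "positive selection"
    if w < 1:
        return "purifying selection"
    return "neutral"


# Invented per-gene estimates: (dN, dS) on shifting (test) and
# non-shifting (control) branches.
genes = {
    "geneA": {"test": (0.05, 0.02), "control": (0.01, 0.10)},
    "geneB": {"test": (0.01, 0.20), "control": (0.01, 0.25)},
}

labels = {
    name: {kind: classify(omega(*rates)) for kind, rates in branches.items()}
    for name, branches in genes.items()
}
for name, result in labels.items():
    print(name, result)
```

In this toy input, geneA shows the expected pattern of a candidate adaptive gene (dN/dS above 1 on shifting branches, below 1 on controls), whereas geneB is constrained on both branch classes.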
In swallowtail butterflies, adaptation to a host plant can proceed through specific genes rather than genome-wide (Berenbaum et al. 1996). Accordingly, analyses will be conducted for individual genes as well as for genes grouped by Gene Ontology categories, in order to identify particular functional categories that might have evolved in association with new host plants. We will therefore estimate genetic changes in candidate genes, such as cytochrome P450 genes, throughout the swallowtail phylogeny. We expect that genes involved in herbivory will show a high dN/dS ratio in branches leading to new host plants but a low dN/dS in non-shifting swallowtails. However, because such host-plant shifts not only alter candidate genes but may also influence the whole genome (Thompson et al. 1990; Edger et al. 2015), we will perform a genome scan of dN/dS over all genes and across all branches (see Allio et al. 2021) using CODEML (branch-site model) as implemented in PAML (Yang 2007).
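Branch-site analyses of this kind are usually evaluated with a likelihood ratio test between a null model and an alternative model that allows positive selection on the foreground branches. The sketch below assumes the two log-likelihoods have already been read from the model outputs, and it assumes a chi-square reference distribution with one degree of freedom, which is a common convention rather than something stated in this offer. The log-likelihood values are invented.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test assuming a chi-square distribution with 1 degree of freedom."""
    # Test statistic: twice the log-likelihood gain, floored at zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 degree of freedom, the chi-square survival function
    # has the closed form erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Invented log-likelihoods for one gene and one foreground branch.
stat, p = lrt_pvalue(lnl_null=-2345.67, lnl_alt=-2341.12)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

Because a genome scan repeats this test across thousands of genes and many branches, the resulting p-values would normally be corrected for multiple testing before candidate genes are reported.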

Ancient versus recent host-plant shifts: We will assess whether genomic changes are the same between recent (less than 10 million years) and ancient (more than 10 million years) host-plant shifts across the phylogeny. The last 10 million years of the Cenozoic witnessed global climate cooling while the early Cenozoic was characterized by a warm period with tropical regions extending into higher latitudes. Interplay between climate change and host-plant shifts may have impacted both genomic and species diversification (Condamine et al. 2012, 2018). Comparing molecular evolution through time can yield insight into the role of climate change and host-plant shifts. In addition, we will investigate the genomic signature of speciation between recently diverged species that do not have the same feeding habit.

Means to perform the analyses: Genome assembly and genome-scan analyses are computationally intensive and time-consuming. To perform these analyses, we will rely on two ERC-funded computers (specifications: 64 CPUs and 1 TB of memory per machine) and will have access to the computational power of the Montpellier Bioinformatics and Biodiversity platform hosted at the host institution.

Work Context

This PhD thesis will take place at the Institut des Sciences de l'Evolution de Montpellier (ISEM), which is a laboratory developing research on the origin and dynamics of biodiversity, and on the modalities and mechanisms of its evolution (see the lab website: https://isem-evolution.fr). The laboratory hosts more than 250 people.

Constraints and risks

This PhD thesis will require extensive bioinformatics work and therefore considerable screen time and office time.

Additional Information

The GAIA ERC project is led by Fabien Condamine (https://fabiencondamine.org/) of the Phylogeny and Molecular Evolution team at the Institut des Sciences de l'Evolution de Montpellier (ISEM) UMR 5554 CNRS-UM-IRD-EPHE (https://isem-evolution.fr/en/), located at the University of Montpellier. This integrative project involves regular meetings with the other laboratory members involved, in which the candidate must participate.
