Multi-level linguistic analysis of spoken variation

General information

Reference : UPR3251-IOAVAS-003
Workplace : ORSAY
Date of publication : Thursday, July 30, 2020
Type of Contract : FTC Scientist
Contract Period : 12 months
Expected date of employment : 1 October 2020
Proportion of work : Full time
Remuneration : Between 2695,56 and 3107,82 gross per month, depending on experience.
Desired level of education : 5-year university degree
Experience required : 1 to 4 years

Missions

The work will take place within the OTELO project, funded by the DATAIA institute and MSH Paris-Saclay, which proposes a multi-level analysis of spoken language based on large segmented and annotated oral corpora. The working hypothesis is that language is intrinsically ambiguous and polysemous. Linguists seek to account for this ambiguity in order to understand how language works. Researchers in computer science are also concerned with formalizing linguistic variation, for application purposes. Studies aiming at an in-depth description of language are rare, as they require knowledge from several scientific communities (social sciences and humanities (SSH) and computer science, written vs. oral language, etc.). The OTELO project brings together researchers in SSH and digital sciences with the common goal of accounting for language ambiguities through an analysis that combines the exploration of sound variation with the status of lexical units in context.

Activities

The work will focus on the statistical analysis and modelling of different patterns of segmental variation (reduction, lenition, fortition, contextual assimilation, etc.) in English and French, taking into account linguistic metadata (parts of speech, syntactic constituents, semantic constraints) and socio-linguistic variables (speech style, speaker profile). Particular attention will be paid to contextual homophones, with the aim of testing the hypothesis that combining contextual information helps to disambiguate homophonic units, in particular those involving proper nouns.

Skills

The ideal candidate holds a PhD in linguistics (phonetics and/or phonology, instrumented analysis of large corpora) or in NLP, and is familiar with the challenges of large-scale data analysis using instrumented methods (writing scripts to extract acoustic parameters, statistical modelling with software such as R, modelling variation with methods inspired by automatic speech transcription). Programming skills (Perl, Python) are a plus, as is linguistic knowledge beyond the segmental and supra-segmental levels. The working data will be in French and English, but knowledge of other languages studied by the host laboratory and for which large amounts of data are available (e.g. Mandarin Chinese, Arabic) is a plus.
