Reference : UMR7107-ANIFOR-009
Workplace : VILLEJUIF
Date of publication : Monday, November 21, 2022
Type of Contract : FTC Scientist
Contract Period : 4 months
Expected date of employment : 1 February 2023
Proportion of work : Full time
Remuneration : €2,889.51 gross per month for less than 2 years of experience
Desired level of education : PhD
Experience required : 1 to 4 years
Probing neural representations of speech to identify the limits of model fine-tuning and automatically detect typological properties of languages.
In consultation with the project partners, the researcher will conduct experiments to determine to what extent fine-tuning methods, which are currently at the heart of many approaches in Natural Language Processing, can be used to develop speech processing systems for any language. The person recruited will help define and implement processing pipelines that integrate innovative methods and tools to facilitate linguistic documentation tasks ("computational linguistic documentation").
The person recruited will be responsible for contributing to the various stages of the research:
- participating in the design of experimental protocols, considering the different analysis methods used in the community, in particular challenge sets (corpora made up of examples chosen to illustrate specific linguistic problems) and linguistic probes (classifiers trained to predict specific linguistic properties from the representations learned by neural networks)
- using causal and counterfactual analysis methods to identify information captured by neural networks, by modifying the architecture and/or representations of the neural networks
- collaborating closely with field linguists to identify features of languages that will be relevant for identifying the limitations of state-of-the-art representation models and furthering the community's work on neural representation analysis.
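For candidates less familiar with the term, the idea of a linguistic probe mentioned in the list above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the project's method: the "representations" are synthetic random vectors rather than outputs of a speech model, the probed property is an arbitrary binary label, and every name in the code (make_representations, train_probe, accuracy) is hypothetical. The probe is a plain logistic regression; comparing it with a control trained on uninformative vectors shows whether the property is recoverable from the representations.

```python
import math
import random


def make_representations(n, dim, informative, rng):
    """Build synthetic (vector, label) pairs standing in for model representations.

    Assumption for illustration: when `informative` is True, dimension 0
    encodes the binary property; otherwise every dimension is pure noise.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 0.5) for _ in range(dim)]
        if informative:
            vec[0] += 2.0 if label == 1 else -2.0
        data.append((vec, label))
    return data


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(data, epochs=50, lr=0.1):
    """Fit a logistic-regression probe with stochastic gradient descent."""
    dim = len(data[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for vec, label in data:
            z = sum(w * x for w, x in zip(weights, vec)) + bias
            err = sigmoid(z) - label
            weights = [w - lr * err * x for w, x in zip(weights, vec)]
            bias -= lr * err
    return weights, bias


def accuracy(probe, data):
    """Share of examples whose property the probe predicts correctly."""
    weights, bias = probe
    correct = 0
    for vec, label in data:
        z = sum(w * x for w, x in zip(weights, vec)) + bias
        correct += int((z > 0) == (label == 1))
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Probe on representations that encode the property.
acc_informative = accuracy(
    train_probe(make_representations(200, 4, True, rng)),
    make_representations(100, 4, True, rng),
)

# Control: same probe on representations that carry no information.
acc_control = accuracy(
    train_probe(make_representations(200, 4, False, rng)),
    make_representations(100, 4, False, rng),
)

print(f"probe accuracy:   {acc_informative:.2f}")
print(f"control accuracy: {acc_control:.2f}")
```

A probe that scores well above its control suggests the property is linearly readable from the representations; on its own, it does not show that the network actually uses that information, which is where the causal and counterfactual analyses listed above come in.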
PhD in Linguistics
Familiarity with machine learning tools
Familiarity with a scripting language (preferably Python)
Ability to communicate and coordinate with different partners: field linguists, computer scientists, engineers
Sense of rigour, organisation and method
The Laboratoire de Langues et Civilisations à Tradition Orale (LACITO, UMR 7107 CNRS - Université Sorbonne Nouvelle) is a research unit affiliated to the Institut des Sciences Humaines et Sociales of the CNRS. The LACITO is working with computer science laboratories on a project entitled "Computational documentation of languages by 2025".
Constraints and risks