Reference : UMS2000-CYRMIC-005
Workplace : AUBERVILLIERS
Date of publication : Monday, May 11, 2020
Type of Contract : FTC Scientist
Contract Period : 6 months
Expected date of employment : 1 July 2020
Proportion of work : Full time
Remuneration : monthly gross salary between 2728 € and 3145 €, depending on experience
Desired level of education : PhD
Experience required : 1 to 4 years
To survey, experiment with and help disseminate data mining of Arabic, Turkish and Persian texts, whether printed or manuscript, following the latest methods developed at the international level, in particular OCR technology. The experiment will consist of producing an XML-TEI edition of a work from the collection of Maghrebian printed and manuscript texts held by the University Library of Oriental Languages and Civilisations in Paris.
An additional budget is available to commission technical assistance from experts in computer science and digital humanities. The work will take place within the activities of GIS MOMM, a federation of research teams specialising in the study of the Middle East and the Muslim World. It includes organising a one-week training session on data mining of primary sources in non-Western languages from Africa, Asia and the Middle East.
- Produce a survey of existing methods and software at the European and international levels for optical recognition of Arabic script (printed and handwritten), identifying the bottlenecks to be overcome in order to obtain an open-source tool. The deliverable will be a written scientific report.
- Identify the range of analyses made possible by data mining an OCRed text. The deliverable will take the form of a demonstrator with a permanent URL.
- Encode in XML-TEI and enhance a corpus to be selected from the collection of Maghrebian printed and manuscript texts held by the University Library of Oriental Languages and Civilisations in Paris. The deliverable will be an online publication produced with the assistance of encoding experts.
- Draw up a multi-year national training plan on data mining of primary sources in non-Western languages from Africa, Asia and the Middle East, and organise the first session (to take place before the end of 2020).
- PhD in Arabic linguistics, or in Middle Eastern and Islamic studies within the humanities and social sciences, using Arabic sources (history, art history and archaeology, political science, Islamology, linguistics, literature, sociology, anthropology)
- Good awareness of the academic landscape of Middle Eastern studies in France and Europe
- Strong command of digital humanities in the context of Open Science
- Fluency in written and spoken academic English
- Ability to draft syntheses from the data collected
- Ability to work with a range of interlocutors
The scientist will have office space at the Campus-Condorcet in Paris, in the room allocated to GIS MOMM. She or he will work in close collaboration with the management of GIS MOMM (Eric Vallet, Elise Massicard, Mercedes Volait) and its administrative officer (Cyrielle Michineau).