RLC

Russian Learner Corpus

Russian language in a multilingual world

Associated Event of the April HSE International Conference


What is RLC?

The Russian Learner Corpus (RLC) is a collection of texts produced by two categories of non-standard speakers of Russian: learners of Russian as a foreign language and speakers of Heritage Russian with different dominant languages. The corpus contains both oral and written production and supports search by morphological properties and by a variety of deviations from Standard Russian, ranging from mistakes in orthography and grammar to non-standard use of lexical and syntactic constructions.
The preliminary linguistic analysis and tagging are done by members of the Learner Russian Research Group under Ekaterina Rakhilina (Higher School of Economics).

Most of the texts come from teachers of Russian as a second and/or heritage language in different countries. RLC comprises both academic and non-academic texts, such as movie and picture descriptions, book summaries, expository essays and others (see HELP).
Part of RLC is RULEC – a longitudinal subcorpus of Academic Writing produced by Heritage and L2 speakers of Russian collected by Olessya Kisselev and Anna Alsufieva of Portland State University over a period of 4 years.
Data on Heritage Russian oral production include the results of experimental studies: frog stories (based on the methodology described in Berman & Slobin 1994; Slobin 2004) and narratives based on a short cartoon (“Nu pogodi!”) (see Isurin & Ivanova-Sullivan 2008 and Polinsky 2008 for more details). See also "Partners".


Metadata

Each text in the Corpus is assigned background information.

Mandatory fields

  • Oral / Written
  • Author’s language background (Heritage / L2)
  • Author’s dominant language
  • Author’s proficiency in Russian

Optional Fields

  • Author’s gender
  • Date
  • Genre

RULEC uses a more elaborate system of text annotation.
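
The metadata fields listed above can be pictured as a simple record. The following is a minimal sketch only: the class name, field names, and allowed values are illustrative assumptions and do not reflect RLC's actual internal schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextMetadata:
    """Hypothetical record for one corpus text (not RLC's real schema)."""

    # Mandatory fields
    mode: str               # "oral" or "written"
    background: str         # "heritage" or "L2"
    dominant_language: str  # e.g. "English"
    proficiency: str        # free-form label; the actual scale is not given here

    # Optional fields
    gender: Optional[str] = None
    date: Optional[str] = None
    genre: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the two-way distinctions named in the list.
        if self.mode not in {"oral", "written"}:
            raise ValueError(f"mode must be 'oral' or 'written', got {self.mode!r}")
        if self.background not in {"heritage", "L2"}:
            raise ValueError(
                f"background must be 'heritage' or 'L2', got {self.background!r}"
            )


# A written text by a heritage speaker; optional fields are left unset.
example = TextMetadata(
    mode="written",
    background="heritage",
    dominant_language="English",
    proficiency="advanced",  # assumed label
)
print(example)
```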


Partners

Maria Polinsky (Harvard University)
Olessya Kisselev (Penn State University)
Anna Alsufieva
Evgeny Dengub (Middlebury Language Schools)
Irina Dubinina (Brandeis University)
Anna Mikhaylova (University of Oregon)
Alla Smyslova (Columbia University)
Ekaterina Protassova (University of Helsinki)
Anna Pavlova (University of Mainz)
Anna Möhl (Johannes Gutenberg University of Zurich)
Anka Bergmann (Humboldt University of Berlin)
Irina Kor Chahine (University of Nice Sophia Antipolis)
Suhyoun Lee (Seoul National University)
Svetlana Slavkova (Bologna University)
Francesca Biagini (Bologna University)
Monica Perotto (Bologna University)
Svetlana Sokolova (Tromsø University)
Natalia Ringblom (University of Stockholm)
Hayashida Rie (Osaka University)
Tsuneto Shogo (Osaka University)
Margarita Kazakevich (Osaka University)
Nazija Zhanpeisova (Aktubinsk University)
Alexander Krasovitsky (University of Oxford)
Rashida Kasymova (Al-Farabi Kazakh National University)
Aimgyl Kazkenova (Satbayev University)
Oksana Palikova (University of Tartu)

Languages

Currently, RLC contains production by L2 and Heritage speakers who have as their dominant language:

Abkhaz
Albanian
Amharic
Arabic
Azerbaijani
Bengali
Bulgarian
Chinese
Croatian
Czech
Dagestanian
Dari
Dutch
English
Estonian
Farsi
Finnish
French
Georgian
German (including Swiss German)
Hebrew
Hindi
Hungarian
Indonesian
Italian
Japanese
Kazakh
Korean
Lao
Macedonian
Mongolian
Nepali
Norwegian
Pashto
Portuguese
Romanian
Serbian
Shona
Slovak
Slovene
Spanish
Swedish
Tajik
Thai
Turkish
Turkmen
Uzbek
Vietnamese

Search

RLC enables both lexico-grammatical search and exact search. A user can specify morphological and grammatical features of a word, as well as search by types of deviations from Standard Russian (errors). See HELP for detailed information.
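
The difference between the two search modes can be sketched as follows. This is an illustration under stated assumptions, not RLC's query engine: the `Token` class, the two functions, and all tag labels (`"N"`, `"nom"`, `"case"` and so on) are hypothetical; the real tag inventory is described in HELP.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical annotated token; tag labels are invented for illustration."""
    form: str                              # word as written by the author
    lemma: str                             # dictionary form
    grammar: FrozenSet[str] = frozenset()  # morphological / grammatical tags
    errors: FrozenSet[str] = frozenset()   # deviation (error) tags


def exact_search(tokens: Iterable[Token], form: str) -> List[Token]:
    """Exact search: match the surface form only."""
    return [t for t in tokens if t.form == form]


def lexgram_search(
    tokens: Iterable[Token],
    lemma: Optional[str] = None,
    grammar: Iterable[str] = (),
    errors: Iterable[str] = (),
) -> List[Token]:
    """Lexico-grammatical search: every requested feature must be present."""
    wanted_grammar = set(grammar)
    wanted_errors = set(errors)
    return [
        t for t in tokens
        if (lemma is None or t.lemma == lemma)
        and wanted_grammar <= t.grammar
        and wanted_errors <= t.errors
    ]


# Invented sample tokens, not corpus data.
sample = [
    Token("школа", "школа", frozenset({"N", "nom", "sg"}), frozenset({"case"})),
    Token("школу", "школа", frozenset({"N", "acc", "sg"})),
    Token("автобус", "автобус", frozenset({"N", "acc", "sg"})),
]

print(exact_search(sample, "школу"))
print(lexgram_search(sample, lemma="школа", errors=["case"]))
```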


Search results

Alongside the original sentence, the user sees a two-level correction: the first level shows formal corrections (orthography, case forms, gender / number agreement, tense and aspect), and the second level shows corrections of lexical and constructional violations.
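
One way to picture a search result with its two correction levels is the sketch below. The `SearchHit` class and its field names are assumptions made for illustration, and the sentence is an invented example rather than an actual corpus entry.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical search result: the original plus two correction levels."""
    original: str
    formal_correction: str   # level 1: orthography, case, agreement, tense/aspect
    lexical_correction: str  # level 2: lexical and constructional violations

    def levels(self) -> List[Tuple[str, str]]:
        """Return the three versions in display order."""
        return [
            ("original", self.original),
            ("formal", self.formal_correction),
            ("lexical", self.lexical_correction),
        ]


# Invented sentence: a case error plus a calque of English "take the bus".
hit = SearchHit(
    original="Я взяла автобус в школа.",
    formal_correction="Я взяла автобус в школу.",
    lexical_correction="Я поехала в школу на автобусе.",
)

for label, sentence in hit.levels():
    print(f"{label:>8}: {sentence}")
```

In this invented example the first level repairs only the case form, while the second level replaces the calqued construction.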


Using RLC

Comprising texts from two different groups of non-standard speakers of Russian, RLC is a valuable source for various studies in the fields of Second Language Acquisition, second language teaching, language interference and theoretical linguistics.

The corpus data and the flexible search system provide a sound basis for comparative research on Heritage and L2 production and enable deeper insight into complicated phenomena, such as non-standard use of Russian aspect, cases and prepositional phrases, as well as lexical and semantic misuse in multi-word constructions.

Besides revealing a great deal about non-standard Russian, RLC is a powerful tool for exploring new facets of Standard Russian grammar: deviations in language use help uncover subtle rules that have previously gone unnoticed.


Our team

The corpus was created by the Linguistic Laboratory of Corpus Technologies at the National Research University Higher School of Economics:

Chief: Ekaterina Rakhilina

Tagging and Research:
Anastasia Vyrenkova
Anastasia Ivanenko
Alina Ladygina
Olga Eremina
Daniil Fedorov
Ekaterina Shnittke
Ekaterina Vlasova
Olga Kultepina
Olga Vedenina
Ivan Smirnov
Kirill Semenov
Kirill Aksenov
Maria Grabovskaya
Sofia Goldina
Students of the School of Linguistics (HSE)

Developing:
Elena Sokur
Ekaterina Uetova
Elmira Mustakimova
Timofey Arkhangelskiy

If you have any questions concerning the error classification, the state of the project or partnership, or if you encounter any problems with the corpus's functionality, please contact the heads of the corpus and the developer: small.corpora@gmail.com.


Publications

2017
Vyrenkova A. S., Rakhilina E. V. Learner corpora supporting lexical typology, in: XVII April International Academic Conference on Economic and Social Development: in 4 vols. / Ed.: E.G. Yasin, Vol. 4. Moscow: HSE Publishing House, 2017. P. 450-460.

2016
Polinsky M., Rakhilina E. V., Vyrenkova A. S. Linguistic creativity in heritage speakers // Glossa. 2016. Vol. 43. P. 1-29.

Ekaterina Rakhilina, Anastasia Vyrenkova, Elmira Mustakimova, Alina Ladygina, Ivan Smirnov. Building a learner corpus for Russian // Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016. http://aclweb.org/anthology/W16-65

Zarifyan M., Melnik A. A., Vyrenkova A. S. A case of using a Multilingual Database of synonyms for designing lexical drills / NRU HSE. Series WP BRP "Linguistics". 2016.

Рахилина Е. В. О новых инструментах описания русской грамматики: корпус ошибок // Русский язык за рубежом. 2016. № 3. С. 20-25

Рахилина Е.В., Ладыгина А.А. Русские конструкции со значением чередования ситуаций. Язык: поиски, факты, гипотезы. Лексрус Москва, 2016. С. 320-335

2015
Рахилина Е. В., Выренкова А. С. Корпусные исследования особенностей речи нестандартных говорящих ("херитажный русский") // Acta Linguistica Petropolitana. Труды института лингвистических исследований. 2015. Т. XI. № 1. С. 621-639.

Ладыгина А.А. Изменения в предложных конструкциях в эритажном русском (Russian Heritage language)//IV конференция «Русский язык:конструкционные и лексико-семантические подходы», Санкт-Петербург, 16-18 апреля 2015 г.

K. Rakhilina, O. Kisselev, E. Smolovskaya, E. Mescheryakova. Talk: Russian in the English mirror: (non)grammatical constructions in learner Russian. Corpus Linguistics 2015 (Lancaster).

Е. Смоловская. Доклад: Ошибки нестандартных говорящих: некоторые особенности русской речи иностранцев с доминирующим английским. XIII КОНГРЕСС МАПРЯЛ «Русский язык и литература в пространстве мировой культуры» (Гранада)

2014
Полинская М., Рахилина Е. В., Выренкова А. С. Грамматика ошибок и грамматика конструкций: «эритажный» («унаследованный») русский язык // Вопросы языкознания. 2014. № 3. С. 3-19.

Rakhilina E. V., Vyrenkova A. S. Language Interference in Heritage Russian: Constructional Violations / Working papers by NRU HSE. Series WP BRP "Linguistics". 2014. No. 11.

Ладыгина А.А. Русские эритажные конструкции: корпусное исследование. Дипломная работа. Москва, МГУ

Ладыгина А.А. Семантика конструкций «Х обладает Y», «X владеет Y» (корпусное исследование)//Постерный доклад на I Международной научно-практической конференции «Корпусные технологии и компьютерные методы в современной гуманитарной науке», НИУ-ВШЭ, Нижний Новгород, 11-12 апреля 2014

2013
Ладыгина А.А. Корпус Russisch in Deutschland: состав и особенности разметки// Материалы международной научно-практической конференции "Корпусные технологии. Digital Humanities и современное знание", Нижний Новгород 18-19 октября 2013 г.

Рахилина Е. В., Выренкова А. С. Ошибки в речи херитажных говорящих (на материале текстов русских эмигрантов в США) // В кн.: Проблемы онтолингвистики - 2013 / Рук.: Т. Круглякова; сост.: Т. Круглякова; отв. ред.: Т. Круглякова; под общ. ред.: Т. Круглякова; науч. ред.: Т. Круглякова. СПб. : Российский государственный педагогический университет им. А.И. Герцена, 2013. С. 435-439.

Рахилина Е.В., Ладыгина А.А. То взлёт, то посадка// Тезисы докладов, третья конференция "Русский язык: конструкционные и лексико-семантические подходы", ИЛИ РАН, СПб, 12-14 сентября 2013


Links