LEXICAL DIVERSITY, LENGTH OF TEXT, FREQUENCIES OF PARTS OF SPEECH AS INDICATORS OF SOCIOLINGUISTIC VARIATION

Authors

  • Ekaterina S. Khudiakova, Perm State University
  • Stepan D. Kiselev, Perm State University

Keywords:

sociolinguistic variation, gender, age, specialty, education level, lexical diversity, length of text in word forms, frequency of parts of speech.

Abstract

The study examines the extent to which three indicators (lexical diversity (LD), text length in tokens (word forms), and part-of-speech frequencies) reflect sociolinguistic variation in oral monologues. Based on a sample of authors (N = 48) balanced for gender, age, specialty, and education level, statistical measures of the differences between subsamples were calculated. The analysis relied exclusively on automated methods implemented in Python scripts. The results show that, in the material studied, lexical diversity varies only with the age factor; text length also distinguishes the texts of younger and older informants. Part-of-speech frequencies differ significantly between the texts of men and women, and these qualitative differences likely reflect the speakers' text-generation strategies.
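The abstract does not say which lexical diversity index, tokenizer, or part-of-speech tagger the Python scripts used. As an illustration only, the sketch below computes the three indicators for a single monologue under stated assumptions: a naive regular-expression tokenizer, the moving-average type-token ratio (MATTR) as a stand-in LD index, and POS tags supplied by some external tagger as (token, tag) pairs. The function names and the default window size of 50 are hypothetical choices, not the authors' settings.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a naive tokenizer that lowercases the text and keeps runs of
    # word characters (Unicode-aware, so Cyrillic works), allowing inner hyphens.
    return re.findall(r"\w+(?:-\w+)*", text.lower())


def text_length(tokens):
    # Text length measured in tokens (word forms).
    return len(tokens)


def ttr(tokens):
    # Plain type-token ratio: distinct word forms divided by all word forms.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mattr(tokens, window=50):
    # Moving-average TTR (hypothetical choice of LD index): the mean TTR over
    # every contiguous window of `window` tokens. Falls back to plain TTR when
    # the text is not longer than one window.
    n = len(tokens)
    if n == 0:
        return 0.0
    if n <= window:
        return ttr(tokens)
    counts = Counter(tokens[:window])
    total = len(counts) / window
    for i in range(window, n):
        # Slide the window one token to the right.
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        counts[tokens[i]] += 1
        total += len(counts) / window
    return total / (n - window + 1)


def pos_frequencies(tagged):
    # Relative frequency of each POS tag; `tagged` is a list of (token, tag)
    # pairs produced by an external tagger (not specified in the source).
    counts = Counter(tag for _, tag in tagged)
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()} if total else {}


if __name__ == "__main__":
    tokens = tokenize("Мы пошли в лес, и мы нашли грибы.")
    print(text_length(tokens), round(ttr(tokens), 3), round(mattr(tokens, window=4), 3))
```

Because MATTR averages over fixed-size windows, it is less sensitive to text length than the raw type-token ratio, which matters when text length itself varies between groups of informants.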

Author Biographies

Ekaterina S. Khudiakova, Perm State University

Assistant Professor, Theoretical and Applied Linguistics Department

Stepan D. Kiselev, Perm State University

Student, Faculty of Philology

Published

2026-01-02

Section

Articles