LEXICAL AND GRAMMATICAL MARKERS OF EMOTIONS AS PARAMETERS FOR SENTIMENT ANALYSIS OF INTERNET TEXTS IN RUSSIAN
DOI: https://doi.org/10.17072/2073-6681-2019-3-38-46
Keywords: verbal markers, machine learning, sentiment analysis, ranked classifier, classification of basic emotions, computational linguistics, social media
Abstract
The article presents intermediate results of building an automatic classifier for Russian-language Internet texts. The classifier assigns each text to one of 8 classes, corresponding to the 8 basic emotions proposed by the Swedish biologist Hugo Lövheim: ‘anger / rage’, ‘interest / excitement’, ‘enjoyment / joy’, ‘contempt / disgust’, ‘surprise’, ‘shame / humiliation’, ‘fear / terror’, ‘distress / anguish’. The training sample consists of anonymous texts in the genre of ‘Internet revelations’ posted by users of the social network VKontakte. The classifier is trained with a machine learning algorithm based on the support vector machine (SVM) method. Its input parameters are the frequency of the punctuation marks ‘?’, ‘!’, ‘?!’ and ‘...’; the presence of the negative particle ‘ne’ <not>; the constructions ‘takoi <such> + adjective’ and ‘tak <so> + adverb’; the collocation ‘kogda lyudi govoryat’ <when people say>; the presence of parcellation, question words and the particle ‘-to’; lexemes from the lexical fields ‘death’, ‘disease’, ‘family’ and ‘loneliness’; and adverbs of measure and degree.
The results reported in the paper validate the most characteristic verbal markers of specific emotions as parameters that determine the classifier's accuracy. We conclude that the effectiveness of a parameter depends on how frequently the corresponding verbal markers occur in the emotional text corpora. The accuracy achieved by the classifier is compared with that of a dummy classifier that assigns classes at random. In conclusion, the paper identifies the most useful verbal markers, assesses the prospects of the project for practical applications, and raises the question of continuing the study to improve attribution accuracy.
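The abstract lists the classifier's ingredients (marker features, an SVM, a random baseline) but not how they are implemented. The following is a minimal sketch, not the authors' code, of how a subset of those markers could be turned into a feature vector, fed to a linear SVM, and compared with a random baseline. Everything concrete here is an assumption: the regular expressions (suffix-based guesses instead of a morphological analyser), the per-token normalization, the tiny hypothetical word lists standing in for the ‘death’ and ‘loneliness’ fields, and the pure-Python one-vs-rest linear SVM trained with Pegasos sub-gradient steps, which substitutes for a library SVM. Parcellation, question words and degree adverbs are left out.

```python
import random
import re

# Hypothetical regex approximations of some markers named in the abstract.
# Adjective/adverb detection by suffix is a crude assumption; a real system
# would use a morphological analyser.
PUNCT_PATTERNS = {
    "question": r"\?(?!!)",        # '?' not followed by '!'
    "exclamation": r"(?<!\?)!",    # '!' not preceded by '?'
    "question_excl": r"\?!",       # the combination '?!'
    "ellipsis": r"\.\.\.|…",       # '...' or the single ellipsis character
}
MARKER_PATTERNS = {
    "ne": r"\bне\b",  # negative particle 'ne'
    "takoi_adj": r"\bтак(?:ой|ая|ое|ие)\s+[а-яё]+(?:ый|ий|ой|ая|яя|ое|ее|ые|ие)\b",
    "tak_adv": r"\bтак\s+[а-яё]+о\b",  # guess: adverb = word ending in -о
    "kogda_lyudi": r"\bкогда\s+люди\s+говорят\b",
    "to_particle": r"\b[а-яё]+-то\b",  # e.g. 'кто-то', 'что-то'
}
# Tiny hypothetical word lists standing in for two of the lexical fields.
LEXICAL_FIELDS = {
    "death": {"смерть", "умер", "умерла", "похороны"},
    "loneliness": {"одиночество", "один", "одна", "одиноко"},
}


def extract_features(text):
    """Map one text to a numeric vector (4 punctuation + 5 marker + 2 field features)."""
    lower = text.lower()
    tokens = re.findall(r"[а-яёa-z]+(?:-[а-яёa-z]+)*", lower)
    n = max(len(tokens), 1)  # avoid division by zero for empty texts
    # Punctuation: occurrences per token (the normalization is an assumption).
    vec = [len(re.findall(p, text)) / n for p in PUNCT_PATTERNS.values()]
    # Grammatical and phraseological markers: binary presence.
    vec += [1.0 if re.search(p, lower) else 0.0 for p in MARKER_PATTERNS.values()]
    # Lexical fields: share of tokens that belong to each field.
    vec += [sum(t in words for t in tokens) / n for words in LEXICAL_FIELDS.values()]
    return vec


def train_linear_svm(X, y, lam=0.1, iters=500, seed=0):
    """Binary linear SVM (labels +1/-1) trained with Pegasos sub-gradient steps."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # the extra weight acts as a (regularized) bias
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        x = X[i] + [1.0]
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1 - eta * lam) * wj for wj in w]  # shrink: L2 regularization step
        if margin < 1:  # hinge loss active: point inside the margin or misclassified
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, x)]
    return w


class OneVsRestSVM:
    """One binary SVM per emotion class; predict the class with the highest score."""

    def fit(self, X, labels, **kwargs):
        self.classes = sorted(set(labels))
        self.weights = {
            c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in self.classes
        }
        return self

    def predict(self, X):
        def score(w, x):
            return sum(wj * xj for wj, xj in zip(w, x + [1.0]))

        return [max(self.classes, key=lambda c: score(self.weights[c], x)) for x in X]


def dummy_predict(X, classes, seed=0):
    """Baseline that picks a class uniformly at random, ignoring the features."""
    rng = random.Random(seed)
    return [rng.choice(classes) for _ in X]


def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Given a labelled corpus, one would compute `[extract_features(t) for t in texts]`, fit `OneVsRestSVM` on a training split, and compare `accuracy` on a held-out split against `dummy_predict` with the 8 emotion labels (a uniform random guess among 8 classes is right about 1/8 of the time on average). A production setup would more likely rely on a library SVM such as scikit-learn's `SVC` and its `DummyClassifier(strategy="uniform")`, and would scale features before training.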
References
Bolotnov V. I. Emotsional’nost’ teksta v aspektakh yazykovoy i neyazykovoy variativnosti: osnovy emotivnoy stilistiki teksta [Emotionality of text in the aspects of linguistic and non-linguistic variability: fundamentals of emotive text stylistics]. Tashkent, Fan Publ., 1981. 116 p. (In Russ.)
Bol’shakova E. I., Vorontsov K. V., Efremova N. E., Klyshinskiy E. S., Lukashevich N. V., Sapin A. S. Avtomaticheskaya obrabotka tekstov na estestvennom yazyke i analiz dannykh [Automatic natural language text processing and data analysis]. Moscow, HSE Publishing House, 2017. 269 p. (In Russ.)
Kolmogorova A. V. Verbal’nye markery emotsiy v kontekste resheniya zadach sentiment-analiza [Verbal markers of emotions in the context of sentiment analysis tasks]. Voprosy kognitivnoy lingvistiki [Issues of Cognitive Linguistics], 2018, issue 1, pp. 83–93. (In Russ.)
Kolmogorova A. V., Kalinin A. A. Chastotnost’ i sochetaemost’ somatizmov v tekstakh razlichnoy emotsional’noy tonal’nosti [Frequency and collocability of somatisms in texts of different emotional tonality]. Komp’yuternye i intellektual’nye tekhnologii [Computer and Intellectual Technologies], 2018, issue 17, pp. 317–330. (In Russ.)
Kolmogorova A. V., Kalinin A. A., Malikova A. V. Lingvisticheskie printsipy i metody komp’yuternoy lingvistiki dlya resheniya zadach sentiment-analiza russkoyazychnykh tekstov [Linguistic principles and computational linguistics methods for the purposes of sentiment analysis of Russian texts]. Aktual’nye problemy filologii i pedagogicheskoy lingvistiki [Current Issues in Philology and Pedagogical Linguistics], 2018, issue 1(29), pp. 139–148. (In Russ.)
Shakhovskiy V. I. Emotsii kak ob”ekt issledovaniya v lingvistike [Emotions as an object of study in linguistics]. Voprosy psikholingvistiki [Journal of Psycholinguistics], 2009, issue 9, pp. 29–42. (In Russ.)
Yusupova N. I., Bogdanova D. R., Boyko M. V. Algoritmicheskoe i programmnoe obespechenie dlya analiza tonal’nosti tekstovykh soobshcheniy s ispol’zovaniem mashinnogo obucheniya [Algorithms and software for sentiment analysis of text messages using machine learning]. Vestnik Ufimskogo gosudarstvennogo aviatsionnogo tekhnicheskogo universiteta [Herald of Ufa State Aviation Technical University], 2018, issue 16 (6(51)), pp. 91–99. (In Russ.)
Bollen J., Mao H., Zeng X. Twitter mood predicts the stock market. Journal of Computational Science, 2011, vol. 2, issue 1, pp. 1–8. (In Eng.)
Chetviorkin I. I., Loukachevitch N. V. Sentiment analysis track at ROMIP-2012. Komp’yuternaya lingvistika i intellektual’nye tekhnologii, po materialam konferentsii «Dialog-2013» [Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference ‘Dialogue’ (2013)], 2013, vol. 2, pp. 40–50. (In Eng.)
Lövheim H. A New Three-dimensional Model for Emotions and Monoamine Neurotransmitters. Medical Hypotheses, 2011, vol. 78, pp. 341–348. (In Eng.)
Pang B., Lee L. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2008, vol. 2, issues 1–2, pp. 1–135. (In Eng.)
Pang B., Lee L., Vaithyanathan Sh. Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002, pp. 79–86. (In Eng.)
VanderPlas J. Python data science handbook: Essential tools for working with data. Sebastopol, O’Reilly Media, 2017. 548 p. (In Eng.)
Wiebe J., Riloff E. Creating subjective and objective sentence classifiers from unannotated texts. Computational Linguistics and Intelligent Text Processing. Berlin, Springer, 2005. 486 p. (In Eng.)
Witten I. H., Frank E. Data mining: Practical machine learning tools and techniques (Second Edition). Burlington, Morgan Kaufmann, 2005, pp. 56–63. (In Eng.)