Topic Modelling in Computer Security Discourse: a Case Study of Whitepaper Publications and News Feeds
DOI:
https://doi.org/10.17072/2073-6681-2022-2-18-26
Keywords:
topic modelling; computer security discourse; KNIME; infodemiology; political incidental news exposure; content analysis; RSS feeds; cognitive-discursive linguistics
Abstract
Up-to-date information plays a crucial role in modern linguistic research. For this reason, computational linguistic methods, including those aided by analytical and machine-learning tools, are attracting growing attention. Their applications in cognitive-discursive linguistics include keyword extraction, topic modelling, and content analysis. Text-mining tools facilitate time-consuming linguistic work and, by processing a significantly larger volume of data, make the results more reliable and statistically precise. Most studies, however, focus mainly on one data format and have therefore overlooked the interference of socially significant but context-irrelevant (e.g. political) information in a specialized discourse. The current study, aimed at topic modelling, has been carried out on the computer security discourse. We have implemented the project on the KNIME analytical platform. The model enables comparison between topics extracted from published articles and date-specific RSS news feeds. The study provides important insights into infodemiology and political incidental news exposure occurring in computer-security-oriented RSS feeds on the Kaspersky website but untraceable in the papers published on the same website in PDF format. The results reported here provide further evidence for the need to consider the hypercontext of professional communication and employ real-time data in solving similar problems within cognitive-discursive linguistics.
Our contribution to the development of cognitive-discursive linguistics is a method for comparing topics within one discourse, taking into account near-real-time data. For computational linguistics, the significance of our work lies in describing a new application of the topic extraction workflow freely available on the KNIME Hub.
References
Budaev E. Metaphors of disease in the Russian press. XLinguae, 2021, vol. 10, issue 2, pp. 30-37. doi 10.18355/XL.2017.10.02.03. (In Russ.)
Chudinov A. P., Sergienko N. A., Glushak V. M. Good, Evil, Truth, Lie in Russian, Ukrainian, British, and American linguo-cultures: Results of a psycholinguistic experiment. Sibirskiy Filologicheskiy Zhurnal [The Siberian Journal of Philology], 2021, issue 2, pp. 297-311. doi 10.17223/18137083/75/21. (In Russ.)
Dancy-Scott N., Dutcher G. A., Keselman A., Hochstein C., Copty C., Ben-Senia D., Rajan S., Asencio M. G., Choi J. J. Trends in HIV terminology: Text mining and data visualization assessment of international AIDS conference abstracts over 25 years. JMIR Public Health and Surveillance, 2018, vol. 4, issue 5. doi 10.2196/PUBLICHEALTH.8552. (In Eng.)
Dewi A., Thiel K. Topic extraction: Optimizing the number of topics with the elbow method. KNIME, June 19, 2017. Available at: https://www.knime.com/blog/topic-extraction-optimizing-the-number-of-topics-with-the-elbow-method (accessed 30 Apr 2022). (In Eng.)
Document Vector Node. KNIMETV, December 9, 2020. Available at: https://www.youtube.com/watch?v=kLlmCWnknhE (accessed 30 Apr 2022). (In Eng.)
Flores-Ruiz D., Elizondo-Salto A., Barroso-Gonzalez M. d. l. O. Using social media in tourist sentiment analysis: A case study of Andalusia during the Covid-19 pandemic. Sustainability, 2021, vol. 13, issue 7 (3836), pp. 1-19. doi 10.3390/SU13073836. (In Eng.)
Ertek G., Kailas L. Analyzing a decade of wind turbine accident news with topic modeling. Sustainability, 2021, vol. 13, issue 12757, pp. 1-34. doi 10.3390/su132212757. (In Eng.)
Isaeva E., Baiburova O., Manzhula O. Anthropomorphism in computer security terminology through the prism of smart cognitive framing. Science and Global Challenges of the 21st Century - Science and Technology. Perm Forum 2021. Lecture Notes in Networks and Systems. 2022, vol. 342, pp. 460-474. doi 10.1007/978-3-030-89477-1_46. (In Eng.)
Isaeva E. V. Metaphor in terminology: Finding tools for efficient professional communication. Fachsprache, 2019, vol. 41, special issue 1. doi 10.24989/fs.v41is1.1766. (In Eng.)
Isaeva E. V., Crawford R. Semantic framing of computer viruses: The study of semantic roles’ distribution. Vestnik Permskogo universiteta. Rossiyskaya i zarubezhnaya filologiya [Perm University Herald. Russian and Foreign Philology], 2019, vol. 11, issue 1, pp. 5-13. doi 10.17072/2073-6681-2019-1-5-13. (In Eng.)
Gustafson N., Pera M. S., Ng Y.-K. Generating fuzzy equivalence classes on RSS news articles for retrieving correlated information. In: Gervasi O., Murgante B., Lagana A., Taniar D., Mun Y., Gavrilova M. L. (eds) Computational Science and Its Applications - ICCSA 2008. ICCSA 2008. Lecture Notes in Computer Science. 2008. Springer, Berlin, Heidelberg, vol. 5073, pp. 232-247. doi 10.1007/978-3-540-69848-7_20. (In Eng.)
Lee C., Lim C. From technological development to social advance: A review of Industry 4.0 through machine learning. Technological Forecasting and Social Change, 2021, vol. 167 (120653). doi 10.1016/J.TECHFORE.2021.120653. (In Eng.)
Liew T. M., Lee C. S. Examining the utility of social media in Covid-19 vaccination: Unsupervised learning of 672,133 Twitter posts. JMIR Public Health and Surveillance, 2021, vol. 7, issue 11, pp. 1-19. doi 10.2196/29789. (In Eng.)
Liu Y., Zavarsky P., Malik Y. Non-linguistic features for cyberbullying detection on a social media platform using machine learning. In: Vaidya J., Zhang X., Li J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol. 11982. Springer, Cham, pp. 391-406. doi 10.1007/978-3-030-37337-5_31. (In Eng.)
Matthes J., Nanz A., Stubenvoll M., Heiss R. Processing news on social media. The political incidental news exposure model (PINE). Journalism, 2020, vol. 21, issue 8, pp. 1031-1048. doi 10.1177/1464884920915371. (In Eng.)
Mukhametzyanova L. R., Mardieva L. A., Chudinov A. P. The titles of newspapers and magazines as artifacts of the epoch. Journal of Research in Applied Linguistics, 2020, vol. 11, pp. 400-405. doi 10.22055/RALS.2020.16338. (In Eng.)
Photiou A., Nicolaides C., Dhillon P. S. Social status and novelty drove the spread of online information during the early stages of COVID-19. Scientific Reports, vol. 11, issue 1 (20098). doi 10.1038/S41598-021-99060-Y. (In Eng.)
Sebestyen V., Domokos E., Abonyi J. Multilayer network based comparative document analysis (MUNCoDA). MethodsX, 2020, vol. 7, 100902. doi 10.1016/J.MEX.2020.100902. (In Eng.)
Wu Y. C. Multilingual news extraction via stopword language model scoring. Journal of Intelligent Information Systems, 2017, vol. 48, issue 1, pp. 191-213. doi 10.1007/S10844-016-0395-6. (In Eng.)