METHOD OF BUILDING SOFTWARE DETECTORS FOR DETECTING SOFTWARE BOTS IN SOCIAL NETWORKS

Lesya LYUSHENKO; Yaroslav PEREHUDA

doi:10.32689/maup.it.2024.1.8

Authors

Lesya LYUSHENKO https://orcid.org/0000-0003-4319-5955
Yaroslav PEREHUDA https://orcid.org/0009-0002-7292-7887

Keywords:

large language models (LLM), neural networks, metadata analysis, software bots, social networks

Abstract

The purpose of this work is to study in detail how effectively large language models (LLMs) detect software bots in social networks. The work compares the effectiveness of different detection methods and assesses the potential of LLMs to improve the accuracy and efficiency of bot identification.

Methodology. The study covers three main approaches to bot detection: metadata analysis, text analysis, and graph analysis. Both traditional machine learning methods and recent LLMs are examined for their ability to process large-scale data from social networks. The main technique is benchmarking: each method is evaluated on extended datasets such as TwiBot-20 and TwiBot-22 using metrics such as accuracy and F1-measure. This provides an objective comparison of the different approaches to bot detection.

The scientific novelty of this work is the use of LLMs to analyze various types of social network data in order to detect software bots. The authors consider integrating LLMs into traditional detection methods, which allows detection processes to adapt to the complex behavior of software bots while ensuring high accuracy and efficiency.

Conclusions. LLMs demonstrate high efficiency in detecting software bots and outperform traditional methods on some metrics. However, given the computational demands of LLMs, the authors recommend considering hybrid approaches that combine the advantages of LLMs with the efficiency of traditional methods. Such approaches can optimize resource usage, reduce computing costs, and yield a more robust, adaptive, and accurate system for detecting malicious actors in social networks. Further research is recommended to improve the integration of LLMs into bot detection systems, especially in the context of the dynamic behavior of social networks and the evolution of software bots.
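The two evaluation metrics named in the abstract, accuracy and F1-measure, can be illustrated with a minimal Python sketch. The label encoding (1 for bot, 0 for human), the function names, and the toy labels below are assumptions made for illustration only; they are not taken from TwiBot-20, TwiBot-22, or the authors' evaluation code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task.

    Assumption: the positive class (1) means "bot" and 0 means "human".
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of accounts labeled correctly: (TP + TN) / all accounts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_measure(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels, not real benchmark data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"F1       = {f1_measure(y_true, y_pred):.2f}")
```

Reporting F1 alongside accuracy matters when bot and human accounts are imbalanced, because accuracy alone can look high for a detector that rarely flags the minority class.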

References

Allem J.-P., Ferrara E. The importance of debiasing social media data to better understand e-cigarette related attitudes and behaviors. Journal of medical Internet research. 2016. Vol. 18. № 8.

BIC: Twitter bot detection with text-graph interaction and semantic consistency. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics / Lei Z. et al. Canada, 2023. Vol. 1. P. 10326–10340.

Bot2Vec: A general approach of intra-community oriented representation learning for bot detection in different types of social networks. Information Systems / Pham P. et al. 2022. Vol. 103. DOI: 10.1016/j.is.2021.101771.

BotOrNot: A system to evaluate social bots. Proceedings of the 25th International Conference Companion on World Wide Web / Davis C.A. et al. 2016. P. 273–274.

BotRGCN: Twitter bot detection with relational graph convolutional networks. Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining / Feng S. et al. IEEE, 2021. P. 236–239.

Cresci S. A decade of social bot detection. Communications of the ACM. 2020. Vol. 63. № 10. P. 72–83.

Detect me if you can: Spam bot detection using inductive representation learning. Companion proceedings of the 2019 World Wide Web conference / Ali A.S. et al. 2019. P. 148–153.

Detecting bots in social-networks using node and structural embeddings. Journal of Big Data / Dehghan A. et al. 2023. Vol. 10. № 1. P. 1–37.

Dukic D., Keca D., Stipic D. Are you human? Detecting bots on Twitter using BERT. 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2020. P. 631–636.

Ferrara E. What types of COVID-19 conspiracies are populated by Twitter bots? First Monday. 2020. Vol. 25. № 6. DOI: 10.5210/fm.v25i6.10633.

Howard P.N., Kollanyi B., Woolley S. Bots and automation over twitter during the US election. Computational propaganda project : working paper series. 2016. № 21(8).

Kudugunta S., Ferrara E. Deep neural networks for bot detection. Information Sciences. 2018. № 467. P. 312–322.

Llama 2: Open foundation and fine-tuned chat models / Touvron H. et al. 2023. (Preprint arXiv:2307.09288).

MetaICL: Learning to learn in context. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies / Min S. et al. 2022. P. 2791–2809.

Mistral 7b / Jiang A.Q. et al. 2023. (Preprint arXiv:2310.06825).

Multi-modal social bot detection: Learning homophilic and heterophilic connections adaptively. Proceedings of the 31st ACM International Conference on Multimedia / Li S. et al. ACM, 2023. P. 3908–3916.

Predicting online extremism, content adopters, and interaction reciprocity. Social Informatics / Ferrara E. et al. Bellevue, 2016. P. 22–39.

Roberta: A robustly optimized BERT pretraining approach / Liu Y. et al. 2019. (Preprint arXiv:1907.11692).

Rumelhart D.E., Hinton G.E., Williams R.J. Learning Internal Representations by Error Propagation, Parallel Distributed Processing, Explorations in the Microstructure of Cognition. Biometrika / ed. Rumelhart D.E., McClelland J. 1986. Vol. 1. № 71. P. 599–607.

Sasaki Y. The truth of the F-measure. Teach tutor mater. 2007. Vol. 1. № 5. P. 1–5.

Satar: A self-supervised approach to twitter account representation learning and its application in bot detection. Proceedings of the 30th ACM International Conference on Information & Knowledge Management / Feng S. et al. ACM, 2021. P. 3808–3817.

Scalable and generalizable social bot detection through data selection. Proceedings of the AAAI conference on artificial intelligence / Yang K.-C. et al. 2020. Vol. 34. P. 1096–1103.

Shahi G.K., Dirkson A., Majchrzak T.A. An exploratory study of COVID-19 misinformation on Twitter. Online social networks and media. 2021. № 22.

Social bot-aware graph neural network for early rumor detection. Proceedings of the 29th International Conference on Computational Linguistics / Huang Z. et al. Gyeongju, 2022. P. 6680–6690.

Taxonomy of risks posed by language models. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency / Weidinger L. et al. ACM, 2022. P. 214–229.

The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race. Proceedings of the 26th international conference on world wide web companion / Cresci S. et al. 2017. P. 963–972.

Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems / Ouyang L. et al. 2022. № 35.

Twibot-20: A comprehensive twitter bot detection benchmark. Proceedings of the 30th ACM International Conference on Information & Knowledge Management / Feng S. et al. 2021. P. 4485–4494.

Twibot-22: Towards graph-based twitter bot detection. Advances in Neural Information Processing Systems / Feng S. et al. 2022. Vol. 35. P. 35254–35269.

Weaponized health communication: Twitter bots and Russian trolls amplify the vaccine debate. American journal of public health / Broniatowski D.A. et al. 2018. Vol. 108. № 10. P. 1378–1384.

Wei F., Nguyen U.T. Twitter bot detection using bidirectional long short-term memory neural networks and word embeddings. 2019 First IEEE International conference on trust, privacy and security in intelligent systems and applications (TPSISA). IEEE, 2019. P. 101–109.

Yang K.-C., Ferrara E., Menczer F. Botometer 101: Social bot practicum for computational social scientists. Journal of Computational Social Science. 2022. Vol. 5. № 2. P. 1511–1528.
