Mathematical modeling of recommender system and data processing of a telecommunications company using machine learning models

Nikita A. Andriyanov, Madina-Bonu R. Atakhodzhaeva, Evgeny I. Borodin

Abstract


The purpose of the study is to develop data modeling methods for designing recommender algorithms based on doubly stochastic autoregressive models of random processes, and to check the adequacy of these methods by applying machine learning algorithms that cluster users in a simulated data set and predict their probabilities of interest.

Research methods. The article reviews the methods used to build recommender systems and considers the problem of modeling user behavior with a doubly stochastic model, which is proposed for generating artificial data. Because the doubly stochastic model can generate non-stationary processes, it produces users whose probabilistic properties differ across groups of objects of interest. The artificially created users and their activity are then clustered with a modified K-means algorithm; the main modification is that the number of clusters is pre-estimated automatically rather than chosen by a person. Next, the behavior of representatives of each user group in response to new events is modeled. Based on the generated information and the training data, the problem of predicting and ranking the offered services is solved. At the first stage, regression models are sufficient to assign a user to a group and to form offers for that user.

Results of the study. On the training data with 2 clusters, high coefficients of determination were achieved: approximately 90% of the variance is explained when the proposed doubly stochastic model is used. Particular attention is paid to how modern recommender systems work, using the Disco system developed by Yandex as an example. In addition, data from the real sector, namely the data of a telecommunications company, were pre-processed and given a preliminary analysis. A test recommender system was developed to issue relevant offers of communication services.

Conclusion. The main results of the work are a mathematical model that simulates users' reactions to various services and a logistic regression model that predicts the probability of a user's interest in a new service. New offers can then be ranked directly by the predicted probabilities. Testing on the synthesized data showed the high efficiency of the model.
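The data-generation idea described in the abstract can be illustrated with a minimal sketch. It assumes a first-order doubly stochastic autoregression in which the correlation coefficient is itself an autoregressive process, so the observed sequence is non-stationary. All function names, parameter values, and the clipping of the coefficient are hypothetical choices made for this illustration and are not taken from the article. The logistic function here only maps a simulated latent score to a probability; it stands in for the fitted logistic regression model and is not a trained model.

```python
import math
import random


def simulate_doubly_stochastic_ar1(n, rho_mean=0.8, r=0.95, sigma_rho=0.03,
                                   sigma_x=1.0, seed=0):
    """Simulate x_t = rho_t * x_{t-1} + noise, where rho_t is itself AR(1).

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x, rhos = [], []
    dev = 0.0                       # deviation of rho_t from its mean
    prev = rng.gauss(0.0, sigma_x)  # initial value of the observed process
    for _ in range(n):
        # Inner (hidden) process: the correlation coefficient drifts.
        dev = r * dev + sigma_rho * rng.gauss(0.0, 1.0)
        # Clip so that |rho_t| < 1 (an assumption of this sketch).
        rho = max(-0.99, min(0.99, rho_mean + dev))
        # Outer (observed) process: AR(1) with a varying coefficient.
        noise = sigma_x * math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        prev = rho * prev + noise
        rhos.append(rho)
        x.append(prev)
    return x, rhos


def interest_probability(score):
    """Map a latent score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


def rank_services(scores):
    """Return service indices sorted by descending predicted probability."""
    probs = [interest_probability(s) for s in scores]
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


# One simulated user: a latent interest score for each of 20 services.
latent, rho_path = simulate_doubly_stochastic_ar1(n=20, seed=42)
ranking = rank_services(latent)
```

In this sketch each index plays the role of one service, and `ranking` lists the services from most to least likely to interest the simulated user. Generating many such users with different `rho_mean` values would give groups with different probabilistic properties, which is the kind of data the clustering step operates on.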

Keywords


recommender systems; mathematical modeling; doubly stochastic model; logistic regression; machine learning

DOI: http://dx.doi.org/10.14529/ctcr220202
