A Survey of Modern Time Series Processing Systems

Elena Vladimirovna Ivanova, Mikhail Leonidovich Zymbler

Abstract


A time series is a sequence of chronologically ordered numerical values that reflects the course of some process or phenomenon. Industry 4.0 and Internet of Things applications are currently among the most important application areas for time series processing. A typical task in these applications is smart control and predictive maintenance of complex machines and mechanisms equipped with various sensors. Such sensors take readings at a high sampling rate and, within a comparatively short time, produce time series ranging from tens of millions to billions of elements in length. The sensor data are accumulated and mined to support strategically important decisions. Time series processing requires specialized system software that differs from existing relational DBMSs and NoSQL systems. A time series processing system must provide, on the one hand, efficient insertion of new atomic values arriving as a stream and, on the other hand, efficient data mining operations that treat the time series as a single whole. The article examines the specifics of time series processing in comparison with relational and non-relational data and gives formal definitions of the main time series mining tasks. It also surveys the main features of three of the most popular modern time series processing systems: InfluxDB, OpenTSDB, and TimescaleDB.


Keywords


time series processing and analysis; NoSQL; relational DBMS; InfluxDB; OpenTSDB; TimescaleDB



DOI: http://dx.doi.org/10.14529/cmse200406