TECHNOLOGY FOR IMPROVING STORAGE EFFICIENCY IN NO-SQL DATABASES

Authors

DOI:

https://doi.org/10.32689/maup.it.2024.3.2

Keywords:

No-SQL database, data deduplication, data compression, cloud storage optimization, storage efficiency

Abstract

The article presents the results of applying deduplication and compression to optimize data storage in cloud No-SQL databases. The purpose of the study is to reduce the volume of stored data by using Hadoop MapReduce for processing and MongoDB for storing aggregated key-value pairs. Methodology. The study combines data deduplication and compression, both performed with Hadoop MapReduce. This approach makes it possible to process large volumes of data while optimizing how they are stored in MongoDB. Results. A series of experiments was conducted to measure the reduction in data volume and the speed of query processing. The proposed system architecture integrates easily with existing backup tools, which makes the technology practical to deploy in real-world environments. The experiments show that the approach is highly effective for large files, reducing storage requirements by more than 90%. Scientific novelty. The proposed solution introduces a new approach to data processing and storage in cloud environments. For the first time in the context of No-SQL databases, deduplication and compression methods are combined, which creates new opportunities for saving space and improving system performance. The research also extends the potential application of these techniques to multimedia files and real-time streaming data. Conclusions. The results confirm that deduplication and compression are highly effective at reducing data volumes in cloud-based No-SQL databases. Implementing these methods makes it possible to significantly reduce storage costs, increase data processing speed, and adapt to the growing needs of modern industries. The next stages of the research will include developing predictive models to optimize the application of these technologies in real time.
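
The combination of deduplication and compression described in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the fixed 4096-byte chunk size, the SHA-256 fingerprints, the zlib codec, and the function names `map_chunks`, `reduce_chunks`, `restore`, and `saving` are all assumptions made for illustration. Plain Python functions stand in for the Hadoop MapReduce map and reduce steps, and an in-memory dictionary stands in for a MongoDB key-value collection.

```python
import hashlib
import zlib
from collections import defaultdict

CHUNK_SIZE = 4096  # assumed fixed chunk size; the article does not state one


def map_chunks(file_name, data, chunk_size=CHUNK_SIZE):
    """Map step: emit (fingerprint, (file_name, index, chunk)) for each chunk."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), (file_name, index, chunk)


def reduce_chunks(mapped):
    """Reduce step: keep one compressed copy per fingerprint and a recipe per file."""
    store = {}                   # fingerprint -> compressed chunk (stand-in for a MongoDB collection)
    recipes = defaultdict(dict)  # file_name -> {chunk index: fingerprint}
    for fingerprint, (file_name, index, chunk) in mapped:
        if fingerprint not in store:       # deduplication: skip chunks already stored
            store[fingerprint] = zlib.compress(chunk)  # compression of unique chunks only
        recipes[file_name][index] = fingerprint
    return store, recipes


def restore(file_name, store, recipes):
    """Rebuild a file by decompressing its chunks in index order."""
    recipe = recipes[file_name]
    return b"".join(zlib.decompress(store[recipe[i]]) for i in sorted(recipe))


def saving(files, store):
    """Fraction of raw bytes saved, counting only the stored chunk payloads."""
    raw = sum(len(data) for data in files.values())
    stored = sum(len(blob) for blob in store.values())
    return 1 - stored / raw


if __name__ == "__main__":
    # Synthetic, highly redundant input: two "backups" sharing most of their content.
    files = {
        "backup_monday.bin": b"A" * 8192 + b"B" * 4096,
        "backup_tuesday.bin": b"A" * 8192 + b"C" * 4096,
    }
    mapped = [pair for name, data in files.items() for pair in map_chunks(name, data)]
    store, recipes = reduce_chunks(mapped)
    assert restore("backup_monday.bin", store, recipes) == files["backup_monday.bin"]
    print(f"unique chunks: {len(store)}, saving: {saving(files, store):.1%}")
```

The saving printed for this synthetic input reflects its artificial redundancy and says nothing about the article's measurements. The measure also ignores the space taken by fingerprints and file recipes, which a real deployment would have to account for.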

Published

2024-12-24

How to Cite

КОЗУБ, В. (2024). TECHNOLOGY FOR IMPROVING STORAGE EFFICIENCY IN NO-SQL DATABASES. Information Technology and Society, (3 (14)), 14-22. https://doi.org/10.32689/maup.it.2024.3.2