AUTOMATED DATA MIGRATION METHOD ACROSS STORAGE VARIANTS IN SYSTEMS WITH MULTIVARIANT PERSISTENCE

Authors

DOI:

https://doi.org/10.32689/maup.it.2025.4.21

Keywords:

invariants, deserialization, transformation, mapping, fact

Abstract

The purpose of the research is to develop the concept and architecture of an automated data migration system that ensures the consistency, integrity and equivalence of information when data move between heterogeneous storage models under multivariant persistence. The work creates a method for automated data migration in environments where the same unit of information can exist in several equivalent forms.

Methodology. The method is based on a formal model of a fact that preserves invariants of structure, types and semantics across storage models. The principles of correct mappings between repositories are defined, together with the conditions under which data transformations remain lossless and reproducible. A system of invariants is proposed that fixes the boundaries of permissible transformations and ensures the logical integrity of facts regardless of the specific storage technology. The key element of the method is a domain-specific language for describing transformations, which expresses the rules for moving between data models in a formalized form. Its constructs support projection, nesting, expansion, joining and induction of relations between structures; the language is explicitly typed and checks correctness at both the compilation and execution stages. Based on this language, a prototype system was built that automatically transforms data between relational, document, graph and key-value repositories without user intervention. The solution architecture includes modules for checking invariants, compiling transformations and performing migrations with transactional control, rollback and a transaction log, as well as services for cost estimation and correctness checking.

Scientific novelty. For the first time, a holistic formal method for automated data migration between repository types in systems with multivariant persistence has been formulated. A mathematical model of facts, invariants and mappings has been constructed that determines the boundaries of permissible transformations and guarantees semantic equivalence between different materializations of the data. A domain-specific transformation language with formal verification of totality, typing and invariant preservation is proposed, and an architecture of an automated system with unified adapters to heterogeneous repositories is implemented. In the experiments, the method was compared with manual transformations and the classical ETL approach; it produced zero invariant violations with only a slight increase in migration time.

Conclusions. The proposed method provides correct, fault-tolerant and formally verified data migration without loss of integrity. The prototype confirmed zero invariant violations, stable results across scenarios and the method's suitability for scaling. The results demonstrate the effectiveness of the proposed approach and open up prospects for further adaptation of the system through learning agents and dynamic migration management policies.
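The idea of an invariant-checked, reversible mapping with rollback can be illustrated with a minimal Python sketch. It is not the authors' transformation language or prototype: the function names (`nest`, `flatten`, `violations`, `migrate`), the dotted-column convention for the relational form and the in-memory dictionaries standing in for repositories are all assumptions made for illustration, and rows are assumed to hold only scalar values.

```python
from typing import Any, Callable

Row = dict[str, Any]  # hypothetical relational form: flat columns such as "address.city"
Doc = dict[str, Any]  # hypothetical document form: nested objects


def nest(row: Row) -> Doc:
    """Nesting: turn dotted column names into nested objects."""
    doc: Doc = {}
    for column, value in row.items():
        *parents, leaf = column.split(".")
        node = doc
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return doc


def flatten(doc: Doc, prefix: str = "") -> Row:
    """Expansion: the inverse of nest, back to flat dotted columns."""
    row: Row = {}
    for name, value in doc.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            row.update(flatten(value, path + "."))
        else:
            row[path] = value
    return row


def violations(row: Row, doc: Doc) -> list[str]:
    """Compare two forms of one fact on structure, types and values."""
    back = flatten(doc)
    problems = []
    if set(back) != set(row):
        problems.append("structure: attribute paths differ")
    for column in sorted(set(back) & set(row)):
        if type(back[column]) is not type(row[column]):
            problems.append(f"type: {column}")
        elif back[column] != row[column]:
            problems.append(f"value: {column}")
    return problems


def migrate(rows: dict[str, Row], target: dict[str, Doc],
            transform: Callable[[Row], Doc] = nest) -> list[str]:
    """Migrate every row; on any invariant violation restore the target and re-raise."""
    snapshot = dict(target)  # shallow copy used for rollback
    log: list[str] = []
    try:
        for key, row in rows.items():
            doc = transform(row)
            problems = violations(row, doc)
            if problems:
                raise ValueError(f"{key}: {problems}")
            target[key] = doc
            log.append(f"migrated {key}")
    except ValueError:
        target.clear()
        target.update(snapshot)
        raise
    return log


if __name__ == "__main__":
    source = {"u1": {"id": 1, "name": "Ann", "address.city": "Kyiv"}}
    store: dict[str, Doc] = {}
    print(migrate(source, store))
    print(store)
```

In this sketch a transformation is accepted only if flattening its output reproduces the source row exactly, which is one simple way to read "lossless"; a real system would check invariants against a schema rather than by round trip alone.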

Published

2025-12-30

How to Cite

ПАТЛАНЬ, Є., & БІЛОУС, І. (2025). AUTOMATED DATA MIGRATION METHOD ACROSS STORAGE VARIANTS IN SYSTEMS WITH MULTIVARIANT PERSISTENCE. Information Technology and Society, (4 (19)), 129-139. https://doi.org/10.32689/maup.it.2025.4.21