METHOD OF SEMANTIC PREFILTERING OF CAUSAL RELATIONSHIPS IN HIGH-DIMENSIONAL NETWORKS

Authors

DOI:

https://doi.org/10.32689/maup.it.2026.1.5

Keywords:

causal graphs, semantic pre-filtering, language models, vector representations, adaptive pruning, structural learning

Abstract

The object of the research is the discovery of causal relationships in high-dimensional networks. The problem addressed is twofold: classical structural learning algorithms have exponential computational complexity, and they cannot operate effectively without historical statistical observations. The result is a novel semantic pre-filtering method for causal graphs, developed and empirically validated in this work. Building on the proposed semantic sparsity hypothesis, the method narrows the search space using node metadata alone. The algorithm comprises four stages: (1) generating a textual interpretation of each node with a language model, (2) transforming these descriptions into dense numerical vectors, (3) computing a cosine similarity matrix, and (4) applying an adaptive pruning strategy. The method addresses the stated problem by rejecting up to 88.3 percent of irrelevant node pairs in massive graphs while preserving over 90 percent of true causal edges. This efficiency is explained by the fact that, in large systems, causal links occur predominantly between semantically related entities; vectorization models quantify this semantic proximity and scale naturally with network size. In practice, the approach can serve in high-tech domains as a pre-filtering step before traditional causal discovery algorithms are run. It is most effective for high-density networks and in scenarios where collecting massive historical datasets is technically impossible, provided that qualitative metadata is available.
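
The last two stages of the method (cosine similarity and pruning) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the adaptive pruning rule, so the threshold used here (mean plus `alpha` standard deviations of the pairwise similarities) is an assumption, as are the names `cosine_similarity_matrix`, `prefilter_pairs` and `alpha`. The toy vectors stand in for the embeddings that stages 1 and 2 would produce from language-model descriptions of the nodes.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def prefilter_pairs(embeddings, alpha=1.0):
    """Return (kept_pairs, rejected_fraction) for unordered node pairs.

    Assumed pruning rule (not taken from the article): keep a pair when its
    similarity is at least mean + alpha * std of all pairwise similarities,
    so the cut-off adapts to the similarity distribution of the graph.
    """
    sim = cosine_similarity_matrix(np.asarray(embeddings, dtype=float))
    rows, cols = np.triu_indices(sim.shape[0], k=1)  # each unordered pair once
    values = sim[rows, cols]
    threshold = values.mean() + alpha * values.std()
    keep = values >= threshold
    kept_pairs = list(zip(rows[keep].tolist(), cols[keep].tolist()))
    rejected_fraction = 1.0 - float(keep.mean())
    return kept_pairs, rejected_fraction


# Toy stand-in for node embeddings: nodes 0 and 1 are semantically close,
# and so are nodes 2 and 3.
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])

pairs, rejected = prefilter_pairs(toy_embeddings, alpha=0.5)
print(pairs)               # candidate pairs passed on to causal discovery
print(round(rejected, 3))  # share of node pairs pruned before discovery
```

Only the surviving pairs would then be passed to a conventional causal discovery algorithm; no observational data is used at this stage, only the metadata-derived vectors.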

Published

2026-06-01

How to Cite

Ковенько, О. А., & Апенько, Н. В. (2026). METHOD OF SEMANTIC PREFILTERING OF CAUSAL RELATIONSHIPS IN HIGH-DIMENSIONAL NETWORKS. Information Technology and Society, (1 (20)), 44-50. https://doi.org/10.32689/maup.it.2026.1.5
