METHOD OF SEMANTIC PREFILTERING OF CAUSAL RELATIONSHIPS IN HIGH-DIMENSIONAL NETWORKS
DOI:
https://doi.org/10.32689/maup.it.2026.1.5
Keywords:
causal graphs, semantic pre-filtering, language models, vector representations, adaptive pruning, structural learning
Abstract
The object of research is the process of discovering causal relationships in high-dimensional networks. The problem addressed is the exponential computational complexity of classical structure learning algorithms and their inability to operate effectively without historical statistical observations. The results comprise the development and empirical validation of a novel semantic pre-filtering method for causal graphs. Based on the formulated semantic sparsity hypothesis, the method narrows the search space using node metadata alone. The algorithm comprises four stages: generating a textual interpretation of each node with a language model, transforming these descriptions into dense numerical vectors, computing a cosine similarity matrix over all node pairs, and applying an adaptive pruning strategy to the resulting similarity scores. On large graphs, the method rejected up to 88.3 percent of irrelevant node pairs while preserving over 90 percent of true causal edges. This efficiency is explained by the fact that in large systems causal links occur predominantly between semantically related entities. Vectorization models quantify this semantic proximity and scale naturally with network size. In practice, the approach can be applied in high-tech domains as a pre-filtering step before traditional causal discovery algorithms are run. It is most effective for high-density networks and for scenarios where collecting large historical datasets is technically infeasible, provided that qualitative metadata is available.
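To make the last three stages and the two reported metrics concrete, here is a minimal sketch under explicit assumptions. Stage 1 is replaced by hand-written node descriptions. The function `embed` is a hypothetical toy stand-in (hashed bag of words) for a real dense embedding model, which the abstract does not name. The pruning rule in `adaptive_prune` (keep a pair if its similarity is at least the node's mean plus `alpha` standard deviations) is likewise an illustrative assumption, not the article's actual strategy. `evaluate` computes the two quantities the abstract reports: the share of rejected node pairs and the recall of true causal edges.

```python
import math
import re
import zlib
from statistics import mean, pstdev

Pair = tuple[int, int]  # unordered node pair stored as (smaller, larger)


def embed(text: str, dim: int = 256) -> list[float]:
    """HYPOTHETICAL stand-in for a dense embedding model: word tokens are
    hashed (deterministically, via CRC32) into `dim` buckets and the vector
    is L2-normalised so that a dot product equals cosine similarity."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarity; inputs are assumed to be unit-length."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def adaptive_prune(sim: list[list[float]], alpha: float = 0.5) -> set[Pair]:
    """ASSUMED adaptive rule: node i keeps partner j when sim[i][j] is at
    least mean + alpha * std of i's similarities to every other node, so the
    threshold adapts to each node. A pair survives if either endpoint keeps
    it. Requires at least two nodes."""
    n = len(sim)
    keep: set[Pair] = set()
    for i in range(n):
        others = [sim[i][j] for j in range(n) if j != i]
        threshold = mean(others) + alpha * pstdev(others)
        for j in range(n):
            if j != i and sim[i][j] >= threshold:
                keep.add((min(i, j), max(i, j)))
    return keep


def evaluate(n: int, kept: set[Pair], true_edges: set[Pair]) -> tuple[float, float]:
    """Return (share of all n(n-1)/2 pairs rejected, recall of true edges).
    `true_edges` must use the same (smaller, larger) ordering as `kept`."""
    total = n * (n - 1) // 2
    rejection_rate = 1 - len(kept) / total
    recall = len(kept & true_edges) / len(true_edges) if true_edges else 1.0
    return rejection_rate, recall


# Usage with made-up node descriptions (illustrative only, not data from the article).
descriptions = [
    "engine temperature sensor reading",
    "engine coolant temperature",
    "customer billing address",
    "invoice billing amount",
]
sim = cosine_matrix([embed(d) for d in descriptions])
kept = adaptive_prune(sim)
print(evaluate(len(descriptions), kept, {(0, 1), (2, 3)}))
```

Keeping a pair when either endpoint accepts it favours recall over pruning strength, in line with the abstract's emphasis on preserving true causal edges; requiring both endpoints to agree would reject more pairs at the risk of losing edges. The surviving pairs would then be handed to a conventional causal discovery algorithm as its restricted search space.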