MODIFICATION OF THE RECIPROCAL RANK FUSION METHOD TO IMPROVE HYBRID SEARCH RESULTS IN INFORMATION SYSTEMS WITH VECTOR DATABASES

Authors

DOI:

https://doi.org/10.32689/maup.it.2025.2.2

Keywords:

semantic search, lexical search, hybrid search, combined search, vector databases, information systems

Abstract

The aim of this study is to improve the Reciprocal Rank Fusion (RRF) method so that results are merged more accurately in hybrid search systems that use vector databases. Hybrid search merges results obtained through different retrieval strategies, including lexical, semantic, and visual search. However, the traditional RRF formula uses only each document's rank and ignores its degree of relevance, so low-relevance results from one retrieval strategy can displace high-relevance results from another. As a consequence, the desired ranking of documents in the final result list may be distorted.

The methodology of this research is based on analyzing relevance scores obtained through lexical and semantic search. A relevance-based classification approach is proposed, in which search results are grouped by level of relevance using a stepwise degradation of the original query. A modified RRF formula is introduced, in which document ranks are adjusted according to relevance group membership using an exponential function. This approach reduces the influence of low-relevance results on highly relevant ones. For experimental validation, the MS MARCO dataset was used, which contains real user queries and manually annotated relevance judgments. The effectiveness of the original and modified RRF methods was compared using the MRR@10 metric.

The scientific novelty of this work lies in the use of a dynamic, context-sensitive approach to forming relevance groups without requiring repeated access to the database or index. The proposed solution operates solely on the query text, which keeps it computationally efficient and minimizes resource consumption. Unlike existing RRF modifications, the proposed method allows for flexible adjustment of ranking weights based on lexical and semantic analysis.

The findings of the study confirm that the proposed modification of the RRF method improves the ranking position of relevant documents in hybrid search results. The modified method achieves a higher MRR@10 score (0.1880 compared to 0.1718 for the classical RRF), indicating a reduction in the negative impact of irrelevant results.
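The fusion step described in the abstract can be illustrated with a minimal Python sketch. The classical RRF score, 1 / (k + rank) summed over the result lists with k = 60, follows Cormack et al. The group-adjusted variant is only an assumption made for illustration: the abstract does not give the exact modified formula, so the function name group_adjusted_rrf, the base parameter, and the rule "multiply the rank by base ** group" are hypothetical stand-ins for "ranks adjusted by relevance group using an exponential function". Group 0 is taken to mean the most relevant group. The document identifiers and group assignments are invented toy data.

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Classical RRF: score(d) = sum over result lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def group_adjusted_rrf(rankings, groups, k=60, base=2.0):
    """Hypothetical variant, NOT the paper's exact formula.

    groups[i] maps a document id to its relevance group in rankings[i]
    (0 = most relevant; missing ids default to 0). The rank is stretched
    exponentially with the group index, so documents from low-relevance
    groups contribute less to the fused score.
    """
    scores = defaultdict(float)
    for ranking, group_of in zip(rankings, groups):
        for rank, doc_id in enumerate(ranking, start=1):
            adjusted_rank = rank * base ** group_of.get(doc_id, 0)
            scores[doc_id] += 1.0 / (k + adjusted_rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def mrr_at_10(results, relevant):
    """Mean reciprocal rank of the first relevant document in the top 10."""
    total = 0.0
    for query_id, ranked in results.items():
        for position, doc_id in enumerate(ranked[:10], start=1):
            if doc_id in relevant.get(query_id, set()):
                total += 1.0 / position
                break
    return total / len(results) if results else 0.0


# Toy data (assumed): "n" ranks well in both lists but belongs to
# low-relevance groups; "a" is the only truly relevant document.
lexical = ["n", "x", "a"]
semantic = ["a", "n"]
groups = [{"n": 2}, {"n": 1}]

classic_order = rrf([lexical, semantic])
adjusted_order = group_adjusted_rrf([lexical, semantic], groups)

classic_mrr = mrr_at_10({"q1": classic_order}, {"q1": {"a"}})
adjusted_mrr = mrr_at_10({"q1": adjusted_order}, {"q1": {"a"}})
```

In this toy case the classical fusion places "n" above "a" because it considers rank alone, while the group-adjusted sketch moves "a" to the top. This mirrors the kind of effect the abstract reports, but the numbers here say nothing about the MS MARCO results.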

References

Aggarwal C. C. Data Mining: The Textbook. Springer. 2015.

Bajaj P., Campos D., Craswell N., Deng L., Gao J., Liu X., Majumder R., McNamara A., Mitra B., Nguyen T., et al. MS MARCO: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268. 2016.

Bendersky M., Zhuang H., Ma J., Han S., Hall K., McDonald R. RRF102: Meeting the TREC-COVID challenge with a 100+ runs ensemble. arXiv. 2020. https://doi.org/10.48550/arXiv.2010.00200

Bruch S., Gai S., Ingber A. An analysis of fusion functions for hybrid retrieval. arXiv preprint arXiv:2210.11934. 2022. URL: https://arxiv.org/abs/2210.11934

Cormack G. V., Clarke C. L. A., Büttcher S. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In Proc. SIGIR, 2009. 758–759.

Johnson J., Douze M., Jégou H. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 2019. 7(3), 535–547.

Kim S.-W., Gil J.-M. Research paper classification systems based on TF-IDF and LDA schemes. Human-centric Computing and Information Sciences, 2019. 9(1), 30.

Liu L., Zhang M. Exp4Fuse: A rank fusion framework for enhanced sparse retrieval using large language model-based query expansion. arXiv. 2025. https://doi.org/10.48550/arXiv.2506.04760

Mourão A., Martins F., Magalhães J. Inverse Square Rank Fusion for Multimodal Search. Proceedings of the 12th International Workshop on Content-Based Multimedia Indexing (CBMI 2014), 2014. 1–6. https://doi.org/10.1109/CBMI.2014.684982

Radford A., et al. Learning Transferable Visual Models From Natural Language Supervision. ICML. 2021.

Robertson S., Zaragoza H. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends® in Information Retrieval, 2009. 3(4), 333–390.

Samuel S., DeGenaro D., Guallar-Blasco J., Sanders K., Eisape O., Spendlove T., Reddy A., Martin A., Yates A., Yang E., Carpenter C., Etter D., Kayi E., Wiesner M., Murray K., Kriz R. MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion. arXiv. 2025. URL: https://arxiv.org/abs/2503.20698

Published

2025-09-23

How to Cite

БІЛИЙ, М., & КРИЛОВ, Є. (2025). MODIFICATION OF THE RECIPROCAL RANK FUSION METHOD TO IMPROVE HYBRID SEARCH RESULTS IN INFORMATION SYSTEMS WITH VECTOR DATABASES. Information Technology and Society, (2 (17)), 14-19. https://doi.org/10.32689/maup.it.2025.2.2