ANALYSIS OF CONVOLUTIONAL NEURAL NETWORK COMPRESSION METHODS FOR EFFECTIVE DEPLOYMENT IN EDGE AI ENVIRONMENTS

Dmytro MARCHUK

State University "Zhytomyr Polytechnic", https://orcid.org/0000-0001-8675-8047

DOI: https://doi.org/10.32689/maup.it.2025.4.17

Keywords: Edge AI, edge devices, convolutional neural network, model compression, quantization, pruning

Abstract

The article is devoted to the study and empirical evaluation of convolutional neural network (CNN) compression methods for effective deployment in Edge AI environments. Despite their high accuracy, traditional CNN architectures such as ResNet-18 are too resource-intensive for edge devices with limited computing power, RAM, and energy budgets. The main focus is on finding the optimal balance between reducing resource consumption and maintaining high classification accuracy. The goal of this work is to investigate and demonstrate the effectiveness of model compression techniques, namely quantization, pruning, and knowledge distillation, for transferring the capabilities of CNNs to edge devices. The scientific novelty lies in a comprehensive quantitative comparison of the impact of these three optimization techniques on key model performance indicators. First, full integer post-training quantization (PTQ Int8) is shown to provide a compression ratio of 11.06x with a minimal accuracy loss of 0.0030, which confirms it as the optimal first step. Second, a comparative analysis shows that unstructured pruning of 50% of ResNet-18 weights fully recovers and even exceeds the baseline accuracy after fine-tuning, whereas structured pruning leads to irreversible accuracy loss (up to 45.70%) under limited retraining and therefore requires a more balanced approach. Third, knowledge distillation is shown to allow the MobileNetV2 model to outperform its conventionally trained version (91.8% vs. 89.5% accuracy), maximizing accuracy under severe architectural constraints. Conclusion. Model compression is an engineering trade-off and a prerequisite for building highly efficient, low-latency, and energy-efficient deep learning solutions that can be successfully deployed in edge computing environments. Quantization in particular makes it possible to turn energy-intensive models into practical Edge AI solutions.
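The abstract identifies full integer post-training quantization (PTQ Int8) as the most effective first step. The basic arithmetic behind it can be sketched as follows, assuming symmetric per-tensor quantization with a single floating-point scale per weight tensor. This is an illustrative sketch, not the conversion toolchain used in the study, and the helper names are hypothetical. Storing 8-bit integers instead of 32-bit floats reduces weight storage by roughly 4x on its own; the sketch does not attempt to reproduce the 11.06x ratio reported in the study.

```python
# Minimal sketch of symmetric per-tensor int8 post-training quantization.
# Function names are hypothetical; real pipelines use a framework converter.
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to int8 codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, which would otherwise give scale = 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Approximately recover the float weights from int8 codes and the scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.42, -1.27, 0.003, 0.9, -0.5]
    codes, scale = quantize_int8(w)
    w_hat = dequantize_int8(codes, scale)
    print("codes:", codes)
    # Rounding error per weight is bounded by scale / 2.
    print("max error:", max(abs(a - b) for a, b in zip(w, w_hat)))
    # Storage: 4 bytes per float32 weight vs 1 byte per int8 code + 4-byte scale.
    n = 1_000_000  # hypothetical tensor size
    print("weight storage ratio:", (4 * n) / (n + 4))
```

Per-channel scales, activation calibration, and integer-only kernels are left out here for brevity.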
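The abstract contrasts unstructured pruning, which zeroes individual weights, with structured pruning, which removes whole filters. The difference can be illustrated with a minimal sketch that assumes weight magnitude (the L1 norm for whole filters) as the importance criterion, a common choice that is not necessarily the one used in the study, and represents a convolutional layer as a list of flattened filters. The helper names are hypothetical.

```python
# Minimal sketch of magnitude-based pruning on one convolutional layer,
# represented (by assumption) as a list of flattened filters.
from typing import List

Filter = List[float]


def unstructured_prune(layer: List[Filter], sparsity: float) -> List[Filter]:
    """Zero exactly the k smallest-magnitude weights; the layer shape is kept."""
    ranked = sorted(
        (abs(w), i, j) for i, f in enumerate(layer) for j, w in enumerate(f)
    )
    k = int(len(ranked) * sparsity)
    to_zero = {(i, j) for _, i, j in ranked[:k]}
    return [
        [0.0 if (i, j) in to_zero else w for j, w in enumerate(f)]
        for i, f in enumerate(layer)
    ]


def structured_prune(layer: List[Filter], ratio: float) -> List[Filter]:
    """Remove whole filters with the smallest L1 norm; the layer gets narrower."""
    n_drop = int(len(layer) * ratio)
    order = sorted(range(len(layer)), key=lambda i: sum(abs(w) for w in layer[i]))
    dropped = set(order[:n_drop])
    return [list(f) for i, f in enumerate(layer) if i not in dropped]


if __name__ == "__main__":
    layer = [
        [0.9, -0.1, 0.05, 0.7],
        [0.02, 0.01, -0.03, 0.04],
        [-0.6, 0.3, 0.0, -0.2],
        [0.1, -0.8, 0.4, 0.05],
    ]
    print(unstructured_prune(layer, 0.5))  # 8 of 16 weights zeroed, 4 filters kept
    print(structured_prune(layer, 0.5))    # filters 1 and 2 removed, 2 filters kept
```

In this sketch, unstructured pruning keeps the layer's shape, so storage or compute savings only materialize with sparse formats or kernels, whereas structured pruning yields a genuinely smaller dense layer but removes capacity one whole filter at a time.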
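The abstract reports that knowledge distillation lets MobileNetV2 outperform its conventionally trained counterpart. A widely used form of the distillation objective, the soft-target loss of Hinton et al. (2015), can be sketched for a single example as below. It is not necessarily the exact loss used in the study; the temperature T = 4 and weight alpha = 0.7 are hypothetical defaults, and in practice the loss would be computed on framework tensors with automatic differentiation.

```python
# Minimal sketch of the soft-target distillation loss for one example.
# Temperature and alpha defaults are hypothetical, not values from the study.
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.7,
) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * hard-label CE."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(
        pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0
    )
    # Ordinary cross-entropy with the ground-truth class at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps soft-target gradients on a scale comparable to CE.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [6.0, 1.0, -2.0]  # confident teacher logits (illustrative)
    student = [2.0, 1.5, -0.5]  # less confident student logits (illustrative)
    print(distillation_loss(student, teacher, label=0))
```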
