LIGHTWEIGHT YET PRECISE: INCEPTION-RESNET-A BACKBONE FOR YOLO-BASED WCE POLYP DETECTION


Saydirasulov S.N.
Mukhamadiyev A.N.
Turimov D.M.
Kilichev D.

Abstract

Wireless capsule endoscopy (WCE) produces long, variable-quality video streams in which early and reliable polyp detection is critical. We present YOLO-InceptionResNet-A, a lightweight object detector that replaces the standard YOLOv4-tiny backbone with an Inception-ResNet-A block to enrich multi-scale feature representation while preserving real-time efficiency. The proposed pipeline operates in two stages: (i) a frame-level screening classifier that separates normal from abnormal frames, and (ii) the detector for precise polyp localization. Because color appearance carries diagnostic information, we adopt a conservative, clinically aware augmentation policy (brightness and mild hue jitter) alongside standard normalization. We evaluate on the Kvasir family of WCE images using patient-level splits and report object-detection metrics (mAP@0.5, mAP@[0.5:0.95], precision, recall, F1, and IoU), frame-level classification metrics (AUROC, sensitivity, specificity), and throughput on a single RTX 3090 GPU. Across benchmarks, the backbone swap consistently improves detection mAP and recall over YOLOv3, YOLOv4, and YOLOv4-tiny baselines while maintaining latency low enough for real-time review. Ablation studies isolate the contributions of the Inception-ResNet-A backbone and the augmentation policy, showing that richer multi-scale features are the primary driver of the gains. We discuss limitations related to dataset size and domain shift, and outline external validation on additional WCE datasets as future work. These results indicate that a targeted backbone re-architecture can deliver lightweight yet precise WCE polyp detection without sacrificing speed, an attractive trade-off for clinical deployment.
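The central architectural change is swapping a YOLOv4-tiny backbone stage for an Inception-ResNet-A block. As a point of reference, the following is a minimal PyTorch sketch of a generic Inception-ResNet-A residual unit in the style of the original Inception-ResNet design; the channel widths, the residual scale, and the exact place the block is spliced into the YOLOv4-tiny backbone are illustrative assumptions rather than the authors' reported configuration.

import torch
import torch.nn as nn

def conv_bn_act(in_ch, out_ch, k=1):
    # Convolution + BatchNorm + ReLU; padding keeps the spatial size unchanged.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class InceptionResNetA(nn.Module):
    # Three parallel branches (1x1, 1x1-3x3, 1x1-3x3-3x3) are concatenated, projected
    # back to the input width, and added to the input as a scaled residual.
    def __init__(self, in_ch=256, branch_ch=32, scale=0.17):  # widths and scale are assumed values
        super().__init__()
        self.b0 = conv_bn_act(in_ch, branch_ch, 1)
        self.b1 = nn.Sequential(conv_bn_act(in_ch, branch_ch, 1),
                                conv_bn_act(branch_ch, branch_ch, 3))
        self.b2 = nn.Sequential(conv_bn_act(in_ch, branch_ch, 1),
                                conv_bn_act(branch_ch, branch_ch, 3),
                                conv_bn_act(branch_ch, branch_ch, 3))
        self.project = nn.Conv2d(3 * branch_ch, in_ch, 1)  # linear 1x1 projection for the residual sum
        self.scale = scale
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        mixed = torch.cat([self.b0(x), self.b1(x), self.b2(x)], dim=1)
        return self.act(x + self.scale * self.project(mixed))

# The block preserves feature-map shape, so it can stand in for a backbone stage of
# matching width inside a YOLO-style detector.
x = torch.randn(1, 256, 52, 52)
assert InceptionResNetA(256)(x).shape == x.shape

Because the three branches see the input at different effective receptive fields, concatenating them is what supplies the multi-scale feature enrichment the abstract credits for the mAP and recall gains.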
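The augmentation policy can be made concrete in a similar way. A minimal torchvision sketch of conservative, color-preserving training augmentation, assuming a 416x416 input size, small jitter magnitudes, and ImageNet normalization statistics (none of which are specified in the abstract), could look like this:

from torchvision import transforms

# Photometric-only augmentation: brightness and mild hue jitter, then normalization.
# Jitter magnitudes and normalization statistics are illustrative assumptions.
wce_train_transform = transforms.Compose([
    transforms.Resize((416, 416)),                      # assumed YOLOv4-tiny input size
    transforms.ColorJitter(brightness=0.2, hue=0.02),   # mild shifts preserve mucosal color cues
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

Because these transforms are purely photometric, bounding-box annotations for the detection stage need no corresponding geometric adjustment.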

Article Details

Section

Articles

How to Cite

LIGHTWEIGHT YET PRECISE: INCEPTION-RESNET-A BACKBONE FOR YOLO-BASED WCE POLYP DETECTION. (2025). Innovative: International Multidisciplinary Journal of Applied Technology (2995-486X), 3(10), 119-127. https://doi.org/10.51699/hr6vna74
