Distributed Hierarchical Structure for Multi-Objective OCR Text Recognition in Electrical Cabinet Inspection

Authors: Zhixin Kang, Shu Lin, Jungang Xu, and Qingjie Kong
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 3168-3183
Keywords: OCR, Text recognition, Text detection, Deep learning

Abstract

Electrical cabinet label recognition is an important part of intelligent robotic inspection, and accurate label recognition is a prerequisite for effectively recording inspection anomalies. After the inspection robot takes pictures, Optical Character Recognition (OCR) can detect the location of the electrical cabinet labels and recognize their content. First, we use the oCLIP method to reduce model training time and the amount of training data required. Second, on the electrical cabinet label dataset, text detection accuracy based on the cutting-edge model DBNet++ reaches 96.74%, and text recognition accuracy based on ABINet reaches 89.33%. Through comparative experiments, we found that applying only the ABINet visual model improves text recognition accuracy to 90.58%, indicating that the language model in ABINet does not perform well for this task. The ABINet visual model is better at extracting local information from the image text, while the ABINet language model excels at recognizing semantic relationships across different parts of the text. Third, leveraging this characteristic, we designed ABINet-TS, a distributed hierarchical framework for multi-objective OCR text recognition. In the first layer, the visual model recognizes local information from the image; in the second layer, the language model correlates and corrects the predictions made by the visual model. This further improves text recognition accuracy to 91.74%. Finally, replacing the language model in ABINet-TS with BERT raises text-line accuracy to 92.16%.
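
The two-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' ABINet-TS implementation: the names `LabelPrediction`, `first_layer_recognize`, and `second_layer_correct` are hypothetical, the first layer is a stub that takes ready-made (text, confidence) pairs instead of running a visual model, and the second layer uses a simple lexicon lookup via Python's `difflib` as a stand-in for the language model (it is neither the ABINet language model nor BERT). The lexicon of known label strings and the confidence threshold are also assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import get_close_matches


@dataclass
class LabelPrediction:
    text: str          # string predicted for one label by the first layer
    confidence: float  # recognizer confidence in [0, 1] (assumed to be available)


def first_layer_recognize(crops):
    """Hypothetical stand-in for the visual model.

    Each "crop" here is already a (text, confidence) pair, so the sketch
    runs without a trained recognizer.
    """
    return [LabelPrediction(text, conf) for text, conf in crops]


def second_layer_correct(predictions, lexicon, threshold=0.9, cutoff=0.6):
    """Toy second layer: revise low-confidence strings against a lexicon.

    A real system would apply a language model here; the closest-match
    lookup below is a swapped-in simplification.
    """
    corrected = []
    for pred in predictions:
        if pred.confidence >= threshold:
            # Trust confident first-layer output unchanged.
            corrected.append(pred.text)
            continue
        matches = get_close_matches(pred.text, lexicon, n=1, cutoff=cutoff)
        # Fall back to the first-layer text when nothing is similar enough.
        corrected.append(matches[0] if matches else pred.text)
    return corrected


if __name__ == "__main__":
    preds = first_layer_recognize(
        [("PUMP-01", 0.98), ("PUNP-02", 0.55), ("XQZ", 0.40)]
    )
    print(second_layer_correct(preds, ["PUMP-01", "PUMP-02", "VALVE-03"]))
```

Keeping the two layers as separate functions mirrors the separation described above: the first layer only reads each label locally, and the correction step runs afterward on the collected predictions, so the corrector can be swapped without touching recognition.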