Real-time Optical Character Recognition in Manufacturing Using YOLOv8 and Embedded Systems for Engraved Characters on a Metal Surface
Abstract
This study introduces a YOLOv8-based Optical Character Recognition (OCR) system specifically optimized for engraved character recognition, aiming to facilitate digital transformation and enhance smart manufacturing processes. To overcome limitations of manual part identification and quality inspection prevalent in conventional manufacturing environments, this study employed engraved character data from metal scroll compressor components. A lightweight deep learning model was designed and deployed on a Raspberry Pi platform to enable real-time character recognition. In a controlled laboratory environment, more than 150 images were acquired and processed through data augmentation and normalization techniques. The YOLOv8 object detection model was trained under various lighting and angular conditions, achieving high recognition performance (recall: 94.8%, mAP: 88.8%). Additionally, a post-processing algorithm was implemented to organize detected characters by their positions and classes, thereby reconstructing final product identification codes. Results confirmed the feasibility of real-time quality inspection and the potential for process automation in manufacturing environments. Future research needs to focus on enhancing recognition precision through improved post-processing techniques for reverse-oriented characters and diverse text layouts, while also exploring alternative embedded platforms to further optimize system efficiency.
1 Introduction
The digital transformation of the manufacturing industry has become a key factor in enhancing competitiveness. With the exponential increase in data collected from manufacturing sites, big data presents a significant opportunity to shift this manufacturing paradigm toward smart manufacturing. This transition enables companies to adopt data-driven strategies to secure a competitive edge [1]. Moreover, the effective utilization of information embedded in big data has underscored the growing importance of various data-driven model approaches powered by artificial intelligence [2].
Smart manufacturing has garnered significant attention due to its potential to simultaneously achieve efficiency, profitability, and sustainability. However, for small and medium-sized enterprises (SMEs) engaged in conventional machining processes, digitalization remains insufficiently implemented, making the adoption of real-time process monitoring and automated inspection systems challenging. In particular, character information on workpiece surfaces is primarily read manually and entered into computers, with quality inspections still heavily reliant on manual labor. This method is not only labor-intensive and inefficient but also leads to reduced reliability and difficulties in analyzing defect causes during processing [3].
These challenges are particularly pronounced in precision-machined components such as fixed scroll parts, which are the focus of this study, as even minor quality variations can significantly impact product performance [4]. Therefore, the application of real-time monitoring and OCR-based quality inspection is expected to facilitate immediate quality control and early defect detection through automated inspection systems. In fact, Optical Character Recognition (OCR) has established itself as a successful technology in pattern recognition and artificial intelligence. While numerous commercial OCR systems exist, further research and optimization are required to meet the precision and speed demands of specific industrial applications [5].
As manufacturing process automation has recently emerged as a crucial factor in improving productivity and reducing costs, OCR technology, which automatically recognizes characters or markings engraved on components, plays a vital role in inventory management, quality control, and traceability. However, recognizing engraved characters on metal surfaces is a complex task that requires the integration of various technologies, including machine vision, pattern recognition, image processing, and artificial intelligence. The recognition of engraved characters and fine markings, commonly found in manufacturing environments, poses a greater challenge than standard text recognition.
Additionally, the contrast between the engraved text and the metal background is often minimal, particularly on highly reflective or rough surfaces. These factors make it difficult to apply traditional OCR methods without extensive preprocessing, including contrast enhancement, shadow removal, and advanced feature extraction.
To achieve high-speed, high-precision processing in embedded environments, lightweight models and robust post-processing techniques are essential. A recent study [6] proposed an approach where alphanumeric codes attached to manufactured metal components were captured using a smartphone camera. The captured images were then processed through preprocessing techniques such as contrast and brightness adjustment, as well as binarization, before applying OCR to extract the codes automatically. Subsequently, leading OCR engines such as Tesseract, Keras OCR, and EasyOCR were compared and analyzed. The recognized code information was transmitted to a temporary storage system, demonstrating the potential for expansion into various industries, including heavy industry, construction, and marine engineering. This study highlights the feasibility of integrating smartphone-based image acquisition with preprocessing and OCR techniques to reduce error rates and enhance operational efficiency while maintaining adaptability for diverse manufacturing environments.
Based on this background, this study aims to develop a machine vision OCR module for inspecting fixed scroll components and intelligently automating precision machining processes.
2 Background
To achieve digital transformation in the manufacturing industry and implement smart manufacturing, it is essential to effectively utilize the data generated during production processes. In particular, variations in product quality and component identification data are critical factors that directly impact productivity, quality management, and inventory traceability. However, conventional manufacturing systems still rely on manual processes in which operators read character information from workpiece surfaces and input it into computers. This approach is labor-intensive, prone to errors, and significantly undermines both efficiency and reliability.
To address these challenges, OCR technology is being introduced into manufacturing environments, laying the foundation for process automation and real-time monitoring. In particular, when dealing with engraved characters on metal workpieces—where recognition is hindered by complex factors such as lighting variations, reflections, and machining marks—traditional OCR techniques face inherent limitations. As a result, the development of OCR systems tailored to manufacturing environments has emerged as a critical research topic, as such advancements can significantly improve data collection accuracy and operational efficiency.
2.1 Optical Character Recognition
OCR is a technology that identifies characters within an image and converts them into digital text. Early OCR systems relied primarily on pipeline-based methods of preprocessing, segmentation, and recognition, with Tesseract being one of the most widely used engines. While Tesseract achieves high accuracy on simple black-and-white document images, it requires appropriate preprocessing when dealing with complex color images or those with significant background noise [7,10]. Li et al. [3] proposed a method that raised the recognition accuracy of Tesseract-based OCR to 98.6% by addressing complex backgrounds and reflection issues in metal workpieces. Their approach incorporated shading techniques and the Retinex algorithm in the image acquisition stage, along with median filtering to reduce noise. However, in dynamic industrial environments where factors such as lighting variations, reflections, and image distortions are prevalent, basic preprocessing techniques alone have inherent limitations.
Recent studies have actively explored deep learning-based object detection models to simultaneously detect and recognize text regions. For instance, Nagaoka et al. [8] proposed a Faster R-CNN-based text detection method that utilizes feature maps from multiple convolutional layers to detect text of varying sizes at multiple resolutions. Similarly, Shashidhar et al. [9] introduced a method for recognizing characters on Indian number plates by first removing motion blur using a point spread function and Wiener filtering, followed by character detection with YOLOv3. These advanced OCR approaches offer high accuracy and fast processing speeds, making them particularly suitable for industrial environments where real-time processing is essential.
The review by Wen et al. [11] on steel surface defect recognition and the study by Verma & Rajotia [12] on machining feature recognition methodologies address the challenges and solutions associated with image analysis in complex industrial environments and CAD-based feature recognition, respectively. Wen et al. discuss efforts to overcome challenges such as dataset acquisition, lighting inconsistencies, reflections, and noise in steel defect detection through deep learning and traditional machine learning techniques. Meanwhile, Verma & Rajotia analyze various methodologies, including graph-based approaches, for the automatic recognition of machining features within CAD models. These studies highlight the necessity of algorithm optimization to enhance automation and real-time processing performance in the manufacturing industry, contributing to improvements in data reliability, quality control, and production automation.
For the real-time application of OCR in manufacturing environments, lightweight deep learning models capable of high-speed inference on embedded systems such as the Raspberry Pi, along with sophisticated post-processing techniques, are essential. Advances in these technologies respond directly to the challenges identified in the aforementioned research on steel defect recognition and machining feature recognition. Ultimately, they will serve as key enablers in ensuring data reliability, facilitating defect cause analysis, and realizing production automation in complex industrial settings.
2.2 You Only Look Once
You Only Look Once (YOLO) is an algorithm that introduced an innovative single-stage approach to object detection. First proposed by Redmon et al. in 2015, YOLO revolutionized the field by dividing the entire input image into a grid and simultaneously predicting a fixed number of bounding boxes and class probabilities for each grid cell. This design enabled faster and more efficient object detection compared to traditional multi-stage methods [13,15].
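To make the grid-based formulation concrete, the following minimal sketch decodes one grid cell's raw box prediction into pixel coordinates, assuming the original YOLOv1-style conventions (box-centre offsets relative to the cell, width and height relative to the whole image); the function name and tuple layout are illustrative assumptions, not part of any YOLO implementation.

```python
def decode_cell(pred, row, col, S, img_w, img_h):
    """Decode one grid cell's raw box prediction into pixel coordinates.

    pred: (x, y, w, h, objectness), where (x, y) is the box-centre offset
    within cell (row, col) of an S x S grid, and (w, h) are fractions of
    the whole image.
    """
    x, y, w, h, obj = pred
    cx = (col + x) / S * img_w   # absolute box centre, x-direction
    cy = (row + y) / S * img_h   # absolute box centre, y-direction
    return cx, cy, w * img_w, h * img_h, obj

# A box centred in cell (3, 3) of a 7 x 7 grid on a 640 x 480 frame
# decodes to a centre at (320, 240) with a 64 x 96 pixel extent.
box = decode_cell((0.5, 0.5, 0.1, 0.2, 0.9), row=3, col=3, S=7,
                  img_w=640, img_h=480)
```

Because every cell's boxes and class probabilities are produced in a single forward pass, no separate region-proposal stage is needed, which is the source of YOLO's speed advantage.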
The YOLO series has undergone continuous advancements. Jiang et al. (2022) analyzed the key improvements from YOLO V1 to V5, highlighting the introduction of anchor boxes, multi-scale feature extraction, residual connections, and network optimization techniques that significantly enhanced both accuracy and speed [13]. Meanwhile, Hussain (2023) revisited the development trajectory up to the latest YOLOv8 from the perspective of industrial manufacturing and defect detection applications, emphasizing model optimization strategies suited for real-time processing and constrained computational environments in manufacturing settings [14].
The YOLO series has gained significant recognition in the field of general object detection due to its high accuracy and real-time processing capabilities [16].
Specifically, its single-stage detection architecture directly predicts bounding boxes and class probabilities from the input image, achieving high frame rates and rapid response times. Continuous improvements in network architecture have facilitated the design of lightweight models, enabling effective deployment in computationally constrained environments such as embedded systems. The adoption of multi-scale feature extraction and residual connections has demonstrated robust generalization performance in detecting objects of varying sizes and shapes. Compared to its predecessors, YOLOv8 features a more streamlined model architecture with optimized pre-processing and post-processing operations, delivering superior real-time performance even on embedded devices [13–15].
Recent studies highlight that YOLO-based algorithms are becoming essential tools in industrial manufacturing applications, particularly for surface defect detection. With the growing demand for real-time quality inspection and automated defect detection, YOLO’s lightweight architecture, rapid inference speed, and high accuracy have positioned it as a key technology for ensuring data reliability and enhancing production efficiency in manufacturing environments.
The YOLOv8 architecture consists of two primary components: the backbone and the head, both implemented using fully convolutional neural networks [20].
The backbone comprises a series of sequential convolutional layers designed to extract relevant features from the input image. This stage is responsible for capturing spatial and semantic information at multiple scales, enabling the network to learn robust feature representations. The head processes the feature maps generated by the backbone to produce the final detection results, including bounding box coordinates and class probabilities. In YOLOv8, the head is modular, enabling independent management of objectness scoring, classification, and regression tasks. It employs a series of convolutional and linear layers to efficiently map the extracted features to the final outputs. This design is optimized for both speed and accuracy, allowing YOLOv8 to deliver fast and precise object detection.
Accordingly, this study builds upon previous discussions on the advancements of the YOLO series and aims to utilize YOLOv8 for recognizing engraved characters on fixed scroll compressor components in manufacturing environments. The lightweight architecture and optimized computations of YOLOv8 are expected to enable real-time quality inspection and the rapid and accurate extraction of component identification codes, even in embedded environments with limited computational resources [16]. By applying YOLOv8 to images of metal scroll components collected in a controlled laboratory setting, this study seeks to contribute to the enhancement of data reliability and the implementation of production automation as a key technology in manufacturing environments.
3 Methodology
This study proposes a YOLOv8-based Optical Character Recognition (OCR) model designed to recognize identification characters engraved on metal components and implements a real-time inference system in a Raspberry Pi environment. Fig. 1 provides a visual representation of the overall process flow, encompassing data collection, augmentation, model training, real-time inference, and result visualization. In the initial stage, a total of 150 images of engraved characters on scroll compressor components were acquired using a USB camera in a laboratory setting. These images were then annotated and segmented into predefined character classes, including digits (0–9) and letters (A, C, H, P, Q, Y). To improve the model's generalization capability, the segmented data underwent augmentation, increasing the dataset threefold through techniques such as rotation, brightness adjustment, and exposure modification. The augmented dataset was then used to train a YOLOv8-based OCR model to accurately detect and recognize the engraved characters. The trained model was deployed on a Raspberry Pi for real-time image capture and inference, processing live input from the camera to perform OCR recognition. Finally, the results were displayed with bounding boxes and confidence scores, and the OCR accuracy was evaluated based on the model's ability to recognize engraved characters that are distinguishable to humans.
3.1 Data Collection and Processing
In this study, over 150 images were collected in a controlled laboratory environment and used for training the engraved character OCR model. Fig. 2 presents images of the metal fixed scroll components used in this research. Figs. 2(a)–2(c) illustrate (a) the overall shape of the component, including the central scroll structure, (b) the side view, and (c) the front view, where the product identification codes “Q7” and “APH 34” are engraved. The collected dataset consists of four types of scroll compressor components, structured to facilitate the recognition of detailed part numbers and product identification codes. Labeling was defined over 15 classes composed of numerical and alphabetical characters, as outlined in Table 1. Each object was annotated using bounding box coordinates in the format (x, y, w, h).
The collected data were normalized to be compatible with the YOLOv8 model. In this study, object detection aims to identify various objects within an image and classify them into numeric (0–9) and alphabetic character classes. Additionally, data augmentation techniques were applied to enhance the model’s generalization performance.
Specifically, to improve robustness to variations in character orientation, rotations within a ±15° range were applied. To improve robustness to lighting variations, brightness was adjusted within a ±15% range, and to maintain recognition performance in low-light conditions, exposure modifications within a ±10% range were introduced. As a result, the augmented dataset was expanded to more than 300 images.
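As an illustrative sketch (not the authors' implementation), the three augmentations can be composed as follows. Nearest-neighbour rotation is used here only to keep the example dependency-free; a production pipeline would typically use OpenCV or a dedicated augmentation library instead.

```python
import numpy as np

def augment(image, angle_deg, brightness, exposure):
    """Apply rotation (angle_deg, drawn from ±15°), an additive brightness
    shift (fraction of the 8-bit range, drawn from ±15%), and a
    multiplicative exposure gain (drawn from ±10%) to one image."""
    h, w = image.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, find the source pixel that
    # lands on it after rotating about the image centre.
    xr = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    yr = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    xi = np.clip(np.rint(xr).astype(int), 0, w - 1)
    yi = np.clip(np.rint(yr).astype(int), 0, h - 1)
    rotated = image[yi, xi]
    # Exposure as a gain, brightness as an additive shift on 8-bit values.
    out = rotated.astype(np.float32) * (1.0 + exposure) + brightness * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)
```

For each augmented copy, the three parameters would be sampled uniformly from [−15°, +15°], [−0.15, +0.15], and [−0.10, +0.10], respectively, matching the ranges stated above.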
3.2 YOLOv8-based OCR Model Training
In this study, a YOLOv8 instance segmentation model was trained to recognize engraved characters on metal fixed scroll components. To ensure real-time inference in a Raspberry Pi environment, the model was optimized for lightweight deployment, enabling effective component identification and quality verification within embedded systems in manufacturing environments. The dataset used for model training consisted of over 300 images, which were split into 80% for training, 10% for validation, and 10% for testing.
To enhance the robustness of the YOLOv8 model against variations in lighting conditions and viewing angles, data augmentation techniques, including controlled rotation adjustments (±15°), brightness modulation (±15%), and exposure variations (±10%), were applied. The input image resolution was standardized at 640 × 480 pixels to maintain consistency across the dataset.
The model was fine-tuned by optimizing hyperparameters critical to its performance in an embedded environment. A cross-entropy-based loss function was employed for object detection and OCR tasks, ensuring effective differentiation between character classes. The AdamW optimization algorithm was selected to improve training stability and accelerate convergence, mitigating the risk of overfitting and local minima entrapment. Table 2 presents the hyperparameter configurations used for training the YOLOv8 model, detailing key settings optimized for real-time OCR in an embedded environment. The training process was conducted using a batch size of 16, a learning rate of 1e–3, and a total of 10 epochs, which were empirically determined based on preliminary experiments to balance computational efficiency and model accuracy.
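For illustration, a training run matching these settings could be configured for the Ultralytics toolchain roughly as follows; the file name, directory paths, and class list are assumptions for this sketch (the class list simply enumerates the characters named in Section 3.1), not the authors' actual configuration.

```yaml
# scroll_ocr.yaml — hypothetical Ultralytics dataset definition
path: datasets/scroll_ocr
train: images/train   # 80% of the augmented dataset
val: images/val       # 10%
test: images/test     # 10%
names: ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
        "A", "C", "H", "P", "Q", "Y"]
```

Training could then be launched with the Ultralytics CLI, e.g. `yolo segment train data=scroll_ocr.yaml model=yolov8n-seg.pt epochs=10 batch=16 lr0=0.001 optimizer=AdamW imgsz=640`, mirroring the hyperparameters reported in Table 2.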
Fig. 3 illustrates the changes in loss and accuracy metrics throughout the YOLOv8 training process. The results indicate that OCR box and mask losses continuously decreased, while precision and recall steadily improved as training progressed. Notably, the simultaneous increase in mAP@0.5 and mAP@0.5:0.95 suggests that the model attained a high level of detection and segmentation capability across various sizes and shapes of engraved characters. These findings indicate that the trained YOLOv8 model can perform fast and accurate engraved character recognition, even in the real-time embedded environments required in manufacturing settings.
Through this analysis of the training results, it was confirmed that the model progressively stabilized and achieved high accuracy in recognizing engraved characters. The following section evaluates real-time inference using the trained model and assesses its applicability in actual manufacturing environments.
3.3 Real-time OCR System Implementation
In this study, a real-time OCR system was developed using the YOLOv8 model deployed on a Raspberry Pi to recognize engraved characters on metal scroll compressor components (Fig. 1). The Raspberry Pi was configured with a USB camera set to a fixed resolution of 640 × 480, capturing images of the component surface at approximately 20 fps. To ensure accurate character recognition, the trained YOLOv8 instance segmentation weights from Section 3.2 were utilized for inference. Given the limited computational resources of the Raspberry Pi, a lightweight architecture was adopted to maintain stable real-time processing, optimizing both memory usage and CPU/GPU performance.
The system was implemented using OpenCV and YOLOv8 for real-time object detection and character recognition. The experimental setup involved installing essential software packages, including Python, OpenCV, Ultralytics’ YOLOv8, and NumPy, ensuring compatibility with the Raspberry Pi hardware. The YOLOv8n model was selected due to its efficient balance between accuracy and computational cost. The captured video frames were processed in real time by the YOLOv8 model, which detected objects and extracted relevant textual information. Bounding boxes, confidence scores, and class labels were overlaid on the video feed for visualization, highlighting detected characters with sufficient confidence. The results demonstrated that the YOLOv8-based OCR system could function effectively within the computational constraints of the Raspberry Pi, confirming its feasibility for real-time character recognition in an embedded environment.
During the inference process, the detected character objects were mapped to actual numerical or alphabetical values based on predefined classification IDs. Subsequently, the bounding box coordinates and class information of each recognized character were stored, followed by a process to reconstruct the final product identification code. This procedure was carried out as follows:
1) Normalization: The bounding box coordinates (x,y) of each detected object were normalized by dividing them by the total image width (640) and height (480), respectively. This ensured consistent coordinate comparisons across different resolutions.
2) Sorting by X- and Y-Axis: The detected labels were grouped based on a predefined threshold (y_threshold) along the Y-axis to cluster characters belonging to the same horizontal line. The grouped characters were then sorted in ascending order based on their X-axis coordinates.
3) Merging: Within each group, numerical and alphabetical character classes were distinguished and concatenated sequentially to reconstruct the final identification code.
Through this process, the position and class information of each detected character were sorted and grouped, allowing for the restoration of the engraved identification code on the actual component. Finally, the generated product identification code was overlaid in real time on a display connected to the Raspberry Pi, enabling the user to verify the results instantly.
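The three post-processing steps above can be sketched in Python as follows; the function name, tuple layout, and default `y_threshold` value are illustrative assumptions rather than the authors' exact implementation.

```python
def reconstruct_code(detections, img_w=640, img_h=480, y_threshold=0.05):
    """Reconstruct identification codes from detected characters.

    detections: list of (x, y, label) tuples, where (x, y) is the
    bounding-box centre in pixels and label is the character class
    already mapped to its numeric or alphabetic value.
    """
    if not detections:
        return []
    # 1) Normalization: scale pixel coordinates to [0, 1].
    norm = [(x / img_w, y / img_h, label) for x, y, label in detections]
    # 2) Grouping along the Y-axis: consecutive detections whose normalized
    #    y-coordinates differ by less than y_threshold share a line.
    norm.sort(key=lambda d: d[1])
    lines, current = [], [norm[0]]
    for det in norm[1:]:
        if det[1] - current[-1][1] < y_threshold:
            current.append(det)
        else:
            lines.append(current)
            current = [det]
    lines.append(current)
    # 3) Merging: sort each line left to right and concatenate the labels.
    return ["".join(label for _, _, label in sorted(line, key=lambda d: d[0]))
            for line in lines]
```

For example, detections belonging to two engraved lines such as those in Fig. 2(c) would be grouped by their y-coordinates and merged left to right into two code strings, one per line.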
4 Results and Discussion
In this study, both YOLOv8- and YOLOv5-based object detection algorithms were applied to the OCR of engraved characters on metal scroll compressor components, and a real-time processing system was implemented in a Raspberry Pi environment. As shown in Table 3, the YOLOv8-based model achieved a mean Average Precision (mAP) of 88.8%, a Precision of 79.0%, and a Recall of 94.8%, whereas the YOLOv5-based model yielded significantly lower results (mAP: 17.1%, Precision: 56.3%, Recall: 32.2%). These results indicate that the YOLOv8 model is far more effective in detecting engraved characters of various sizes and shapes under the given experimental conditions, demonstrating much higher recall and overall recognition performance compared to YOLOv5. However, the lower precision in both models suggests there is still room for further improvement in minimizing false positives, especially for visually similar or overlapping characters.
During the experiments, recognition accuracy was higher for larger characters and when the characters were oriented in the correct direction. Additionally, recognition performance improved under consistent lighting conditions and stable camera angles. Furthermore, the system demonstrated the capability of real-time inference at multiple frames per second even in an embedded environment such as Raspberry Pi, thereby validating its practical applicability in resource-constrained manufacturing settings. The lower precision compared to recall in this study can be attributed to instances of duplicate recognition of reversed (inverted) characters and visually similar characters.
In Fig. 4(a), an example is presented where the digit ‘4’ is detected twice at the same position. Although only a single engraved character is present, the model incorrectly detects another ‘4’ due to similar contour features in the surrounding area. In Fig. 4(b), an inverted ‘7’ is misclassified as the adjacent ‘Q.’ Similar issues were also observed where ‘6’ and ‘9’ were misinterpreted as each other due to their rotational symmetry, and ‘P’ and ‘R’ were mistakenly recognized interchangeably.
The real-time inference setup for OCR is illustrated in Fig. 5, showcasing five different engraved character samples on metal scroll compressor components. These samples include numerical and alphanumeric markings: “25”, “29”, “APH34”, “Q3”, and “YPH29”. The images were captured using a USB camera connected to a Raspberry Pi, operating at a fixed resolution of 640 × 480. The YOLOv8 model was employed to detect and recognize these characters in real time. This setup ensures that the trained model effectively identifies and distinguishes various engraved texts despite variations in surface texture, lighting conditions, and character fonts.
Fig. 6 presents the inference time analysis for the OCR system across different character samples. The inference time, measured in milliseconds (ms), varies slightly across different test cases due to differences in character complexity and environmental factors such as lighting and surface reflections. The plotted data demonstrate the stability of the system, with fluctuations observed in specific cases due to varying recognition difficulty. The Raspberry Pi maintained a relatively consistent processing time, confirming the feasibility of deploying YOLOv8 on resource-constrained embedded systems for real-time OCR applications.
The performance metrics of the inference speed for each case are summarized in Table 4. The mean inference time for each sample ranges between 521.04 and 542.79 ms, with standard deviations varying from 12.45 to 32.37 ms. The lowest mean inference time was observed for “29” (521.04 ms), while the highest was recorded for “YPH29” (542.79 ms), indicating minor variations based on the complexity of the engraved characters. The results confirm that the lightweight YOLOv8 model effectively balances speed and accuracy, allowing for real-time recognition within the computational constraints of the Raspberry Pi.
5 Conclusion and Future Research
This study presented an approach that integrates engraved character labeling and data augmentation, a real-time processing structure for embedded devices, and post-processing logic for character grouping. The results demonstrate that even for challenging engraved character recognition tasks, an effective OCR system can be implemented in real-world manufacturing processes by combining proper labeling, data augmentation, model optimization, and real-time post-processing techniques [10,19].
For future research, improving accuracy metrics and inference speed, as well as enhancing adaptability to environmental variations, should be prioritized. Expanding the character classes and refining post-processing techniques to handle multi-line text and inverted character arrangements will further enhance the practical applicability of this approach for industrial process automation. Additionally, exploring alternative embedded platforms or integrating hardware acceleration could further optimize processing speed and energy efficiency, expanding the scope of this research in real-world applications.
Henge and Rama [15] explored a neural fuzzy hybrid system to recognize inverted characters and mixed text printed on the reverse side of documents. Their study proposed an approach to address recognition errors caused by differences in alignment, orientation, and inversion characteristics between the front and back of a document. This work highlights the necessity of post-processing logic and data augmentation techniques to mitigate duplication and misclassification issues when recognizing inverted or flipped characters. Additionally, Yao et al. [17] introduced rotation-invariant features to effectively detect text at various angles, including reversed text, contributing to reducing OCR errors caused by irregular text arrangements. However, in industrial manufacturing environments, additional factors such as lighting reflections, contamination, and surface damage necessitate further integration of data augmentation techniques and optical filtering methods [11,12].
During the process of reconstructing product identification codes, the system performed accurately when characters were arranged in a single horizontal line. However, for multi-line text or vertically or invertedly arranged characters, additional refinements in post-processing logic were required. In actual manufacturing environments, factors such as vibrations, temperature fluctuations, and dust accumulation may introduce additional noise.
Lai [10] emphasized that, for upside-down images commonly encountered in industrial settings, verifying and correcting the image orientation prior to OCR is essential for reliable recognition. To address this, images may be rotated by 180 degrees either manually or automatically, using orientation detection algorithms that analyze the position of expected text or display frame features. Only after this orientation correction is performed can the OCR engine be effectively applied. Similarly, Tang et al. [18] highlight the necessity of post-processing techniques to mitigate OCR recognition errors caused by environmental noise, such as lighting variations and surface degradation in manufacturing environments. Therefore, future work should explore retraining strategies or model updates that account for these real-world variables.
Therefore, future research should incorporate data augmentation for reversed characters and implement post-processing logic to further distinguish confusing characters, which could significantly reduce duplication and misclassification, ultimately improving precision.
Acknowledgement(s)
This paper is based on research conducted under the projects of ‘A Development in The Scroll Compressor Parts Inspection System and Cutting Process Monitoring Data Analysis Technology (No. JH250005)’ funded by Korea Institute of Industrial Technology (KITECH), and ‘Development of automatic control solution for industrial robotics based on 200um precision 3D reconstruction and application to grinding process (No. P0026191)’ funded by Korea Institute for Advancement of Technology (KIAT).
References
Biography
Youngjoo Hyun is in the Smart Manufacturing R&D Department at the Korea Institute of Industrial Technology (KITECH). She received an M.S. degree from the Department of Industrial Engineering at Yonsei University. Her research interests include anomaly detection in computer vision and machine learning.
Eunseok Nam is a principal researcher in the Autonomous Manufacturing & Process R&D Department at the Korea Institute of Industrial Technology (KITECH). He received a Ph.D. degree from Yonsei University. His research subjects are precision manufacturing and smart manufacturing systems.
Youngjun Yoo is a principal researcher in the Industrial Transformation Research Department at the Korea Institute of Industrial Technology (KITECH). Before joining KITECH, he worked at Samsung Heavy Industries on smart ships. He received a Ph.D. degree from Pohang University of Science and Technology. His research subjects are AI-based inspection equipment, IoT equipment, data analysis, and smart manufacturing.