Please use this identifier to cite or link to this item: http://ds.libol.fpt.edu.vn/handle/123456789/3783

Title: Efficient Multi-Scale for Arbitrary Scene Text Detection for High-Resolution Image
Other Titles: A real-time model for detecting text in high-resolution images (Mô hình phát hiện thông tin văn bản trong ảnh chất lượng cao theo thời gian thực)
Authors: Phan, Duy Hung
Do, Quang Manh
Tran, Minh Khoi
Duong, Minh Hieu
Keywords: Artificial Intelligence
High-Resolution Image
Text Detection
Issue Date: 2023
Publisher: FPTU Hà Nội
Abstract: As deep learning has advanced, many scene text detection datasets have emerged. A growing number of them consist of high-resolution images containing very small text. The conventional way to handle such images is to downscale them, but this often blurs or distorts the text and degrades detection performance. The alternative, running large models on enlarged inputs, demands significant GPU resources and long training times. We introduce "TextFocus," an algorithm that applies a multi-scale training strategy efficiently. Instead of processing every pixel of an image pyramid, TextFocus selects context regions that contain ground-truth text instances, referred to as "chips." The algorithm then identifies all text regions in the sampled image by collecting the predictions from each chip and post-processing them to produce the final detections. TextFocus converts large image samples of 4000x4000 pixels into lower-resolution chips of 640x640 pixels. This both speeds up training and allows larger batch sizes, up to 50 on a single GPU, even under conventional scaling.
Although it is commonly assumed that results improve steadily as the training resolution increases, our experiments show that training at high-resolution scales does not necessarily yield the best performance. Our implementation uses a ResNet-18 backbone with a segment-like head. It achieves an F1 score of 0.828 on the SCUT-CTW1500 dataset [1] and 0.611 on the Large CTW dataset [2], at an acceptable frame rate (frames per second, FPS) that supports real-time operation.
URI: http://ds.libol.fpt.edu.vn/handle/123456789/3783
Appears in Collections: Computer Science - Artificial Intelligence (Khoa học máy tính - Trí tuệ nhân tạo)

Files in This Item:

File                              Description  Size      Format
Efficient-Multi-Scale_Report.pdf  Free         14.19 MB  Adobe PDF
Efficient-Multi-Scale_Slide.pdf   Free         2.9 MB    Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 

  Collections Copyright © FPT University
