Artificial Intelligence High-Resolution Image Text Detection
Issue Date: 2023
Publisher: FPTU Hà Nội
Abstract:
Many scene text detection datasets have emerged alongside the progress of deep learning. These datasets feature high-resolution images containing small text instances, establishing a growing trend in the field. A conventional way to handle small text in such images is to downsize the image, but this blurs the text and degrades detection performance. The alternative, training large models on enlarged input scales, demands substantial GPU memory and long training times. In this work, we introduce "TextFocus," an algorithm designed to exploit a multi-scale training strategy efficiently. Instead of scrutinizing every pixel across an image pyramid, TextFocus samples context regions that enclose ground-truth text instances, referred to as "chips," and then detects all text regions within each sampled chip. The detections gathered from the chips are post-processed to produce the final text detection results. The strength of TextFocus lies in converting large image samples, up to 4000x4000 pixels, into low-resolution 640x640 chips. This both speeds up training and permits larger batch sizes, up to a batch size of 50 on a single GPU, even under conventional scaling.
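The chip-sampling idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes ground-truth boxes are given as `[x1, y1, x2, y2]` arrays and simply crops a fixed-size window centred on each text instance, clamped to the image bounds, keeping the crop offset so detections can later be mapped back to full-image coordinates.

```python
import numpy as np

def sample_chips(image, text_boxes, chip_size=640):
    """Crop fixed-size 'chips' around ground-truth text boxes.

    image: H x W (or H x W x C) array, e.g. a 4000x4000 sample.
    text_boxes: (N, 4) array of [x1, y1, x2, y2] ground-truth boxes.
    Returns a list of (chip, (left, top)) pairs; the offset is kept
    so per-chip detections can be mapped back to image coordinates.
    """
    h, w = image.shape[:2]
    chips = []
    for x1, y1, x2, y2 in text_boxes:
        # centre the chip on the text instance, clamped to the image bounds
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        left = int(np.clip(cx - chip_size / 2, 0, max(w - chip_size, 0)))
        top = int(np.clip(cy - chip_size / 2, 0, max(h - chip_size, 0)))
        chip = image[top:top + chip_size, left:left + chip_size]
        chips.append((chip, (left, top)))
    return chips
```

A real chip sampler would also merge nearby instances into shared chips and add background chips; this sketch only shows the core crop-and-offset mechanics.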
While conventional wisdom holds that results improve as training resolution grows, our findings deviate from this paradigm: our experiments show that training at high resolution does not necessarily yield the best performance. Our implementation uses a ResNet-18 backbone with a segmentation-style head. It achieves an F1 score of 0.828 on the SCUT-CTW1500 dataset [1] and 0.611 on the Large CTW dataset [2], while running in real time at an acceptable frames-per-second (FPS) rate.
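The post-processing step that accumulates per-chip detections into final full-image results can be illustrated with a simple sketch. This is an assumption-laden stand-in for the paper's actual post-processing: it maps each chip-local box back to image coordinates using the chip's crop offset, then greedily drops near-duplicate boxes that adjacent, overlapping chips detected twice (the IoU threshold is an arbitrary choice here).

```python
def box_iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_chip_detections(chip_results, iou_thresh=0.5):
    """Map per-chip boxes to full-image coordinates and drop duplicates.

    chip_results: list of (boxes, (left, top)) pairs, where boxes is a
    list of [x1, y1, x2, y2] in chip-local coordinates and (left, top)
    is the chip's crop offset in the original image.
    """
    merged = []
    for boxes, (left, top) in chip_results:
        for x1, y1, x2, y2 in boxes:
            merged.append([x1 + left, y1 + top, x2 + left, y2 + top])
    # greedy suppression of overlapping duplicates from adjacent chips
    kept = []
    for box in merged:
        if all(box_iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```

In practice, a detector with a segmentation-style head would emit polygons or masks rather than plain boxes, and the merge step would fuse them accordingly; the offset-then-deduplicate pattern shown here is the common core.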