Title: Key Information Extraction from Vietnamese Invoices by Combining Layout and Context
Authors: Ngô, Tuấn Anh; Trần, Mạnh Cường
Abstract: This thesis introduces a deep learning approach: an effective and robust framework for handling complex document layouts, visual features, and textual semantics in Key Information Extraction (KIE). The algorithm combines graph learning with graph convolution, yielding a richer semantic representation that captures both textual and visual features along with a clear global layout. The model relies only on the coordinates of token bounding boxes, avoiding the use of raw images. The result is a layout-aware language model that can be fine-tuned on downstream tasks. The model is evaluated on a key information extraction task using the publicly available SROIE dataset. We show that it achieves superior performance on visually rich documents, outperforming the RoBERTa baseline on these documents.