Please use this identifier to cite or link to this item: http://ds.libol.fpt.edu.vn/handle/123456789/3998

Title: Visual Question Answering for Medical Data using A Visio-Linguistic Model
Other Titles: Trả lời câu hỏi trực quan cho dữ liệu y tế sử dụng mô hình ngôn ngữ thị giác
Authors: Bùi, Văn Hiếu
Trần, Quang Đức
Trần, Thị Kim Thanh
Lê, Việt Tiến
Keywords: Artificial Intelligence
Visio-linguistic
Computer Vision
Natural Language Processing
Prototype Learning
Visual Question Answering
Issue Date: 2023
Publisher: FPTU Hà Nội
Abstract: Research on Medical Visual Question Answering (Med-VQA) [1] has recently become increasingly popular. Given an image containing vital, clinically relevant information, Med-VQA aims to answer questions about it, helping physicians diagnose diseases and giving patients better insight into their illness. Med-VQA performs worse than general-domain VQA for two reasons: a shortage of accurately annotated data, such as typical radiology images like X-rays, and the complexity of proposed models in both their image and text encoders, which still fails to deliver outstanding performance. To cope with the data limitation of Med-VQA, recent studies mainly refine the fusion module, which is responsible for synthesizing the question features and image features, and provide models pre-trained on newly self-collected datasets, overlooking the effect of question and image history. In this thesis, we introduce a visio-linguistic model whose architecture employs an Associative Memory Module that stores individual visual and linguistic experiences separately, together with their relationships, to enhance context. Additionally, we introduce a Prototype Learning block that performs stratified prototype learning on textual and visual embeddings using modern Hopfield layers. Rather than directly learning concrete representations of joint features for the different meanings in text and image, our model learns the most significant prototypes from the text and image embeddings, augmented with memory from the Associative Memory Module; these learned prototypes can then represent more complex semantics for the answer. On the VQA-RAD dataset, the proposed method achieves state-of-the-art performance with a notable accuracy improvement of 0.45%. (A minimal illustrative sketch of the Hopfield-style prototype retrieval appears after this record.)
URI: http://ds.libol.fpt.edu.vn/handle/123456789/3998
Appears in Collections:Khoa học máy tính - Trí tuệ nhân tạo
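The abstract describes stratified prototype learning with modern Hopfield layers over text and image embeddings. As a rough, non-authoritative illustration of that idea (not the thesis's actual implementation, which is in the attached report), the following PyTorch sketch applies one modern-Hopfield-style retrieval step, i.e. softmax attention of a query embedding against a bank of learned prototypes, separately to text and image features before fusing them. Every name and hyperparameter here (HopfieldPrototypeBlock, TinyMedVQAHead, dim, beta, n_prototypes) is a hypothetical assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HopfieldPrototypeBlock(nn.Module):
    """Hypothetical sketch (not the thesis's code): one modern-Hopfield-style
    retrieval step over a bank of learned prototypes P.

    The update is x <- softmax(beta * x P^T) P, i.e. softmax attention of a
    query embedding against the stored patterns (prototypes)."""

    def __init__(self, dim: int, n_prototypes: int, beta: float = 8.0):
        super().__init__()
        # Stored patterns ("prototypes") are learned parameters.
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, dim) / dim ** 0.5)
        self.beta = beta  # inverse temperature; larger -> sharper retrieval

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) query embeddings (e.g., text or image features).
        attn = F.softmax(self.beta * x @ self.prototypes.t(), dim=-1)
        return attn @ self.prototypes  # retrieved prototype mixture, (batch, dim)


class TinyMedVQAHead(nn.Module):
    """Separate prototype banks for text and image ("stratified"), whose
    retrieved prototypes are fused and classified into an answer vocabulary.
    All module names and sizes are illustrative assumptions."""

    def __init__(self, dim: int = 256, n_prototypes: int = 64, n_answers: int = 100):
        super().__init__()
        self.text_proto = HopfieldPrototypeBlock(dim, n_prototypes)
        self.image_proto = HopfieldPrototypeBlock(dim, n_prototypes)
        self.classifier = nn.Linear(2 * dim, n_answers)

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
        # Retrieve prototypes per modality, then fuse and classify.
        fused = torch.cat([self.text_proto(text_emb), self.image_proto(image_emb)], dim=-1)
        return self.classifier(fused)  # answer logits, (batch, n_answers)


if __name__ == "__main__":
    head = TinyMedVQAHead()
    logits = head(torch.randn(4, 256), torch.randn(4, 256))
    print(logits.shape)  # torch.Size([4, 100])
```

The softmax(beta * x P^T) P step is the standard retrieval rule of modern Hopfield network layers (Ramsauer et al., 2020): with a large beta, each embedding effectively snaps to its nearest stored prototype, which matches the abstract's goal of representing answer semantics through learned prototypes rather than raw joint features.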

Files in This Item:

File                                  Description  Size     Format
Report-Visual-Question-Answering.pdf  Free         1.94 MB  Adobe PDF
Slide-Visual-Question-Answering.pdf   Free         4.43 MB  Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
