Keywords:
Artificial Intelligence; Visio-linguistic; Computer Vision; Natural Language Processing; Prototype Learning; Visual Question Answering
Issue Date:
2023
Publisher:
FPTU Hà Nội
Abstract:
Research on Medical Visual Question Answering (Med-VQA) [1] has recently become increasingly popular. Given an image containing vital, clinically relevant information, Med-VQA aims to answer questions about it, helping physicians diagnose diseases and giving patients better insight into their illness. Med-VQA performs worse than general-domain VQA, partly because accurate data for typical modalities such as X-ray images is scarce, and partly because proposed models are complicated in both the image encoder and the text encoder yet still fail to deliver outstanding performance. To deal with the data limitation of Med-VQA, recent studies primarily refine the fusion module responsible for combining question features and image features, and provide models pre-trained on newly self-collected datasets, overlooking the effect of question and image history.
In this thesis, we introduce a visio-linguistic model whose architecture employs an Associative Memory Module that separately stores individual visual and linguistic experiences and their relationships to enhance context. Additionally, we introduce a Prototype Learning block that carries out stratified prototype learning on textual and visual embeddings using modern Hopfield layers. Our model seeks to acquire the most significant prototypes from the text and image embeddings, augmented with memory from the associative memory module, rather than directly learning concrete representations of joint features for the different meanings in text and image. The learned prototypes can then represent more complex semantics for producing the answer. On the VQA-RAD dataset, the proposed method achieves state-of-the-art performance with a notable accuracy improvement of 0.45%.
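To make the prototype-learning idea concrete, the following is a minimal sketch, not the thesis implementation: learnable prototype patterns for one modality stratum are retrieved with a modern-Hopfield-style softmax update over the input embeddings. The class name `PrototypeBlock`, the parameters `num_prototypes` and `beta`, and all dimensions are illustrative assumptions; the associative-memory augmentation described in the abstract is assumed to happen before the embeddings enter this block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeBlock(nn.Module):
    """Hopfield-style prototype retrieval for one modality (sketch only)."""

    def __init__(self, embed_dim: int, num_prototypes: int, beta: float = 1.0):
        super().__init__()
        # Learnable stored patterns (prototypes) for this stratum.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, embed_dim))
        self.beta = beta  # inverse temperature of the Hopfield update

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        # queries: (batch, seq_len, embed_dim) textual or visual embeddings,
        # optionally already augmented with associative-memory context.
        scores = self.beta * queries @ self.prototypes.t()   # (B, L, P)
        weights = F.softmax(scores, dim=-1)                  # retrieval weights
        return weights @ self.prototypes                     # (B, L, D)


if __name__ == "__main__":
    text_block = PrototypeBlock(embed_dim=256, num_prototypes=32)
    image_block = PrototypeBlock(embed_dim=256, num_prototypes=32)
    text_emb = torch.randn(2, 20, 256)   # dummy question-token embeddings
    image_emb = torch.randn(2, 49, 256)  # dummy image-patch embeddings
    fused = text_block(text_emb).mean(1) + image_block(image_emb).mean(1)
    print(fused.shape)  # torch.Size([2, 256])
```

In this sketch the fusion is a simple sum of pooled prototype retrievals; the actual model's fusion of the two strata and its memory module are more elaborate than shown here.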