Automatic caption generation for images has attracted a great deal of attention from machine learning researchers in recent years. However, most work on this task targets English. This paper contributes to research on image captioning by extending the existing dataset of Vietnamese image descriptions and by comparing different modeling approaches to Vietnamese caption generation. Because most available image captioning datasets were created for English and for other widely spoken languages such as Chinese, very few datasets exist for Vietnamese. To address this, we create a dataset of 4,500 captions for 900 images, which enlarges the current Vietnamese caption dataset, together with a translated version of the preprocessed English captions from the MS-COCO training set. We evaluate our extended dataset with a neural network-based image caption generation model and compare it with the Vietnamese image captioning dataset UIT-ViIC. We also build an enhanced Vietnamese caption model, based on the most famous image captioning model, to improve the accuracy metrics.