Fashion parsing is a fundamental task for applications such as product image search, recommendation, and virtual try-on. An application must first recognize the parts of the human body in the input image to determine where the clothing region is located, and then synthesize clothes at that location. This is difficult for several reasons. Clothes are not uniform in appearance: they wrinkle and fade over time. Inter-class variance is small, so the cues that distinguish one class from another are subtle (an image of a long skirt may be mistaken for a slightly shorter skirt). There are also cross-domain issues, because images from the user domain differ from images in the store domain. This thesis presents an approach to fashion parsing: detecting the type of each garment and segmenting it at the pixel level. We found that not every feature map deserves attention, while some feature maps carry a great deal of important information. Previous work has not applied this mechanism to fashion parsing. We therefore tested an attention mechanism on Mask R-CNN with a modified feature-extraction backbone, integrating channel attention into the backbone to collect more informative features about clothes and suppress less useful ones, in order to measure how the mechanism affects the results. Experiments show that adding the channel attention module does not improve results over the original Mask R-CNN or over state-of-the-art models.
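The abstract does not say which channel attention design was used. The following is a minimal sketch, assuming a squeeze-and-excitation (SE) style block: global average pooling per channel, a bottleneck MLP with reduction ratio `r`, and a sigmoid gate that produces one weight in (0, 1) per channel. Multiplying each feature map by its weight is what lets informative channels be amplified relative to less useful ones. It is written in NumPy for a single (C, H, W) feature map. The function names, parameter layout, and the placement of the block on the residual branch of a ResNet-style unit are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feature_map, w1, b1, w2, b2):
    """SE-style channel attention (assumed variant) on a (C, H, W) feature map.

    Hypothetical parameter shapes: w1 (C//r, C), b1 (C//r,), w2 (C, C//r), b2 (C,).
    Returns the re-weighted feature map and the per-channel weights.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = feature_map.mean(axis=(1, 2))            # shape (C,)
    # Excitation: reduce by r, ReLU, expand back to C, then sigmoid gate.
    hidden = np.maximum(0.0, w1 @ z + b1)        # shape (C//r,)
    weights = sigmoid(w2 @ hidden + b2)          # shape (C,), each in (0, 1)
    # Scale: multiply every channel by its weight, broadcasting over H and W.
    return feature_map * weights[:, None, None], weights


def residual_block_with_attention(x, branch, attn_params):
    """Hypothetical wiring (as in SE-ResNet): attention re-weights the residual
    branch output before the identity shortcut is added and ReLU is applied."""
    out = branch(x)
    out, _ = channel_attention(out, *attn_params)
    return np.maximum(0.0, out + x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, r, H, W = 64, 16, 8, 8
    x = rng.standard_normal((C, H, W))
    params = (
        rng.standard_normal((C // r, C)) * 0.1, np.zeros(C // r),
        rng.standard_normal((C, C // r)) * 0.1, np.zeros(C),
    )
    y, w = channel_attention(x, *params)
    print(y.shape, w.shape)
```

Because the gate only rescales channels, the block keeps the feature-map shape unchanged, which is why it can be dropped into an existing backbone without altering the rest of the Mask R-CNN pipeline.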