The Joint Video and Language Understanding workshop at ICCV 2017 takes place in the afternoon session on 23rd October. After LSMDC, we will present a short introduction and a summary of the submissions. We also have two short oral presentation slots for challenge winners. As it is a vision conference, we are primarily targeting the Movie: Video + Subtitles answering task. Participants in all other tasks (e.g., subtitle-, script-, or plot-based answering) are welcome to present and discuss their work too!
The ICCV special edition of the RSIP Vision publication includes a nice roundup of the workshop. Read it here.
The top-6 submissions on the Video-based answering leaderboard as of 14th October 2017 are:
| Team name | Accuracy (%) | Short note |
|---|---|---|
| TJU_MM | 39.03 | Layered Memory Networks |
| SNUVL & SKTVT | 38.16 | Local Average Pooling Networks |
| tjumedia-cpp | 37.20 | Representing Movie Content Hierarchically |
| TJU_MM | 37.04 | Sequential Video VLAD |
| SNUVL & SKTVT | 36.25 | Read-Write Memory Network |
| BI_kmkim | 34.74 | Multimodal Sequence Memory for video story QA |
The challenge winner will get a GPU sponsored by
Watch this space for information about which questions/movies were easiest, which questions/movies still require much more effort, and other fun tidbits. Coming soon!