
The Joint Video and Language Understanding Workshop @ ICCV 2017

MovieQA and The Large Scale Movie Description Challenge (LSMDC)

Final Submission Deadline: 13th October 2017, 23:59 EDT (Toronto time)
Workshop Date: 23rd October 2017, 14:00 CEST onwards, Venice, Italy


The Joint Video and Language Understanding workshop at ICCV 2017 takes place in the afternoon session on 23rd October. After LSMDC, we will present a short introduction and a summary of the submissions. We also have two short oral presentation slots for challenge winners. As it is a vision conference, we are primarily targeting the Movie: Video + Subtitles answering task. Participants in all other tasks (e.g., subtitles, scripts, plot-based answering) are welcome to present and discuss their work too!


The ICCV special edition of RSIP Vision publication includes a nice roundup of the workshop. Read it here.

Challenge Winners

The top-6 submissions on the Video-based answering leaderboard as of 14th October 2017 are:

Team name         Accuracy   Short note
TJU_MM            39.03      Layered Memory Networks
SNUVL & SKTVT     38.16      Local Average Pooling Networks
tjumedia-cpp      37.20      Representing Movie Content Hierarchically
TJU_MM            37.04      Sequential Video VLAD
SNUVL & SKTVT     36.25      Read-Write Memory Network
BI_kmkim          34.74      Multimodal Sequence Memory for video story QA

The challenge winner will get a GPU sponsored by

New Data

  • More data for movie-based answering. After the CVPR 2018 deadline, 15th November 2017, we encourage people to switch movie-based answering (video+subtitle) submissions to the updated video dataset with 80 more movies. We plan to keep the old leaderboard available for fair model comparisons. NOTE: The total number of questions (~15K) is unchanged, but more of these questions will be available for answering with videos. The additional video and meta information will be included in the download package, and current members will be notified.
  • Plot-based retrieval. We are introducing a new task that could be used to pre-train or develop stronger video-language models. This will feature retrieving the correct set of video clips for each sentence from the plot synopsis. Each sentence can be viewed independently or the whole movie can be processed jointly (e.g., by enforcing temporal constraints). As a teaser, this also enables searching within the video by redirecting the text query and searching within the plot.
  • Face tracks. In contrast to captioning tasks, many questions of MovieQA require understanding characters in detail. To help encourage this direction, we are releasing automatic face tracks.


One day, while fleeing from bullies, Forrest's leg braces break apart and he discovers that he can run very fast.


A few years later, Forrest inadvertently runs onto the field during a local high school football match and catches the attention of Coach Bryant from the University of Alabama who is scouting for players.
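
The two ways of approaching plot-based retrieval mentioned above (treating each sentence independently, or processing the whole movie jointly with temporal constraints) can be illustrated with a minimal sketch. It is not the official task code or evaluation: it assumes a hypothetical precomputed similarity matrix sim[i][j] between plot sentence i and video clip j, simplifies the task to picking a single clip per sentence, and uses one possible temporal constraint (clip indices may not go backwards as the plot advances). The function names are illustrative only.

```python
from typing import List


def retrieve_independent(sim: List[List[float]]) -> List[int]:
    """Pick, for each sentence (row), the clip (column) with the highest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def retrieve_monotonic(sim: List[List[float]]) -> List[int]:
    """Pick one clip per sentence so that clip indices never decrease,
    maximising the total similarity (simple dynamic programme)."""
    n_sent, n_clip = len(sim), len(sim[0])
    # best[i][j]: best total score for sentences 0..i with sentence i -> clip j
    best = [[0.0] * n_clip for _ in range(n_sent)]
    # back[i][j]: clip chosen for sentence i-1 on that best path
    back = [[0] * n_clip for _ in range(n_sent)]
    best[0] = list(sim[0])
    for i in range(1, n_sent):
        run_val, run_arg = float("-inf"), 0  # running max over best[i-1][0..j]
        for j in range(n_clip):
            if best[i - 1][j] > run_val:
                run_val, run_arg = best[i - 1][j], j
            best[i][j] = run_val + sim[i][j]
            back[i][j] = run_arg
    # Backtrack from the best final clip to recover the full assignment.
    j = max(range(n_clip), key=best[-1].__getitem__)
    path = [j]
    for i in range(n_sent - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]


# Hypothetical scores: 3 plot sentences x 3 clips (made-up numbers).
sim = [
    [0.1, 0.9, 0.2],
    [0.8, 0.3, 0.4],
    [0.1, 0.2, 0.7],
]
print(retrieve_independent(sim))  # sentence 2 may jump back in time
print(retrieve_monotonic(sim))    # clip order follows plot order
```

In this toy example the independent choice sends the second sentence back to an earlier clip, while the joint version trades a lower per-sentence score for a temporally consistent assignment.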




Test-set Analysis

Watch this space for information about which questions/movies were easiest, which questions/movies still require much more effort, and other fun tidbits. Coming soon!