We introduce the MovieQA dataset, which aims to evaluate automatic story comprehension from both video and text. The dataset consists of almost 15,000 multiple-choice questions obtained from over 400 movies and features high semantic diversity.
Each question comes with a set of five highly plausible answers, only one of which is correct. The questions can be answered using multiple sources of information: movie clips, plots, subtitles, and, for a subset, scripts and DVS. Click here to see examples of the dataset.
NEW: As part of the ICCV 2017 workshop, we are adding about 80 more movies for video-based answering. We are also introducing an intermediate task that could be used to learn improved descriptors: alignment between video clips and plot synopsis sentences, evaluated through retrieval.
Cite this paper if you use the data: