Frequently Asked Questions

The MovieQA benchmark lets you answer questions about movies using several different story representations.
  • Movie: video clips + subtitles are the primary mode of answering.
  • Supporting texts (Subtitles, Scripts, DVS) are a secondary way to answer questions about movies. We provide this option for teams that wish to develop NLP techniques for QA.
  • Plot synopses are good summaries of the story, and we collect QAs using them. Because answering from plots is considerably easier, we will keep its leaderboard separate from the other methods.
  • Open-ended: any answering scheme is allowed, for example answering without using stories at all, or fusing multiple answering sources.
We are preparing a code package and will upload it to GitHub. We will inform all registered participants once it is ready.
The MovieQA benchmark on GitHub provides access to the list of movies, splits, and QAs as simple JSON files. Movies are referenced using their IMDb key and each question comes with a unique identifier "qid".
  • train: The 9,848 QAs (269 movies) whose qid starts with train may be used to train a model. We encourage people to split these further into train/dev sets to prevent overfitting, monitor training loss, try different hyperparameters, etc.
  • val: The 1,958 QAs (56 movies) whose qid starts with val can be used to report and compare results for several model configurations. The val set should not be used for training.
  • test: The 3,138 QAs (83 movies) whose qid starts with test are the held-out test set. Test set evaluation is performed on the server, and results from the leaderboard should be reported in papers.
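As a minimal sketch of working with these splits, the snippet below groups QAs by the prefix of their qid and then holds out some training movies as a dev set. The field names `qid` and `imdb_key`, the `prefix:number` form of the identifiers, and the sample records are assumptions made for illustration; check the JSON files for the actual layout.

```python
import random
from collections import defaultdict


def group_by_split(qas):
    """Group QA records by the split prefix of their qid (train/val/test)."""
    splits = defaultdict(list)
    for qa in qas:
        for name in ("train", "val", "test"):
            if qa["qid"].startswith(name):
                splits[name].append(qa)
                break
    return splits


def train_dev_split(train_qas, dev_fraction=0.1, seed=0):
    """Hold out whole movies for dev so no movie appears on both sides."""
    movies = sorted({qa["imdb_key"] for qa in train_qas})
    rng = random.Random(seed)
    rng.shuffle(movies)
    n_dev = max(1, int(len(movies) * dev_fraction))
    dev_movies = set(movies[:n_dev])
    train = [qa for qa in train_qas if qa["imdb_key"] not in dev_movies]
    dev = [qa for qa in train_qas if qa["imdb_key"] in dev_movies]
    return train, dev


# Hypothetical records; in practice, load them with json.load() from the QA file.
qas = [
    {"qid": "train:0", "imdb_key": "tt0000001"},
    {"qid": "train:1", "imdb_key": "tt0000001"},
    {"qid": "train:2", "imdb_key": "tt0000002"},
    {"qid": "val:0", "imdb_key": "tt0000003"},
    {"qid": "test:0", "imdb_key": "tt0000004"},
]

splits = group_by_split(qas)
train, dev = train_dev_split(splits["train"], dev_fraction=0.5)
print(len(train), len(dev), len(splits["val"]), len(splits["test"]))
```

Splitting by movie rather than by question is a design choice of this sketch, not a requirement of the benchmark; it keeps questions about the same movie from appearing in both train and dev.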
The ground truth answers for the test set are held out, and evaluation is performed through our server. The val set may be used as your in-house test set to compare several model configurations. Please note that result submissions to the server are limited to once every 72 hours, so we encourage participants to be sure of the results they wish to upload.
The video clips are encoded using libx264 (H.264 MPEG4 AVC encoder library v0.148.2) and FFMPEG-1.1.16. The maximum video width is 720px, and the height varies with the aspect ratio. We noticed that some of the video clips are 0 bytes; these were removed from the QAs video_clips on April 6, 2016.
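If you want to check your own download for empty clips, a minimal sketch along the following lines may help. The directory layout and the `.mp4` extension are assumptions, and `find_empty_clips` is a hypothetical helper, not part of the MovieQA package.

```python
import os
import tempfile


def find_empty_clips(clip_dir, extension=".mp4"):
    """Return the sorted names of files in clip_dir with the extension that are 0 bytes."""
    empty = []
    for name in sorted(os.listdir(clip_dir)):
        path = os.path.join(clip_dir, name)
        if name.endswith(extension) and os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


# Demonstration on a temporary directory with one valid and one empty file.
with tempfile.TemporaryDirectory() as clip_dir:
    with open(os.path.join(clip_dir, "ok.mp4"), "wb") as f:
        f.write(b"\x00\x01")
    open(os.path.join(clip_dir, "empty.mp4"), "wb").close()
    print(find_empty_clips(clip_dir))
```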
The shot boundaries and frame number to timestamp correspondences were added to the download script on Sep. 10, 2016. Please note that the (frame number, timestamp) pairs are available for the entire video. The actual timestamps for a video clip can be found by using the start-frame and end-frame information in the video clip filename.
For example, in the video clip the first and last frames are 184529 and 186260. The corresponding timestamps are 7696.397 for frame 184529 and 7768.594 for frame 186260.
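The lookup in the example above can be sketched as follows. The dictionary `frame_to_time` stands in for the (frame number, timestamp) pairs; how those pairs are stored on disk, and the assumption that the timestamps are in seconds, are not confirmed here, and `clip_time_range` is a hypothetical helper.

```python
def clip_time_range(frame_to_time, start_frame, end_frame):
    """Return (start_time, end_time, duration) for a clip given its frame range."""
    start = frame_to_time[start_frame]
    end = frame_to_time[end_frame]
    return start, end, end - start


# Only the two pairs from the example; a real table covers every frame of the movie.
frame_to_time = {184529: 7696.397, 186260: 7768.594}

start, end, duration = clip_time_range(frame_to_time, 184529, 186260)
print(f"{start:.3f} -> {end:.3f} (duration {duration:.3f})")
```

For this clip the two timestamps differ by 7768.594 - 7696.397 = 72.197, in whatever unit the timestamps use.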
Many other domains have already been whitelisted, so you may want to try registering directly first. If that does not work, send us an email (makarand at cs dot toronto dot edu) mentioning the domain of your academic/industry email address, and we will add it to the whitelist.
Please contact us (makarand at cs dot toronto dot edu) and we will try to fix your problems as soon as possible.