The VQA Challenge Winners and Honorable Mentions were revealed at the VQA Challenge Workshop, where they were awarded GPUs sponsored by NVIDIA!

What is VQA?

VQA is a new dataset containing open-ended questions about images. Answering these questions requires an understanding of vision, language, and commonsense knowledge.

  • 254,721 images (MSCOCO and abstract scenes)
  • 3 questions per image (764,163 total)
  • 10 ground truth answers per question
  • 3 plausible (but likely incorrect) answers per question
  • Open-ended and multiple-choice answering tasks
  • Automatic evaluation metric


9,934,119 answers in total (ground truth and plausible)
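The summary lists an automatic evaluation metric, and the 10 ground truth answers per question are what make it work. The following is a minimal sketch, assuming the accuracy described in the VQA paper: a predicted answer scores min(#humans who gave that answer / 3, 1), averaged over the 10 ways of leaving out one of the 10 human answers. The function name `vqa_accuracy` is hypothetical. The official evaluation script also normalizes answers (punctuation, articles, number words and so on), and this sketch deliberately omits that step, so treat it as illustrative only.

```python
from itertools import combinations


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Hypothetical sketch of VQA accuracy for one question.

    Scores min(matches / 3, 1) on each 9-answer subset of the 10 human
    answers and averages the 10 subset scores.
    """
    if len(human_answers) != 10:
        raise ValueError("expected 10 ground truth answers per question")
    # Assumption: only lowercase + strip; the official script normalizes more.
    pred = predicted.strip().lower()
    gts = [a.strip().lower() for a in human_answers]
    scores = []
    for subset in combinations(range(10), 9):
        matches = sum(1 for i in subset if gts[i] == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    answers = ["two"] * 3 + ["three"] * 7
    print(vqa_accuracy("two", answers))  # roughly 0.9
```

Averaging over leave-one-out subsets means an answer given by exactly three annotators scores about 0.9 rather than a full 1.0, while four or more agreeing annotators yield 1.0.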

Subscribe to our group for updates!


Details on downloading the latest dataset may be found on the download webpage.

October 2015: Full release (v1.0)

Real Images
  • 204,721 MSCOCO images
    (all of the current train/val/test splits)
  • 614,163 questions
  • 6,141,630 ground truth answers
  • 1,842,489 plausible answers
Abstract Scenes
  • 50,000 abstract scenes
  • 150,000 questions
  • 1,500,000 ground truth answers
  • 450,000 plausible answers
  • 250,000 captions

July 2015: Beta v0.9 release

June 2015: Beta v0.1 release
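The October 2015 counts follow directly from the per-image and per-question rates in the summary: 3 questions per image, 10 ground truth answers and 3 plausible answers per question. The short check below reproduces them. It also shows that the 9,934,119 total equals the ground truth plus plausible answers across real images and abstract scenes. The helper name `split_counts` is illustrative only.

```python
# Rates stated in the dataset summary.
QUESTIONS_PER_IMAGE = 3
GT_ANSWERS_PER_QUESTION = 10
PLAUSIBLE_ANSWERS_PER_QUESTION = 3


def split_counts(num_images: int) -> dict:
    """Derive question and answer counts for one image type."""
    questions = num_images * QUESTIONS_PER_IMAGE
    return {
        "questions": questions,
        "ground_truth_answers": questions * GT_ANSWERS_PER_QUESTION,
        "plausible_answers": questions * PLAUSIBLE_ANSWERS_PER_QUESTION,
    }


real = split_counts(204_721)     # MSCOCO images
abstract = split_counts(50_000)  # abstract scenes
total_answers = sum(
    c["ground_truth_answers"] + c["plausible_answers"] for c in (real, abstract)
)

print(real)           # {'questions': 614163, 'ground_truth_answers': 6141630, 'plausible_answers': 1842489}
print(abstract)       # {'questions': 150000, 'ground_truth_answers': 1500000, 'plausible_answers': 450000}
print(total_answers)  # 9934119
```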


Download the paper


Papers reporting results on the VQA dataset should:

1) Report test-standard accuracies, which can be calculated using either of the non-test-dev phases (i.e., "test2015" or "Challenge test2015") at the following links: [oe-real | oe-abstract | mc-real | mc-abstract].

2) Compare their test-standard accuracies with those on the corresponding test2015 leaderboards [oe-real-leaderboard | oe-abstract-leaderboard | mc-real-leaderboard | mc-abstract-leaderboard].
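Obtaining test-standard accuracies means uploading predictions to one of the test2015 phases. As a rough sketch, this code assumes the results file is a JSON list with one `{"question_id": ..., "answer": ...}` record per test question. Treat that format, the helper names `build_results` and `write_results`, and any file name you pass in as assumptions to verify against the challenge page, which is the authoritative source for the submission format.

```python
import json


def build_results(predictions: dict[int, str]) -> list[dict]:
    """Turn {question_id: answer} into the assumed list-of-records format."""
    # Sorting by question_id keeps the output reproducible across runs.
    return [
        {"question_id": qid, "answer": answer}
        for qid, answer in sorted(predictions.items())
    ]


def write_results(predictions: dict[int, str], path: str) -> None:
    """Write the records to a JSON file for upload (file name is up to you)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_results(predictions), f)
```

Keeping the record-building step separate from file I/O makes it easy to inspect or validate the records before writing the file.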

For more details, please see the challenge page.