Results Format Overview

This page describes the results format used by the VQA evaluation code.


Results Format

results = [result]

result{
"question_id": int,
"answer": str
}
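
As an illustration of the schema above, a results file could be produced along the following lines. This is a minimal sketch: the helper names `format_results` and `write_results`, and the assumption that predictions are held in a dict mapping question IDs to answer strings, are hypothetical and not part of the evaluation code.

```python
import json


def format_results(predictions):
    """Turn a {question_id: answer} mapping into a list of result objects.

    `predictions` is a hypothetical in-memory representation; only the
    output layout (a list of {"question_id": int, "answer": str}) comes
    from the format described above.
    """
    return [
        {"question_id": int(qid), "answer": str(ans)}
        for qid, ans in predictions.items()
    ]


def write_results(predictions, path):
    """Serialize the formatted results to a JSON file at `path`."""
    with open(path, "w") as f:
        json.dump(format_results(predictions), f)
```

Casting to `int` and `str` keeps the file consistent with the declared types even if the model emits, for example, NumPy integers or numeric answers.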

We have provided an example result JSON file here.


Evaluation Code

We introduce a new evaluation metric that is robust to inter-human variability in phrasing the answers:

    Acc(ans) = min{ (# humans that said ans) / 3, 1 }

To be consistent with ‘human accuracies’, machine accuracies are averaged over all 10 choose 9 sets of human annotators, i.e., over the 10 possible subsets of 9 annotators.
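
The averaging can be sketched as follows. This is an illustrative reading rather than the official evaluation code: it assumes the per-subset accuracy takes the commonly stated form min(number of annotators who gave the answer / 3, 1), that answers are compared by exact string match after processing, and the function name `vqa_accuracy` is hypothetical.

```python
def vqa_accuracy(answer, human_answers):
    """Average min(matches / 3, 1) over all leave-one-out annotator subsets.

    With 10 human answers, this covers the 10 choose 9 = 10 subsets of
    9 annotators. The min(matches / 3, 1) form is an assumption stated
    in the accompanying text.
    """
    scores = []
    for i in range(len(human_answers)):
        # Leave out annotator i, keep the remaining ones.
        others = human_answers[:i] + human_answers[i + 1:]
        matches = sum(1 for a in others if a == answer)
        scores.append(min(1.0, matches / 3))
    return sum(scores) / len(scores)
```

Under these assumptions, an answer given by exactly 2 of the 10 annotators scores 1/3 on the 2 subsets that leave out a matching annotator and 2/3 on the other 8, for an average of 0.6.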



Before evaluating machine-generated answers, we apply the following processing steps:

  • Making all characters lowercase
  • Removing periods, except when they occur as decimal points
  • Converting number words to digits
  • Removing articles (a, an, the)
  • Adding an apostrophe if a contraction is missing it (e.g., convert "dont" to "don't")
  • Replacing all punctuation (except apostrophes and colons) with a space character. We do not remove apostrophes because doing so can incorrectly change possessives to plurals, e.g., “girl’s” to “girls”, and we do not remove colons because they often refer to time, e.g., 2:50 pm. For commas, no space is inserted if the comma occurs between digits, e.g., 100,978 is converted to 100978. (This processing step is also applied to ground truth answers.)
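
The processing steps above might be sketched as below. This is a simplified illustration, not the official implementation: the lookup tables are small hypothetical subsets, the order in which the steps are applied is an assumption, and a period is treated as a decimal point whenever a digit follows it.

```python
import re

# Hypothetical, deliberately small lookup tables for illustration only.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}
ARTICLES = {"a", "an", "the"}
CONTRACTIONS = {"dont": "don't", "cant": "can't", "isnt": "isn't"}


def normalize_answer(answer):
    """Apply the listed processing steps to a single answer string."""
    text = answer.lower()
    # Commas between digits are dropped without inserting a space.
    text = re.sub(r"(?<=\d),(?=\d)", "", text)
    # Periods are removed unless a digit follows (treated as a decimal).
    text = re.sub(r"\.(?!\d)", "", text)
    # Remaining punctuation, except apostrophes, colons, and the decimal
    # periods kept above, is replaced with a space.
    text = re.sub(r"[^\w\s':.]|_", " ", text)

    tokens = []
    for token in text.split():
        if token in ARTICLES:
            continue  # drop a, an, the
        token = NUMBER_WORDS.get(token, token)  # number words to digits
        token = CONTRACTIONS.get(token, token)  # restore missing apostrophe
        tokens.append(token)
    return " ".join(tokens)
```

For example, under these assumptions `normalize_answer("The Dog.")` yields `"dog"` and `normalize_answer("100,978")` yields `"100978"`, while `"2:50 pm"` and `"girl's"` pass through unchanged.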

A demo script of the evaluation code is available here.