Visual QnA

This was our final project for the Neural Networks course. It uses the CLEVR dataset from Stanford. The aim of this post is to introduce the reader to one of the most intriguing problems in AI. It is not a tutorial on VQA, but a gentle introduction to the problem and to our approach to solving it.

Visual QnA (visual question answering, or VQA) is one of the most challenging problems in deep learning. Here is a basic summary of what it involves, through an example:

INPUT TO THE COMPUTER IS AN IMAGE AND A QUESTION:
Image –>

[Image: a sample scene from the CLEVR dataset]

Question 1 –>
What color is the cube that is behind the silver sphere and to the left of the yellow cylinder?

OUTPUT 1 should be :
Brown

Question 2 –>
How many big spheres are there?

OUTPUT 2 should be :
2
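
To make the input and output format concrete, here is a minimal Python sketch of how such an example could be represented and scored. The names `VQAExample` and `exact_match_accuracy` and the file name `scene.png` are hypothetical, chosen only for illustration; they are not taken from the CLEVR tooling or from our project code. The sketch also assumes that an answer counts as correct only when it matches the reference answer exactly (ignoring case and surrounding whitespace).

```python
from dataclasses import dataclass


@dataclass
class VQAExample:
    """One (image, question, answer) triple. Illustrative structure only."""
    image_path: str  # hypothetical path to the rendered scene
    question: str    # natural-language question about the image
    answer: str      # reference answer, stored as text (e.g. "brown", "2")


def exact_match_accuracy(predictions, examples):
    """Return the fraction of predictions equal to the reference answers.

    Assumption: comparison is case-insensitive and ignores outer whitespace.
    """
    if not examples:
        return 0.0
    correct = sum(
        pred.strip().lower() == ex.answer.strip().lower()
        for pred, ex in zip(predictions, examples)
    )
    return correct / len(examples)


examples = [
    VQAExample(
        "scene.png",
        "What color is the cube that is behind the silver sphere "
        "and to the left of the yellow cylinder?",
        "brown",
    ),
    VQAExample("scene.png", "How many big spheres are there?", "2"),
]

# Made-up model outputs: the first is right, the second is wrong.
predictions = ["Brown", "3"]
print(exact_match_accuracy(predictions, examples))  # 0.5
```

Note that both kinds of question share the same structure: the model sees an image and a question, and it must produce a short answer, whether that answer is a color or a count.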


Our model reached an accuracy of about 46%. That is a fairly good result, considering that the best models in the world have an accuracy of around 55% for numbered VQA.

Here are the links to the problem statement and our solution:

Problem Statement

Here is the link to our model
