Tags

Neural Networks

Vqa

Visual Question Answering

Combines visual input with a text sequence
- CNN + LSTM
- Generates text from images
  - Automatic scene description
- Cross-modal (image and text are processed jointly)
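As a rough illustration of how a CNN and an LSTM can be combined, here is a minimal forward-pass sketch in plain Python. It is not a reference implementation: the layer sizes, the random untrained weights, the absence of bias terms, the fusion by concatenation, and the treatment of answering as classification over a small fixed answer set are all assumptions made for this example. The names `cnn_features`, `lstm_encode` and `vqa_forward` are hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def cnn_features(image, kernels):
    """One 3x3 convolution + ReLU + global average pooling per kernel."""
    h, w = len(image), len(image[0])
    feats = []
    for k in kernels:
        total = 0.0
        for i in range(h - 2):
            for j in range(w - 2):
                s = sum(k[a][b] * image[i + a][j + b]
                        for a in range(3) for b in range(3))
                total += max(0.0, s)  # ReLU
        feats.append(total / ((h - 2) * (w - 2)))
    return feats

def lstm_encode(token_ids, embed, w_lstm, hidden):
    """Run a single-layer LSTM over word embeddings; return the final hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for t in token_ids:
        z = matvec(w_lstm, embed[t] + h)  # pre-activations for gates i, f, o, g
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h

def vqa_forward(image, token_ids, params):
    img_vec = cnn_features(image, params["kernels"])         # visual stream
    q_vec = lstm_encode(token_ids, params["embed"],
                        params["w_lstm"], params["hidden"])  # text stream
    fused = img_vec + q_vec                                  # concatenate both
    return softmax(matvec(params["w_out"], fused))           # answer distribution

# Toy sizes, chosen only for illustration.
VOCAB, EMBED, HIDDEN, KERNELS, ANSWERS = 10, 4, 5, 3, 4
params = {
    "kernels": [rand_matrix(3, 3) for _ in range(KERNELS)],
    "embed": rand_matrix(VOCAB, EMBED),
    "w_lstm": rand_matrix(4 * HIDDEN, EMBED + HIDDEN),
    "w_out": rand_matrix(ANSWERS, KERNELS + HIDDEN),
    "hidden": HIDDEN,
}
image = [[rng.random() for _ in range(8)] for _ in range(8)]  # stand-in 8x8 image
question = [1, 4, 2, 7]                                       # word indices
probs = vqa_forward(image, question, params)
```

`probs` is a probability distribution over the candidate answers. With untrained weights it carries no meaning; the point is only the data flow, with the image going through the CNN, the question through the LSTM, and both vectors being merged before the answer is predicted.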

CNN + LSTM

Word-level embeddings, not character-level
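To make the word versus character distinction concrete, the sketch below tokenises a question both ways and maps words to integer indices, which is the form an embedding layer would consume. The tokenisation rule (lowercase, drop `?`, split on whitespace), the `<unk>` token and the function names are assumptions for illustration only.

```python
def word_tokens(question):
    # Lowercase, drop question marks, split on whitespace.
    return question.lower().replace("?", " ").split()

def char_tokens(question):
    # Character-level alternative: one token per character.
    return list(question.lower())

def build_vocab(questions):
    # Index 0 is reserved for words never seen while building the vocabulary.
    vocab = {"<unk>": 0}
    for q in questions:
        for w in word_tokens(q):
            vocab.setdefault(w, len(vocab))
    return vocab

def encode(question, vocab):
    # Each word becomes the row index of its embedding vector.
    return [vocab.get(w, vocab["<unk>"]) for w in word_tokens(question)]

train_questions = ["What colour is the cat?", "Is the dog asleep?"]
vocab = build_vocab(train_questions)
ids = encode("What colour is the zebra?", vocab)
```

A word-level sequence is much shorter than the character-level one for the same question, but any word missing from the vocabulary (here `zebra`) collapses to `<unk>`.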

Freeform

Encode facts with two text streams

Limitations

Repetitive answers
- Not much variation
No creativity
- Won't generalise beyond taught concepts
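One possible way to put a number on "not much variation" is to measure how spread out a model's answers are. The sketch below computes the fraction of distinct answers and the Shannon entropy of the answer distribution. The metric choice, the function name `answer_diversity` and the sample answers are all hypothetical, not taken from any particular model or dataset.

```python
import math
from collections import Counter

def answer_diversity(answers):
    """Return (distinct-answer ratio, Shannon entropy in bits) for a list of answers."""
    counts = Counter(answers)
    n = len(answers)
    distinct_ratio = len(counts) / n
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return distinct_ratio, entropy

# Made-up predictions, dominated by a single answer.
predicted = ["yes", "yes", "2", "yes", "white", "yes", "yes", "2"]
ratio, entropy = answer_diversity(predicted)
```

A low ratio and low entropy would indicate that the model keeps falling back on a handful of answers.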