In recent decades, enormous progress has been made in the fields of computer vision, object recognition, and natural language processing (NLP). Artificial Intelligence (AI) applications use NLP to give computers a "comprehension" capability, as in question-answering models, where a system answers natural-language queries about any part of an unstructured document. An extension of this idea is to combine NLP with computer vision in the Visual Question Answering (VQA) task: building systems that can answer natural-language questions about images. A variety of machine-learning and deep-learning approaches have been developed for VQA. This study implements a VQA system that acquires visual knowledge from images using a deep convolutional neural network (CNN); more precisely, feature embeddings are taken from the output layer of the VGG19 model. Our method carries out the complex reasoning and natural-language understanding needed to interpret the query accurately and return an acceptable answer. The InferSent model is used to obtain latent semantic embeddings that capture the information in the query, and several architectures are proposed for fusing the image and language models. Our method achieves performance comparable to the baseline models on the VQA dataset.
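To make the pipeline concrete, the sketch below shows one plausible fusion architecture in PyTorch; it is an illustration under stated assumptions, not the paper's exact configuration. A pretrained VGG19 supplies a 4096-dimensional image embedding (taken here from the penultimate fully connected layer, a common choice), the question is represented by a 4096-dimensional sentence vector of the kind InferSent produces, and the two modalities are projected into a joint space and fused by element-wise multiplication before a classifier scores candidate answers. The hidden size, answer-vocabulary size, frozen extractor, and multiplicative fusion are all assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models


class VQABaseline(nn.Module):
    """Minimal sketch of a VGG19 + sentence-embedding VQA model.

    Assumptions (not from the paper): 4096-d question vectors,
    a 1024-d joint space, 1000 candidate answers, and element-wise
    multiplicative fusion of the two modalities.
    """

    def __init__(self, question_dim=4096, hidden_dim=1024, num_answers=1000):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        # Keep everything up to (but excluding) the final classification
        # layer, yielding a 4096-d image feature vector.
        self.cnn = nn.Sequential(
            vgg.features,
            vgg.avgpool,
            nn.Flatten(),
            *list(vgg.classifier.children())[:-1],
        )
        for p in self.cnn.parameters():
            p.requires_grad = False  # freeze the pretrained extractor
        self.img_proj = nn.Linear(4096, hidden_dim)
        self.q_proj = nn.Linear(question_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, image, question_emb):
        # image: (B, 3, 224, 224); question_emb: (B, question_dim),
        # e.g. precomputed InferSent-style sentence vectors.
        img = torch.tanh(self.img_proj(self.cnn(image)))
        q = torch.tanh(self.q_proj(question_emb))
        fused = img * q                  # element-wise fusion of modalities
        return self.classifier(fused)    # scores over candidate answers


if __name__ == "__main__":
    model = VQABaseline()
    image = torch.randn(2, 3, 224, 224)   # batch of 2 RGB images
    question = torch.randn(2, 4096)       # placeholder question embeddings
    scores = model(image, question)
    print(scores.shape)                   # torch.Size([2, 1000])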