Image Captioning Using Deep Learning
Abstract
People today need captions for a variety of purposes, such as sharing a picture on social media or creating a news headline from an image. Instead of captions being written manually for each image, an image captioning system aims to generate them automatically: it produces a descriptive sentence for an image that aids the semantic interpretation of the visual content. Image understanding is a crucial step in decoding semantic image data, and VGG16 is employed for this purpose. Image captioning combines natural language processing and computer vision, and the goal can be achieved with either a traditional machine learning approach or a deep learning strategy. Identifying objects and determining the relationships between them are essential to the task. Feature extraction converts the image into a vector for further processing; the LSTM then receives the detected objects and visual features and links words together to form a sentence describing the objects. This work presents an implementation approach for object detection-based image captioning using deep learning. The CIDEr metric is employed for evaluation, and an accuracy of 94.8% is achieved after 30 epochs, which is a strong result. A CNN and the Flickr dataset are used for image classification.
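The decoding step described above — an image feature vector seeding a recurrent model that emits one word at a time — can be sketched as follows. This is a minimal illustration with a plain RNN cell standing in for the LSTM, and all weights, dimensions, and the toy vocabulary are illustrative placeholders, not the paper's actual model.

```python
import numpy as np

def greedy_caption(image_feature, embed, W_h, W_x, W_out, vocab, max_len=10):
    """Minimal sketch of greedy caption decoding.

    The image feature vector (e.g. extracted by VGG16) initializes the
    hidden state; at each step the most probable word is chosen and fed
    back in as the next input. A simple tanh RNN update stands in for
    the LSTM used in the paper; every weight matrix here is a placeholder.
    """
    h = np.tanh(image_feature)            # hidden state seeded by the image
    word = vocab.index("<start>")
    caption = []
    for _ in range(max_len):
        x = embed[word]                   # embedding of the previous word
        h = np.tanh(W_h @ h + W_x @ x)    # recurrent update (LSTM in the paper)
        logits = W_out @ h                # unnormalized scores over the vocab
        word = int(np.argmax(logits))     # greedy choice of the next word
        if vocab[word] == "<end>":
            break
        caption.append(vocab[word])
    return caption
```

In a full system the feature vector would come from the penultimate layer of a pretrained CNN, the weights would be learned on caption data, and beam search often replaces the greedy argmax.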