# ImageCaptioning

This repo contains an implementation of an image-captioning model. It was implemented as part of the 4-ECTS course Deep Learning in the Data Science Bachelor programme at the FHNW.

## Architecture

The architecture is, in broad strokes, as follows:

- A pretrained CNN (e.g. ResNet50) extracts feature vectors from the images.
- An embedding maps tokens from the vocabulary to dense vectors. The embedding dimension is chosen based on the available computing resources: technically, a higher dimension should perform better, but it takes longer to train and requires more resources.
- The image feature vector is then passed as the initial hidden state of an LSTM, which generates the caption token by token.
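A minimal PyTorch sketch of this encoder-decoder wiring is shown below. It is purely illustrative: the class and parameter names (`CaptionDecoder`, `feature_dim`, etc.) are assumptions, not taken from this repo, and random tensors stand in for the ResNet50 features (whose pooled output is 2048-dimensional).

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Hypothetical sketch: projects CNN features into the LSTM's
    initial hidden state and decodes caption tokens."""

    def __init__(self, feature_dim, embed_dim, hidden_dim, vocab_size):
        super().__init__()
        # Adapt the CNN feature dimension to the LSTM hidden size.
        self.init_h = nn.Linear(feature_dim, hidden_dim)
        self.init_c = nn.Linear(feature_dim, hidden_dim)
        # Token embedding: vocab size -> embedding dimension.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, captions):
        # Image features become the first hidden (and cell) state.
        h0 = self.init_h(features).unsqueeze(0)  # (1, B, H)
        c0 = self.init_c(features).unsqueeze(0)  # (1, B, H)
        emb = self.embed(captions)               # (B, T, E)
        hidden, _ = self.lstm(emb, (h0, c0))     # (B, T, H)
        return self.out(hidden)                  # (B, T, V) logits

# Dummy forward pass; random tensors stand in for ResNet50 features.
feats = torch.randn(2, 2048)               # batch of 2 pooled feature vectors
caps = torch.randint(0, 100, (2, 7))       # 2 captions of 7 token ids each
model = CaptionDecoder(feature_dim=2048, embed_dim=256,
                       hidden_dim=512, vocab_size=100)
logits = model(feats, caps)
print(tuple(logits.shape))  # (2, 7, 100): per-token vocabulary logits
```

At training time the logits would be compared against the shifted caption with cross-entropy; at inference the LSTM is unrolled one token at a time.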

## Further details

Please have a look at `main.ipynb`.