Posted to dev@tika.apache.org by "Thamme Gowda (JIRA)" <ji...@apache.org> on 2017/02/11 21:48:42 UTC

[jira] [Updated] (TIKA-2262) Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types

     [ https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thamme Gowda updated TIKA-2262:
-------------------------------
    Description: 
h2. Background:
An image caption is a small piece of text, usually a single line, added to the metadata of an image to provide a brief summary of the scene it depicts. 
Generating such captions automatically is a challenging and interesting problem in computer vision. Tika already has support for image recognition via the [Object Recognition Parser, TIKA-1993|https://issues.apache.org/jira/browse/TIKA-1993], which uses an InceptionV3 model pre-trained on the ImageNet dataset with TensorFlow. 
Image captioning is a very useful feature since it helps text-based Information Retrieval (IR) systems "understand" the scene in an image.

h2. Technical details and references:
* Google has open sourced its 'Show and Tell' neural network and a pre-trained model for automatically generating captions. [Source Code|https://github.com/tensorflow/models/tree/master/im2txt], [Research blog|https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
* Integrate it the same way as the ObjectRecognitionParser (see the parser sketch after this list):
** Create a RESTful API service, [similar to this|https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server]
** Extend or enhance ObjectRecognitionParser or one of its implementations
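
To make the second option concrete, below is a minimal, hypothetical sketch of a Tika parser that delegates captioning to such a REST service. The class name, endpoint URL, response format (a single plain-text caption line) and the IMAGE_CAPTION metadata key are illustrative assumptions only, not an agreed design; in practice this logic would more likely sit behind ObjectRecognitionParser as a new recogniser implementation, as suggested above.

{code:java}
// Hypothetical sketch only: class name, endpoint URL, response format and
// metadata key are assumptions for illustration, not part of Tika.
// It mirrors the ObjectRecognitionParser pattern of delegating inference
// to an external REST service.
package org.apache.tika.parser.captioning;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;

import org.apache.commons.io.IOUtils;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class ImageCaptionParser extends AbstractParser {

    // Assumed endpoint of a separately running im2txt ('Show and Tell') REST service
    private static final String SERVICE_URL = "http://localhost:8764/im2txt/caption";

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return Collections.singleton(MediaType.image("jpeg"));
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler,
                      Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        // POST the raw image bytes to the captioning service
        HttpURLConnection conn = (HttpURLConnection) new URL(SERVICE_URL).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream out = conn.getOutputStream()) {
            IOUtils.copy(stream, out);
        }

        // Assume the service replies with the caption as one line of plain text
        String caption;
        try (InputStream in = conn.getInputStream()) {
            caption = IOUtils.toString(in, StandardCharsets.UTF_8).trim();
        }

        // Expose the caption both as metadata and as XHTML text content
        metadata.add("IMAGE_CAPTION", caption);
        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();
        xhtml.element("p", caption);
        xhtml.endDocument();
    }
}
{code}

A real implementation would of course support more image MIME types and make the service URL configurable through the Tika config.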

h2. {skills, learning, homework} for GSoC students
* Knowledge of Java AND Python, and the Maven build system
* RESTful APIs
* TensorFlow/Keras
* Deep learning

----

Alternatively, a somewhat harder path for experienced contributors:
[Import the Keras/TensorFlow model into Deeplearning4j|https://deeplearning4j.org/model-import-keras] and run it natively inside the JVM.
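
A rough sketch of what that import path could look like, assuming the captioning model has first been saved from Keras as an HDF5 file (the file name and input shape below are placeholders); the actual caption decoding loop (LSTM plus beam search) would still have to be built on top of this:

{code:java}
// Hypothetical sketch: the model file name and input shape are placeholders.
// Uses Deeplearning4j's Keras model-import API, which is itself a
// work-in-progress feature (see the hurdles below).
import java.util.Arrays;

import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class CaptionModelImportDemo {
    public static void main(String[] args) throws Exception {
        // Import a model previously saved from Keras as an HDF5 file
        ComputationGraph model =
                KerasModelImport.importKerasModelAndWeights("im2txt_caption_model.h5");

        // Run a forward pass on a dummy 299x299 RGB image tensor,
        // entirely inside the JVM -- no external REST service involved
        INDArray dummyImage = Nd4j.rand(new int[]{1, 3, 299, 299});
        INDArray output = model.outputSingle(dummyImage);
        System.out.println("Output shape: " + Arrays.toString(output.shape()));
    }
}
{code}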

h4. Benefits
* No RESTful integration required, and thus no external service dependencies
* Easy to distribute on Hadoop/Spark clusters

h4. Hurdles:
* Model import is a work-in-progress feature in Deeplearning4j, so expect plenty of rough edges along the way!






> Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
> ------------------------------------------------------------------------
>
>                 Key: TIKA-2262
>                 URL: https://issues.apache.org/jira/browse/TIKA-2262
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Thamme Gowda
>              Labels: deeplearning, gsoc2017, machine_learning
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)