Posted to user@tika.apache.org by aravinth thangasami <ar...@gmail.com> on 2020/02/28 16:48:50 UTC

Identifying Document Containing Images

Hi all,

I am trying to use Tika to identify whether a document contains an
embedded image.
Currently, we are using an EmbeddedDocumentExtractor to identify images.

Is there any other approach for identifying images?
Are there any limitations in the current approach?
Any help would be appreciated.

Thanks
Aravinth.

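> import java.io.IOException;
> import java.io.InputStream;
>
> import org.apache.tika.extractor.EmbeddedDocumentExtractor;
> import org.apache.tika.metadata.Metadata;
> import org.xml.sax.ContentHandler;
> import org.xml.sax.SAXException;
>
> class EmbeddedImageFinder implements EmbeddedDocumentExtractor
> {
>     // Set to true once any embedded resource reports an image MIME type.
>     private boolean containsImage = false;
>
>     public boolean containsImage()
>     {
>         return containsImage;
>     }
>
>     @Override
>     public boolean shouldParseEmbedded(Metadata metadata)
>     {
>         // Content-Type can be absent, so guard against null.
>         String mimeType = metadata.get("Content-Type");
>         if (mimeType != null && mimeType.startsWith("image/"))
>         {
>             containsImage = true;
>         }
>         return false; // Parsing is not necessary.
>     }
>
>     @Override
>     public void parseEmbedded(InputStream inputStream, ContentHandler contentHandler,
>             Metadata metadata, boolean outputHtml) throws SAXException, IOException
>     {
>         // Intentionally empty: shouldParseEmbedded always returns false.
>     }
> }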
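
The Content-Type test inside shouldParseEmbedded can be pulled out into a small helper so it is easy to check on its own. The following is only a sketch in plain Java with no Tika dependency; the names ImageTypeCheck and isImageType are hypothetical and not part of Tika. It assumes the value may be missing (null) or may differ in case or surrounding whitespace, and it matches on the "image/" prefix rather than anywhere in the string.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tika: decides whether a
// Content-Type value names an image type.
final class ImageTypeCheck {
    private ImageTypeCheck() {}

    static boolean isImageType(String contentType) {
        if (contentType == null) {
            return false; // no Content-Type available for this resource
        }
        // Normalize case and surrounding whitespace before comparing.
        String normalized = contentType.trim().toLowerCase(Locale.ROOT);
        // Match the top-level type only, so "text/x-image/foo" is rejected.
        return normalized.startsWith("image/");
    }
}
```

With a helper like this, the extractor would call ImageTypeCheck.isImageType(metadata.get("Content-Type")) and set its flag when the result is true.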