Posted to user@tika.apache.org by aravinth thangasami <ar...@gmail.com> on 2020/02/28 16:48:50 UTC
Identifying Document Containing Images
Hi all,
I am trying to use Tika to determine whether a document contains an
embedded image.
Currently, we use a custom EmbeddedDocumentExtractor (shown below) to detect images.
Is there any other approach for identifying embedded images?
Are there any limitations in the current approach?
Any help would be appreciated.
Thanks
Aravinth.
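> import java.io.IOException;
> import java.io.InputStream;
> import org.apache.tika.extractor.EmbeddedDocumentExtractor;
> import org.apache.tika.metadata.Metadata;
> import org.xml.sax.ContentHandler;
> import org.xml.sax.SAXException;
>
> class EmbeddedImageFinder implements EmbeddedDocumentExtractor
> {
>     // Set to true once any embedded resource reports an image Content-Type.
>     boolean containsImage = false;
>
>     @Override
>     public boolean shouldParseEmbedded(Metadata metadata)
>     {
>         String mimeType = metadata.get("Content-Type");
>         // Guard against a missing Content-Type to avoid a NullPointerException.
>         if (mimeType != null && mimeType.contains("image/"))
>         {
>             containsImage = true;
>         }
>         return false; // Parsing is not necessary.
>     }
>
>     @Override
>     public void parseEmbedded(InputStream inputStream, ContentHandler contentHandler,
>             Metadata metadata, boolean b) throws SAXException, IOException
>     {
>         // Intentionally empty: shouldParseEmbedded always returns false.
>     }
> }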
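
The Content-Type check in shouldParseEmbedded can also be pulled out into a small helper that is easy to test on its own. This is only a minimal sketch: ImageMimeCheck is a hypothetical class name, not part of Tika. It treats a missing Content-Type as "not an image" and matches only values that start with "image/", so an unrelated value that merely contains that substring is not counted.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Tika) for testing a Content-Type value. */
final class ImageMimeCheck {
    private ImageMimeCheck() {}

    /** Returns true when the given Content-Type value names an image media type. */
    static boolean isImage(String contentType) {
        if (contentType == null) {
            return false; // no Content-Type was reported for this resource
        }
        // Compare case-insensitively; any parameters after the
        // subtype (e.g. "; charset=...") do not affect the prefix test.
        String normalized = contentType.trim().toLowerCase(Locale.ROOT);
        return normalized.startsWith("image/");
    }
}
```

Inside shouldParseEmbedded this could be called as ImageMimeCheck.isImage(metadata.get("Content-Type")). Whether a prefix match is strict enough depends on the Content-Type values the parsers in use actually report.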