Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2015/01/07 02:43:37 UTC
[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata
extraction to Tesseract parser
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267161#comment-14267161 ]
Tim Allison commented on TIKA-1445:
-----------------------------------
Looking into this a bit more... we aren't even getting metadata out of regular images. For example, our testJPEG.jpg from tika-parser's test-documents yields no useful metadata with trunk. It looks like the file isn't even being touched by the TesseractOCRParser:
{noformat}
Content-Length: 7686
Content-Type: image/jpeg
X-Parsed-By: org.apache.tika.parser.DefaultParser
resourceName: testJPEG.jpg
{noformat}
Again, my apologies if I need to make modifications to our config...
> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
> Key: TIKA-1445
> URL: https://issues.apache.org/jira/browse/TIKA-1445
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.8
>
> Attachments: 000003.doc, TIKA-1445.Mattmann.101214.patch.txt, TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, consider how to add back the metadata extraction capabilities provided by the other image parsers.