
Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/12/19 06:11:13 UTC

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

    [ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252952#comment-14252952 ] 

Nick Burch commented on TIKA-1445:
----------------------------------

For 1.7, how about we just have the Tesseract Parser call out to the "normal" image parser (as appropriate), so that you always get both OCR and metadata? (Hopefully very quick to do.)

Then for 1.8, we can implement the config as described above, without that blocking the 1.7 release.
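
As a rough illustration of the 1.7 idea, here is a minimal, self-contained sketch of one parser delegating to another so that a single call yields both metadata and OCR text. It deliberately does not use Tika's real classes: SimpleParser, OcrWithMetadataParser, DelegationSketch and the "width" key are all hypothetical names invented for this example, and the real Tesseract parser may need a different shape entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a parser. This is NOT Tika's Parser interface.
interface SimpleParser {
    void parse(byte[] image, Map<String, String> metadata, StringBuilder text);
}

// Hypothetical wrapper: first calls the "normal" image parser to fill in
// metadata, then calls the OCR parser to extract text from the same bytes.
class OcrWithMetadataParser implements SimpleParser {
    private final SimpleParser imageParser;
    private final SimpleParser ocrParser;

    OcrWithMetadataParser(SimpleParser imageParser, SimpleParser ocrParser) {
        this.imageParser = imageParser;
        this.ocrParser = ocrParser;
    }

    @Override
    public void parse(byte[] image, Map<String, String> metadata, StringBuilder text) {
        imageParser.parse(image, metadata, text); // metadata extraction
        ocrParser.parse(image, metadata, text);   // OCR text extraction
    }
}

class DelegationSketch {
    public static void main(String[] args) {
        // Fake parsers standing in for the real image and OCR implementations.
        SimpleParser fakeImageParser = (image, metadata, text) -> metadata.put("width", "640");
        SimpleParser fakeOcrParser = (image, metadata, text) -> text.append("recognised text");

        Map<String, String> metadata = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        new OcrWithMetadataParser(fakeImageParser, fakeOcrParser)
                .parse(new byte[0], metadata, text);

        System.out.println("metadata: " + metadata);
        System.out.println("text: " + text);
    }
}
```

The wrapper always runs both parsers; the "(as appropriate)" part, such as skipping the image parser for types it cannot handle, is left out of this sketch.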

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt, TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, consider how to add back in the metadata extraction capabilities by the other Image parsers.


