Posted to dev@tika.apache.org by "Jeremy Anderson (JIRA)" <ji...@apache.org> on 2014/04/30 02:00:20 UTC
[jira] [Commented] (TIKA-1268) Extract images from PDF documents
[ https://issues.apache.org/jira/browse/TIKA-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984984#comment-13984984 ]
Jeremy Anderson commented on TIKA-1268:
---------------------------------------
This fix will break once PDFBox 2.0.0 is released and Tika upgrades to it. I may open a new TIKA issue at some point to track the 2.0.0 upgrade, with a patch if I implement one rather than commenting out this code. (I'm currently building Tika, PDFBox, and POI from daily snapshots.)
See: PDFBOX-1893.
Essentially, the org.apache.pdfbox.pdmodel.graphics.xobject package was removed, and the logic from its classes was refactored across various other classes. This TIKA fix relies heavily on classes from that package.
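One way to see which side of that refactoring a given build sits on is to probe the classpath for the affected classes. The following is only a rough sketch, not part of the TIKA-1268 patch: the class PdfBoxApiProbe and its methods are hypothetical, and the 2.0-era class name (org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) is an assumption that is not stated in this thread and may change while 2.0.0 is still in development.

```java
public class PdfBoxApiProbe {
    // Image class from the xobject package used by PDFBox 1.8.x.
    static final String LEGACY_IMAGE_CLASS =
            "org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage";
    // Assumed replacement class on the 2.0 line (not confirmed in this thread).
    static final String NEW_IMAGE_CLASS =
            "org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject";

    // Returns true if the named class can be located, without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, PdfBoxApiProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Turns the two probe results into a short human-readable summary.
    static String describe(boolean legacyPresent, boolean newPresent) {
        if (legacyPresent) {
            return "xobject package present (1.8.x-style API)";
        }
        if (newPresent) {
            return "xobject package missing, image package present (2.0-style API)";
        }
        return "no PDFBox image classes found on the classpath";
    }

    public static void main(String[] args) {
        System.out.println(describe(isPresent(LEGACY_IMAGE_CLASS),
                                    isPresent(NEW_IMAGE_CLASS)));
    }
}
```

A check like this only reports which API is available; code that actually extracts images would still need to be ported to the new classes.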
> Extract images from PDF documents
> ---------------------------------
>
> Key: TIKA-1268
> URL: https://issues.apache.org/jira/browse/TIKA-1268
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Assignee: Jukka Zitting
> Fix For: 1.6
>
>
> It would be nice if images within PDF documents could be extracted much like embedded attachments are now being handled.
--
This message was sent by Atlassian JIRA
(v6.2#6252)