Posted to dev@tika.apache.org by "Jeremy Anderson (JIRA)" <ji...@apache.org> on 2014/04/30 02:00:20 UTC

[jira] [Commented] (TIKA-1268) Extract images from PDF documents

    [ https://issues.apache.org/jira/browse/TIKA-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984984#comment-13984984 ] 

Jeremy Anderson commented on TIKA-1268:
---------------------------------------

This fix will break once PDFBox 2.0.0 is released and Tika upgrades to it.  I may open a new TIKA issue at some point to track the 2.0.0 upgrade, with a patch if I implement one rather than just commenting out this code.  (I'm currently building Tika, PDFBox, and POI from daily snapshots.)

See: PDFBOX-1893.

Essentially, the org.apache.pdfbox.pdmodel.graphics.xobject package was removed and the logic from its classes was refactored across various other classes.  This TIKA fix relies heavily on classes from that package.
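
For anyone building against snapshots in the meantime, here is a rough sketch of a classpath probe that reports whether the old xobject package is still available, so the image extraction could be skipped instead of failing with NoClassDefFoundError. This is not what the TIKA-1268 patch does. The PdfBoxCompat class and its methods are hypothetical names, and PDXObjectImage is assumed to be a representative class from the removed package as of the 1.8.x line.

```java
public final class PdfBoxCompat {

    // Assumption: a representative class from the package removed in PDFBOX-1893.
    // Adjust the name if your PDFBox version differs.
    static final String LEGACY_IMAGE_CLASS =
            "org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage";

    private PdfBoxCompat() {
    }

    // Returns true if the named class can be located on the classpath.
    // The class is looked up without being initialized.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, PdfBoxCompat.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // True when the pre-2.0.0 xobject API appears to be on the classpath.
    static boolean legacyXObjectApiAvailable() {
        return isClassPresent(LEGACY_IMAGE_CLASS);
    }

    public static void main(String[] args) {
        System.out.println("legacy xobject API available: "
                + legacyXObjectApiAvailable());
    }
}
```

A caller could check legacyXObjectApiAvailable() once and skip the image extraction path when it returns false.  That only avoids the runtime failure; a proper fix would still mean porting the code to whatever replaces those classes in 2.0.0.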

> Extract images from PDF documents
> ---------------------------------
>
>                 Key: TIKA-1268
>                 URL: https://issues.apache.org/jira/browse/TIKA-1268
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 1.6
>
>
> It would be nice if images within PDF documents could be extracted much like embedded attachments are now being handled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)