Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2015/10/05 15:15:27 UTC

[jira] [Comment Edited] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

    [ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943339#comment-14943339 ] 

Tim Allison edited comment on TIKA-1285 at 10/5/15 1:14 PM:
------------------------------------------------------------

Thank you, [~ben@benmccann.com]!  The more eyes we have on this the better for both projects.

Updated working wrapper is available [here|https://github.com/tballison/tika/tree/pdfbox2_0].  Some cleanup remains...

[~arkadyzalko] and [~jayesh_ag], would you be willing to run this on your batches of docs and let us know what you find?  Extra points if you can compare memory usage and parse time against PDFBox 1.8.10! :)

Also, extra points for running this with the extract-embedded-images parameter turned on.


was (Author: tallison@mitre.org):
Thank you, [~ben@benmccann.com]!  The more eyes we have on this the better for both projects.

Updated working wrapper is available [here|https://github.com/tballison/tika/tree/pdfbox2_0].  Some cleanup remains...

[~arkadyzalko], would you be willing to run this on your batch of docs and let us know what you find?

> Upgrade to PDFBox 2.0.0 when available
> --------------------------------------
>
>                 Key: TIKA-1285
>                 URL: https://issues.apache.org/jira/browse/TIKA-1285
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Jeremy Anderson
>            Priority: Minor
>         Attachments: TIKA-1285.patch, TIKA-1285_rev1641423.patch, TIKA-1285v3.patch, pdfbox_reports_2_0_0_20150709.zip, testPDF_childAttachments.pdf
>
>
> This issue tracks the fixes required to upgrade the PDFBox dependency to 2.0.0 Final once it is available; until then, Tika will use PDFBox's daily build.
> See TIKA-1268 comment.
> Relates to PDFBOX-1893


