Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/11/25 16:32:12 UTC

[jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8

     [ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1442:
------------------------------
    Attachment: PDFBox_1_8_6VPDFBox_1_8_8-b145.xlsx

This is a comparison of PDFBox 1.8.6 and PDFBox 1.8.8-SNAPSHOT build 145. It was run via Tika 1.7-SNAPSHOT, which uses the classic parser by default. I'll post a comparison file of 1.8.8-SNAPSHOT-145 with classic vs. nonSeq shortly.

It looks like there are only a few regressions and many improvements.

> Upgrade to PDFBox 1.8.8
> -----------------------
>
>                 Key: TIKA-1442
>                 URL: https://issues.apache.org/jira/browse/TIKA-1442
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.8
>
>         Attachments: PDFBox_1_8_6VPDFBox_1_8_8-b145.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTb.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTc.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip
>
>
> Given the regressions we identified in PDFBox 1.8.7, we should upgrade to 1.8.8 as soon as it is ready.  I'm tempted to call this a blocker on Tika 1.7.  Let's use this issue to carry on the discussion of regression testing (if any further discussion is necessary) or any other prep that needs to happen before 1.8.8's release.


