Posted to dev@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2009/11/09 09:26:32 UTC

[jira] Resolved: (JCR-2388) Upgrade PDFBox to version 0.8.0

     [ https://issues.apache.org/jira/browse/JCR-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger resolved JCR-2388.
-----------------------------------

       Resolution: Invalid
    Fix Version/s:     (was: 2.0-beta2)

As of Jackrabbit 2.0, the jackrabbit-text-extractors module has been replaced with a dependency on Apache Tika 0.4, which includes PDFBox 0.7.3.

If you are using Jackrabbit 1.x then I suggest you write your own text extractor that uses PDFBox 0.8.0 and configure it accordingly in the workspace.xml.
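
As a rough sketch of the configuration step for Jackrabbit 1.x: the fragment below assumes the SearchIndex parameter "textFilterClasses" is what selects the text extractors, and com.example.Pdfbox080TextExtractor is a hypothetical name for your own class implementing org.apache.jackrabbit.extractor.TextExtractor on top of PDFBox 0.8.0. Neither the class nor the path value comes from this issue, so adjust both to your setup.

```xml
<!-- workspace.xml (Jackrabbit 1.x): only the extractor-related part is shown -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Assumed index location; keep whatever your workspace already uses -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Hypothetical class name: your own extractor built against PDFBox 0.8.0 -->
  <param name="textFilterClasses"
         value="com.example.Pdfbox080TextExtractor"/>
</SearchIndex>
```

The value is a comma-separated list, so any other extractors you still rely on would stay in it next to the custom one.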

For Jackrabbit 2.0 we'd have to wait for Tika 0.5, which will include PDFBox 0.8.0 (http://issues.apache.org/jira/browse/TIKA-158).

> Upgrade PDFBox to version 0.8.0
> -------------------------------
>
>                 Key: JCR-2388
>                 URL: https://issues.apache.org/jira/browse/JCR-2388
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-text-extractors
>    Affects Versions: 2.0-beta1
>            Reporter: William Woodward
>
> The most recent version of PDFBox fixes a bug in its PDFParser class that caused a null pointer exception when extracting text from documents created with Acrobat Pro version 9 (see https://issues.apache.org/jira/browse/PDFBOX-361). Since this is the first Apache Incubator release, the package names have also changed. Therefore, simply dropping in the new PDFBox is not an option, because the Jackrabbit text extractor references the old package names.
> This is a MAJOR problem for us since our user community recently updated to Acrobat 9 (and we have no control over this decision). Our users produce time-sensitive reports. Without an updated Jackrabbit (with an updated PDFBox) we can no longer extract and index text from our users' PDFs.
> Thank you for your consideration in this matter,
> Bill Woodward
> Developer
