Posted to dev@tika.apache.org by "Jeremy Anderson (JIRA)" <ji...@apache.org> on 2014/09/04 23:23:24 UTC

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

    [ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121991#comment-14121991 ] 

Jeremy Anderson commented on TIKA-1285:
---------------------------------------

Updated the patch to include fixes as of revision 1621674 (Sept 4th).  Major fixes include syncing up to the PDFBox snapshot taken after Jempbox was replaced by XmpBox.

XmpBox still requires some refinement to properly handle all of the XMP packets encountered by Tika's unit tests.  Some of these test cases have been commented out until DomXmpParser can handle them.

I have not yet reported these issues in the PDFBOX JIRA, as I'm not familiar with how to proceed there.  The common DomXmpParser issues encountered:
* Invalid array definition, expecting Alt and found nothing [prefix=dc; name=title]
* Invalid array type, expecting Seq and found Bag [prefix=dc; name=creator]
* No type defined for {http://ns.adobe.com/pdf/1.3/}Trapped
* Cannot find a definition for the namespace http://ns.adobe.com/pdfx/1.3/
* xmp should start with a processing instruction
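
To make the first two and the last of these messages concrete, here is a minimal sketch that inspects an XMP packet with the JDK's own DOM parser. It is not XmpBox's implementation and does not call DomXmpParser; the class name XmpPacketCheck, its methods, and the sample packet are hypothetical and exist only for illustration. It assumes, as the messages above suggest, that the parser wants dc:title wrapped in rdf:Alt, dc:creator wrapped in rdf:Seq, and the packet to open with an xpacket processing instruction.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only; not part of Tika or PDFBox.
final class XmpPacketCheck {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Made-up sample: dc:title has no rdf:Alt wrapper and dc:creator uses rdf:Bag.
    static final String SAMPLE =
        "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
        + "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
        + "<rdf:RDF xmlns:rdf='" + RDF_NS + "'>"
        + "<rdf:Description rdf:about='' xmlns:dc='" + DC_NS + "'>"
        + "<dc:title>Plain title</dc:title>"
        + "<dc:creator><rdf:Bag><rdf:li>A. Author</rdf:li></rdf:Bag></dc:creator>"
        + "</rdf:Description>"
        + "</rdf:RDF>"
        + "</x:xmpmeta>"
        + "<?xpacket end='w'?>";

    // Parses the packet with a namespace-aware DOM builder.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // True if the first node of the document is a processing instruction.
    static boolean startsWithProcessingInstruction(Document doc) {
        Node first = doc.getFirstChild();
        return first != null && first.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE;
    }

    // Returns the local name of the rdf container (Alt, Seq or Bag) directly
    // under the first dc:<localName> element, or null if there is none.
    static String rdfContainerOf(Document doc, String localName) {
        NodeList props = doc.getElementsByTagNameNS(DC_NS, localName);
        if (props.getLength() == 0) {
            return null;
        }
        for (Node c = props.item(0).getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && RDF_NS.equals(c.getNamespaceURI())) {
                return c.getLocalName();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("starts with PI: " + startsWithProcessingInstruction(doc));
        System.out.println("dc:title container: " + rdfContainerOf(doc, "title"));
        System.out.println("dc:creator container: " + rdfContainerOf(doc, "creator"));
    }
}
```

A check like this could help narrow down which test documents trigger which message before filing the PDFBOX issues; the namespace-definition messages would need a separate look at the schemas XmpBox registers.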


The patch works in conjunction with PDFBOX-2318.


> Upgrade to PDFBox 2.0.0 when available
> --------------------------------------
>
>                 Key: TIKA-1285
>                 URL: https://issues.apache.org/jira/browse/TIKA-1285
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Jeremy Anderson
>            Priority: Minor
>         Attachments: TIKA-1285.patch
>
>
> This issue is to track fixes required when upgrading the PDFBox dependency to 2.0.0 Final once it's available, and using PDFBox's daily build before then.
> See TIKA-1268 comment.
> Relates to PDFBOX-1893


