Posted to dev@pdfbox.apache.org by "LynX (Commented) (JIRA)" <ji...@apache.org> on 2011/12/17 21:47:31 UTC
[jira] [Commented] (PDFBOX-720) Inconsistency in parsing PDFs between Windows and Linux

    [ https://issues.apache.org/jira/browse/PDFBOX-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171668#comment-13171668 ] 

LynX commented on PDFBOX-720:
-----------------------------

Dear Adam,

Unfortunately, I was not able to reproduce this problem on my Debian system. I tried several PDFBox distributions (1.1.0, 1.2.0, 1.6.0) with different JVMs (the same Sun JDK 1.5.0_06 that you were using, and OpenJDK 1.6). In all cases I got the "Document outline was not null" message. As David stated before, it is "dependant on the implementation of HashMap on the host system", so I guess my host system is not suitable for reproducing it :).
Are you still able to reproduce this problem on your system? If so, could you please try applying the patch from PDFBOX-569? I believe it may fix the problem.
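
To illustrate the HashMap point, here is a minimal sketch of how a lookup that walks a HashMap can return a different copy of a duplicated object from one run to the next. It is not PDFBox code: the class ParsedObject, the method firstMatch and the byte offsets are all hypothetical and exist only for this example. The one fact it relies on is that java.util.HashMap makes no guarantee about iteration order, while LinkedHashMap iterates in insertion order.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderSketch {

    /** Hypothetical stand-in for one parsed copy of an indirect object. */
    static final class ParsedObject {
        final String objectKey;     // object and generation number, e.g. "1600 0"
        final boolean hasOutlines;  // whether this copy's dictionary has "Outlines"

        ParsedObject(String objectKey, boolean hasOutlines) {
            this.objectKey = objectKey;
            this.hasOutlines = hasOutlines;
        }
        // equals/hashCode are deliberately not overridden, so hashing uses
        // identity hash codes, which need not be the same on every JVM run.
    }

    /** Returns the first copy of objectKey met while iterating the map's keys. */
    static ParsedObject firstMatch(Map<ParsedObject, Long> offsets, String objectKey) {
        for (ParsedObject candidate : offsets.keySet()) {
            if (candidate.objectKey.equals(objectKey)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ParsedObject withoutOutlines = new ParsedObject("1600 0", false);
        ParsedObject withOutlines = new ParsedObject("1600 0", true);

        // Insertion-ordered map: the copy that was added first is always found first.
        Map<ParsedObject, Long> ordered = new LinkedHashMap<>();
        ordered.put(withoutOutlines, 100L);   // hypothetical byte offsets
        ordered.put(withOutlines, 5000L);
        System.out.println("LinkedHashMap: hasOutlines = "
                + firstMatch(ordered, "1600 0").hasOutlines);

        // Plain HashMap: either copy may come first, depending on the hash codes.
        Map<ParsedObject, Long> unordered = new HashMap<>(ordered);
        System.out.println("HashMap:       hasOutlines = "
                + firstMatch(unordered, "1600 0").hasOutlines);
    }
}
```

If something like this is what happens inside the parser, it would explain why the same file gives a null outline on one machine and a non-null outline on another.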

Regards,
LX 

                
> Inconsistency in parsing PDFs between Windows and Linux
> -------------------------------------------------------
>
>                 Key: PDFBOX-720
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-720
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>         Environment: Windows Vista 32-bit, Sun JDK 1.5.0_06, PDFBox HEAD tag (revision 941073)
> vs.
> Red Hat Linux, 2.6.9-67.ELsmp kernel, Java 1.5.0_06, PDFBox HEAD tag (revision 941073)
>            Reporter: Adam Nichols
>         Attachments: 238_Page_Report.pdf
>
>
> Run this same code using the same PDF and you'll get different results on Linux than on Windows.  Regardless of which one you consider "correct", it should be consistent.
> PDDocument doc = PDDocument.load(inputFile);
> PDDocumentOutline outline = doc.getDocumentCatalog().getDocumentOutline();
> if (outline == null) {
>     System.out.println("Document outline was null");
> } else {
>     System.out.println("Document outline was not null");
> }
> Some interesting notes about this PDF: it seems that Acrobat Distiller 8.1.0 basically just concatenated two PDFs into one.  There are two trailers, and both refer to object "1600 0" as the root.  Object 1600 0 appears multiple times: one copy doesn't have "Outlines" in its dictionary, while the other has "Outlines 1667 0".  Windows picks up the latter and shows the outline correctly.  Linux picks up the former and thus returns null for the outline.  I tried debugging through PDFParser and BaseParser, but I'm not really sure how that code works and I quickly got lost.
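
One way to make the choice between the two copies of "1600 0" repeatable is to pick by position in the file rather than by map iteration order. The sketch below assumes that the definition at the greatest byte offset should win, which matches the copy Windows picks up in the report above. Whether that is the right rule for PDFBox is an assumption, and the Definition class, the latest method and the offsets are hypothetical, not part of the PDFBox API.

```java
import java.util.Arrays;
import java.util.List;

public class LatestDefinitionSketch {

    /** Hypothetical record of one definition of an indirect object in the file. */
    static final class Definition {
        final String objectKey;     // object and generation number, e.g. "1600 0"
        final long byteOffset;      // where this definition starts in the file
        final boolean hasOutlines;  // whether this copy's dictionary has "Outlines"

        Definition(String objectKey, long byteOffset, boolean hasOutlines) {
            this.objectKey = objectKey;
            this.byteOffset = byteOffset;
            this.hasOutlines = hasOutlines;
        }
    }

    /** Returns the definition of objectKey with the greatest byte offset, or null. */
    static Definition latest(List<Definition> definitions, String objectKey) {
        Definition best = null;
        for (Definition d : definitions) {
            if (d.objectKey.equals(objectKey)
                    && (best == null || d.byteOffset > best.byteOffset)) {
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for the two copies of the root object.
        List<Definition> found = Arrays.asList(
                new Definition("1600 0", 5000L, true),
                new Definition("1600 0", 100L, false));

        // The result depends only on the offsets, not on the order of the list.
        Definition root = latest(found, "1600 0");
        System.out.println("Chosen copy has Outlines: " + root.hasOutlines);
    }
}
```

Because the comparison uses only the recorded offsets, the same copy is chosen whatever order the definitions were collected in.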
