Posted to dev@pdfbox.apache.org by Andreas Lehmkuehler <an...@lehmi.de> on 2014/07/30 20:59:57 UTC

Broken XRef-links, looking for some sample pdfs

Hi,

I'm working on an advanced self-healing mechanism for wrong xref offset values.
I thought I had enough sample PDFs, but I can't find any.

Can anybody give me a pointer where to find some?

Thanks in advance!

BR
Andreas Lehmkühler
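
Editor's sketch: to make the idea of repairing wrong xref offsets concrete, here is a minimal Python illustration. It is not PDFBox's implementation. The function names (`check_xref_offset`, `find_object_offset`, `heal_xref`) and the dictionary layout are hypothetical, and the brute-force search assumes that the last `N G obj` header in the file is the current one, which can be wrong if the same byte pattern occurs inside stream data.

```python
import re


def _header_pattern(obj_num: int, gen: int) -> "re.Pattern[bytes]":
    # Matches an object header such as b"12 0 obj".
    return re.compile(rb"%d\s+%d\s+obj\b" % (obj_num, gen))


def check_xref_offset(data: bytes, obj_num: int, gen: int, offset: int) -> bool:
    """Return True if the header 'obj_num gen obj' starts exactly at offset."""
    if offset < 0 or offset >= len(data):
        return False
    # Reject offsets that point into the middle of a longer number,
    # e.g. the "2" inside "12 0 obj".
    if offset > 0 and data[offset - 1:offset].isdigit():
        return False
    return _header_pattern(obj_num, gen).match(data, offset) is not None


def find_object_offset(data: bytes, obj_num: int, gen: int):
    """Brute-force search; return the offset of the last matching header or None."""
    last = None
    for m in _header_pattern(obj_num, gen).finditer(data):
        start = m.start()
        if start > 0 and data[start - 1:start].isdigit():
            continue  # part of a longer object number
        last = start
    return last


def heal_xref(data: bytes, xref: dict) -> dict:
    """Return a copy of xref ({(obj_num, gen): offset}) with wrong offsets replaced.

    Entries whose object cannot be found anywhere in the file are dropped.
    """
    fixed = {}
    for (num, gen), offset in xref.items():
        if check_xref_offset(data, num, gen, offset):
            fixed[(num, gen)] = offset
            continue
        found = find_object_offset(data, num, gen)
        if found is not None:
            fixed[(num, gen)] = found
    return fixed
```

Calling `heal_xref(pdf_bytes, table)` leaves correct entries untouched and only falls back to the file scan for entries that fail the check, so a mostly valid table stays cheap to verify.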

Re: Broken XRef-links, looking for some sample pdfs

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi Tim,

I'll check those PDFs too. Thanks!

Andreas

On 31.07.2014 14:59, Allison, Timothy B. wrote:
> Hi Andreas,
>
> Not sure if these types of xref issues are what you mean, but this is what we get on the Tika test PDFs (available here: http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/):
>
>
> Now testing: testComment.pdf
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
> Now testing: testOptionalHyphen.pdf
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
> Now testing: testPageNumber.pdf
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
> Now testing: testPDFTwoTextBoxes.pdf
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
> Now testing: testPDFVarious.pdf
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
>   WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
> Now testing: testPDF_acroform3.pdf
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 26441
> Now testing: testPDF_childAttachments.pdf
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 2314576
> Now testing: testPDF_protected.pdf
>   INFO [main] (PDFParser.java:248) - Document is encrypted
> Now testing: testPDF_twoAuthors.pdf
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 12324
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5969
> Now testing: testPDF_Version.10.x.pdf
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5500
> Now testing: testPDF_Version.6.x.pdf
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
> Now testing: testPDF_Version.7.x.pdf
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
> Now testing: testPDF_Version.8.x.pdf
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
> Now testing: testPDF_Version.9.x.pdf
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5687
> Now testing: testPopupAnnotation.pdf
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 8777


RE: Broken XRef-links, looking for some sample pdfs

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Hi Andreas,

Not sure if these types of xref issues are what you mean, but this is what we get on the Tika test PDFs (available here: http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/):


Now testing: testComment.pdf
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
Now testing: testOptionalHyphen.pdf
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
Now testing: testPageNumber.pdf
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
Now testing: testPDFTwoTextBoxes.pdf
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
Now testing: testPDFVarious.pdf
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
 WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
Now testing: testPDF_acroform3.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 26441
Now testing: testPDF_childAttachments.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 2314576
Now testing: testPDF_protected.pdf
 INFO [main] (PDFParser.java:248) - Document is encrypted
Now testing: testPDF_twoAuthors.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 12324
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5969
Now testing: testPDF_Version.10.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5500
Now testing: testPDF_Version.6.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.7.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.8.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.9.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5687
Now testing: testPopupAnnotation.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 8777
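
Editor's sketch: for triaging output like the log above, a small Python helper can group the xref-related offsets by file. This assumes the log format is exactly as shown (a `Now testing:` line followed by `LEVEL [thread] (Source.java:line) - message` entries); the name `summarize` and the regular expressions are illustrative and not part of Tika or PDFBox.

```python
import re

FILE_RE = re.compile(r"^Now testing:\s+(?P<name>\S+)\s*$")
ENTRY_RE = re.compile(
    r"^\s*(?P<level>WARN|ERROR|INFO)\s+\[[^\]]*\]\s+\((?P<src>[^)]*)\)"
    r"\s+-\s+(?P<msg>.*?)\s*$"
)
OFFSET_RE = re.compile(r"\bat offset (\d+)$")


def summarize(log_text: str) -> dict:
    """Map each tested file name to the sorted, de-duplicated xref offsets reported for it."""
    offsets_by_file = {}
    current = None
    for line in log_text.splitlines():
        file_match = FILE_RE.match(line)
        if file_match:
            current = file_match.group("name")
            offsets_by_file.setdefault(current, set())
            continue
        entry = ENTRY_RE.match(line)
        if entry is None or current is None:
            continue
        message = entry.group("msg")
        if "xref" not in message:
            continue  # e.g. "Document is encrypted"
        offset = OFFSET_RE.search(message)
        if offset:
            offsets_by_file[current].add(int(offset.group(1)))
    return {name: sorted(offsets) for name, offsets in offsets_by_file.items()}
```

Files with an empty list (such as an encrypted document) produced no xref message, which makes it easy to pick out only the candidates worth feeding to a repair routine.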

-----Original Message-----
From: Andreas Lehmkuehler [mailto:andreas@lehmi.de] 
Sent: Wednesday, July 30, 2014 3:54 PM
To: dev@pdfbox.apache.org
Subject: Re: Broken XRef-links, looking for some sample pdfs

Thanks Tilman for the fast response and of course the pointers!

Andreas



Re: Broken XRef-links, looking for some sample pdfs

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Thanks, Tilman, for the fast response and, of course, the pointers!

Andreas

On 30.07.2014 21:14, Tilman Hausherr wrote:
> http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/
>
> file 24, 024064.pdf
> file 26, 026779.pdf
> file 27, 027266.pdf, 027613.pdf
> file 28,  048872.pdf
> file 59, 059849.pdf
>
> Additionally, there are the JIRA issues opened by William Palmer; and Tim
> Allison had a long test once with a csv result file that had offset problems.
> Don't remember the jira issue.
>
> Tilman


Re: Broken XRef-links, looking for some sample pdfs

Posted by Tilman Hausherr <TH...@t-online.de>.
http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/

file 24, 024064.pdf
file 26, 026779.pdf
file 27, 027266.pdf, 027613.pdf
file 28, 048872.pdf
file 59, 059849.pdf

Additionally, there are the JIRA issues opened by William Palmer, and
Tim Allison once ran a long test whose CSV result file showed offset
problems. I don't remember which JIRA issue it was.

Tilman

On 30.07.2014 20:59, Andreas Lehmkuehler wrote:
> Hi,
>
> I'm working on an advanced self-healing mechanism for wrong xref
> offset values. I thought I had enough sample PDFs, but I can't find
> any.
>
> Can anybody give me a pointer where to find some?
>
> Thanks in advance!
>
> BR
> Andreas Lehmkühler