Posted to dev@pdfbox.apache.org by Andreas Lehmkuehler <an...@lehmi.de> on 2014/07/30 20:59:57 UTC
Broken XRef-links, looking for some sample pdfs
Hi,
I'm working on an advanced self-healing mechanism for wrong xref offset values.
I thought I had enough sample PDFs, but I can't find any.
Can anybody give me a pointer to where I can find some?
Thanks in advance!
BR
Andreas Lehmkühler
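For illustration, a minimal sketch of what such a self-healing check could look like, assuming the whole file is held in memory. If the declared offset points at neither the xref keyword of a classic cross-reference table nor an "N G obj" header (as a cross-reference stream would have), it falls back to the xref keyword nearest to the declared offset. The class XrefOffsetHealer and its methods are hypothetical, written in plain Java; this is not PDFBox's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical sketch, not PDFBox code: keep an xref offset if it looks
// valid, otherwise fall back to the nearest "xref" keyword in the file.
final class XrefOffsetHealer {

    private static final byte[] XREF = "xref".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] START = "start".getBytes(StandardCharsets.US_ASCII);
    // "N G obj" object header, as used by a cross-reference stream object.
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj\\b");

    /** True if the bytes at offset begin with the given keyword. */
    static boolean startsWithAt(byte[] pdf, long offset, byte[] keyword) {
        if (offset < 0 || offset + keyword.length > pdf.length) {
            return false;
        }
        for (int i = 0; i < keyword.length; i++) {
            if (pdf[(int) offset + i] != keyword[i]) {
                return false;
            }
        }
        return true;
    }

    /** True if the bytes at offset begin with an "N G obj" header. */
    static boolean looksLikeObjectHeader(byte[] pdf, long offset) {
        if (offset < 0 || offset >= pdf.length) {
            return false;
        }
        int start = (int) offset;
        int len = Math.min(pdf.length - start, 32);
        String head = new String(pdf, start, len, StandardCharsets.ISO_8859_1);
        return OBJ_HEADER.matcher(head).lookingAt();
    }

    /** Returns offset if it looks valid, else the nearest "xref" keyword, or -1. */
    static long heal(byte[] pdf, long offset) {
        if (startsWithAt(pdf, offset, XREF) || looksLikeObjectHeader(pdf, offset)) {
            return offset;
        }
        long best = -1;
        for (int pos = 0; pos + XREF.length <= pdf.length; pos++) {
            // Skip non-matches and the "xref" inside "startxref".
            if (!startsWithAt(pdf, pos, XREF) || startsWithAt(pdf, pos - START.length, START)) {
                continue;
            }
            if (best < 0 || Math.abs(pos - offset) < Math.abs(best - offset)) {
                best = pos;
            }
        }
        return best;
    }
}
```

The nearest-match heuristic is deliberately simple: in files with several xref sections from incremental updates, or with the bytes "xref" inside stream data, the closest keyword is not necessarily the right one, which is exactly where real sample files help.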
Re: Broken XRef-links, looking for some sample pdfs
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi Tim,
I'll check those PDFs too. Thanks!
Andreas
On 31.07.2014 14:59, Allison, Timothy B. wrote:
> Hi Andreas,
>
> Not sure if these types of xref issues are what you mean, but this is what we get on the Tika test PDFs (available here: http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/):
> [...]
RE: Broken XRef-links, looking for some sample pdfs
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Hi Andreas,
Not sure if these types of xref issues are what you mean, but this is what we get on the Tika test PDFs (available here: http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/):
Now testing: testComment.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
Now testing: testOptionalHyphen.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
Now testing: testPageNumber.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
Now testing: testPDFTwoTextBoxes.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
Now testing: testPDFVarious.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
Now testing: testPDF_acroform3.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 26441
Now testing: testPDF_childAttachments.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 2314576
Now testing: testPDF_protected.pdf
INFO [main] (PDFParser.java:248) - Document is encrypted
Now testing: testPDF_twoAuthors.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 12324
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5969
Now testing: testPDF_Version.10.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5500
Now testing: testPDF_Version.6.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.7.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.8.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.9.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5687
Now testing: testPopupAnnotation.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 8777
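To check by hand what the offsets in such logs actually point at, one can read the value after the last startxref keyword near the end of the file (a well-formed PDF ends with startxref, the byte offset of the last xref section, and %%EOF). The following is a hypothetical helper in plain Java, not part of PDFBox; it only inspects the last 1024 bytes, which is an assumption that holds for typical files.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not PDFBox API: reads the byte offset that the
// trailer declares after the last "startxref" keyword.
final class StartXrefReader {

    // Assumption: the startxref keyword lies within the last 1024 bytes.
    private static final int TAIL_SIZE = 1024;

    /** Returns the declared xref offset, or -1 if none can be parsed. */
    static long readStartXref(byte[] pdf) {
        int tailStart = Math.max(0, pdf.length - TAIL_SIZE);
        // ISO-8859-1 maps every byte to exactly one char.
        String tail = new String(pdf, tailStart, pdf.length - tailStart,
                StandardCharsets.ISO_8859_1);
        // The last occurrence wins: incremental updates append new trailers.
        int idx = tail.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        int i = idx + "startxref".length();
        while (i < tail.length() && Character.isWhitespace(tail.charAt(i))) {
            i++;
        }
        int j = i;
        while (j < tail.length() && tail.charAt(j) >= '0' && tail.charAt(j) <= '9') {
            j++;
        }
        // Reject a missing number or one too long to fit in a long.
        if (j == i || j - i > 18) {
            return -1;
        }
        return Long.parseLong(tail.substring(i, j));
    }

    /** Returns up to 16 bytes found at the declared offset, for inspection. */
    static String describeTarget(byte[] pdf) {
        long offset = readStartXref(pdf);
        if (offset < 0 || offset >= pdf.length) {
            return "invalid offset " + offset;
        }
        int start = (int) offset;
        int end = Math.min(pdf.length, start + 16);
        return new String(pdf, start, end - start, StandardCharsets.ISO_8859_1);
    }
}
```

If describeTarget shows neither "xref" nor an "N G obj" header, the declared offset is wrong and the file is a candidate sample for the self-healing work.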
-----Original Message-----
From: Andreas Lehmkuehler [mailto:andreas@lehmi.de]
Sent: Wednesday, July 30, 2014 3:54 PM
To: dev@pdfbox.apache.org
Subject: Re: Broken XRef-links, looking for some sample pdfs
Thanks Tilman for the fast response and of course the pointers!
Andreas
Re: Broken XRef-links, looking for some sample pdfs
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Thanks Tilman for the fast response and of course the pointers!
Andreas
On 30.07.2014 21:14, Tilman Hausherr wrote:
> http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/
>
> file 24, 024064.pdf
> file 26, 026779.pdf
> file 27, 027266.pdf, 027613.pdf
> file 28, 048872.pdf
> file 59, 059849.pdf
>
> Additionally, there are the JIRA issues opened by William Palmer; and Tim
> Allison had a long test once with a csv result file that had offset problems.
> Don't remember the jira issue.
>
Re: Broken XRef-links, looking for some sample pdfs
Posted by Tilman Hausherr <TH...@t-online.de>.
http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/
file 24, 024064.pdf
file 26, 026779.pdf
file 27, 027266.pdf, 027613.pdf
file 28, 048872.pdf
file 59, 059849.pdf
Additionally, there are the JIRA issues opened by William Palmer, and
Tim Allison once ran a long test with a CSV result file that showed offset
problems. I don't remember the JIRA issue.
Tilman