Posted to dev@pdfbox.apache.org by Tilman Hausherr <TH...@t-online.de> on 2014/07/31 23:49:59 UTC

Count in xref table is 0

Hi,

This is a malformed PDF. If the text extraction comes out correct, you
can ignore the warning.

By the way, it is better to use loadNonSeq(file, null) instead of
load(file). An even better strategy is to try loadNonSeq() first and
fall back to load() in the exception handler.
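
A minimal sketch of that fallback strategy, kept free of any PDFBox
dependency so it compiles on its own. The names FallbackLoader, Loader and
loadWithFallback are made up for illustration and are not part of PDFBox.
The usage shown in the comment assumes the PDFBox 1.8-era calls named in
this thread, loadNonSeq(file, null) and load(file), and assumes that a
parse failure surfaces as an IOException.

```java
import java.io.File;
import java.io.IOException;

public class FallbackLoader {

    /** Hypothetical helper type: any loader that may fail with an IOException. */
    @FunctionalInterface
    public interface Loader<T> {
        T load(File file) throws IOException;
    }

    /**
     * Tries the primary loader first; if it throws an IOException,
     * retries with the fallback loader. If both fail, the fallback's
     * exception is thrown with the primary failure attached as suppressed.
     */
    public static <T> T loadWithFallback(File file, Loader<T> primary, Loader<T> fallback)
            throws IOException {
        try {
            return primary.load(file);
        } catch (IOException primaryFailure) {
            try {
                return fallback.load(file);
            } catch (IOException fallbackFailure) {
                fallbackFailure.addSuppressed(primaryFailure);
                throw fallbackFailure;
            }
        }
    }

    // Assumed usage with PDFBox 1.8.x on the classpath (not compiled here):
    //   PDDocument doc = FallbackLoader.loadWithFallback(
    //       file, f -> PDDocument.loadNonSeq(f, null), f -> PDDocument.load(f));
}
```

Keeping the first failure as a suppressed exception means that, when both
parsers give up, the log still shows why the stricter one rejected the file.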

Tilman

On 31.07.2014 22:46, -A wrote:
> Hello;
>
> I am just going to jump in and ask about the following warning, which
> appears when using the default PDFTextStripper class:
>
> WARNING: Count in xref table is 0 at offset 96825
>
> Attached is the document that causes it. I thought it might have to do
> with the properties file that Tilman Hausherr pointed out to me, but it
> didn't.
>
> This isn't a big issue as the program still functions, but if I could
> get rid of the warning so I don't have to look at it, the more the
> merrier!
>
> I am also getting to the PDF spec. If there is anything I could assist
> with if the properties file becomes an active issue (even just testing),
> let me know.
>
>


Re: Count in xref table is 0

Posted by Tilman Hausherr <TH...@t-online.de>.
On 02.08.2014 22:36, -A wrote:
> Tilman,
>
> Ok thank you! Glad to hear it isn't anything on my end.
>
> I will take note of the preferred loading mechanism and swap out my old
> calls.
>
> What is the reasoning behind that, or is load just deprecated?

It isn't deprecated, but it should be :-)  loadNonSeq() is the more 
correct PDF parser, but load() will sometimes get better results with 
malformed PDFs.

Tilman

>
>
> Thanks!
>
> -Aaron
>
>


Re: Count in xref table is 0

Posted by -A <aa...@hrtmn.net>.
Tilman,

Ok thank you! Glad to hear it isn't anything on my end.

I will take note of the preferred loading mechanism and swap out my old
calls.

What is the reasoning behind that, or is load just deprecated?


Thanks!

-Aaron

