Posted to users@pdfbox.apache.org by Christopher Begley <ch...@outlook.com> on 2016/10/28 18:12:11 UTC

cosDocument.getXrefTable() returns negative offsets

When I retrieve the xref table from a PDF that uses cross-reference streams (as opposed to a cross-reference table), some of the objects in the returned map have a negative offset. I've looked in the documentation (both the PDF Reference and the PDFBox docs) and haven't found anything referring to this.

For example:

OBJ_NUMBER , OFFSET
49,12769
50,25217
51,25502
52,26034
53,116
54,-36
55,-36
56,-36

When I look at the PDF in a hex editor, the positive offsets point at the correct objects. But what do I need to do to resolve object 54? How can I retrieve its byte offset?

Thanks in advance for your time.


Re: cosDocument.getXrefTable() returns negative offsets

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
On 01.11.2016 at 15:58, christopher.begley@outlook.com wrote:
>
> On 2016-10-31 12:44 (-0400), Andreas Lehmkuehler <an...@lehmi.de> wrote:
>> On 31.10.2016 at 16:10, christopher.begley@outlook.com wrote:
>>>
>>>>> OBJ_NUMBER , OFFSET
>>>>> 49,12769
>>>>> 50,25217
>>>>> 51,25502
>>>>> 52,26034
>>>>> 53,116
>>>>> 54,-36
>>>>> 55,-36
>>>>> 56,-36
>>>
>>>> A negative offset indicates that the given object is part of a compressed object
>>>> stream. In your case that stream is object 36 0.
>>>>
>>>>> Thanks in advance for your time.
>>>>
>>>> BR
>>>> Andreas
>>>
>>> Thanks Andreas. Where can I find more documentation on compressed objects with respect to negative offsets?
>> PDF spec, chapter "7.5.7 Object Streams"
>>
>> BR
>> Andreas
>>
>> Thanks. I read that section. I also read about cross-reference streams. Nowhere in the entire PDF Spec does it mention negative offsets. I'm relatively new to pdf parsing and where I'm stumped is I don't know exactly how to handle these negative offsets.
>
>>>>> OBJ_NUMBER , OFFSET
>>>>> 49,12769
>>>>> 50,25217
>>>>> 51,25502
>>>>> 52,26034
>>>>> 53,116
>>>>> 54,-36
>>>>> 55,-36
>>>>> 56,-36
>
> Let's take object number 54 for example. In your answer you stated that this would be object [36,0] and that it is compressed. How did you know it's object 36,0? Where did you find this information? How, using the PDFBox API, would I retrieve this object and decode it? I wish I could find documentation (somewhere) on how to handle this scenario. Maybe I'm missing something or not searching for the right keywords.
>
> I appreciate your patience in helping me out. I'm more than willing to read or research anything necessary, but I just got through reading 3 sections of the PDF Reference, and while it discussed at length the structure of objects, object streams, and cross-reference streams, it did not help me with how to handle, locate, or parse objects with a negative offset.
OK, I understand your confusion. There is no negative offset within the spec. We 
are using negative values to distinguish "direct" xref entries from those which 
point to a compressed object stream.
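
To make that convention concrete, here is a minimal sketch in plain Java. It deliberately uses no PDFBox classes: the class name, the method name and the Long-to-Long map are made up for illustration, and the values are the ones from your listing. It only shows how to read the sign of an entry, not how PDFBox is implemented internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class XrefEntryKinds {

    /** Describes one xref entry using the sign convention explained above. */
    static String describe(long objectNumber, long value) {
        if (value < 0) {
            // Negative: not a byte offset. The magnitude is the object number of
            // the object stream that contains this object. Per the PDF spec, the
            // generation number of an object stream is always 0.
            return objectNumber + " is stored in object stream " + (-value) + " 0";
        }
        // Zero or positive: a plain byte offset from the start of the file.
        return objectNumber + " starts at byte offset " + value;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the map returned by getXrefTable(),
        // reduced to object number -> value.
        Map<Long, Long> xref = new LinkedHashMap<>();
        xref.put(53L, 116L);
        xref.put(54L, -36L);
        for (Map.Entry<Long, Long> entry : xref.entrySet()) {
            System.out.println(describe(entry.getKey(), entry.getValue()));
        }
    }
}
```

So for object 54 there is no byte offset in the file to look up in a hex editor; the object only exists inside the decoded data of stream 36 0.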

Have a look at COSParser#parseObjectStream to see how PDFBox handles those 
object streams.
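
If you want to see the idea without reading the parser, the layout from chapter 7.5.7 can be sketched like this. After the stream has been decoded, it starts with N pairs of integers (object number, offset relative to /First), followed by the objects themselves. The sketch below is plain Java with made-up names and made-up sample data; it assumes you already have the decoded bytes and the /N and /First values from the stream dictionary, and it is not the PDFBox code.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ObjectStreamIndex {

    /**
     * Reads the N "objectNumber relativeOffset" pairs at the start of a decoded
     * object stream and returns objectNumber -> offset within the decoded data.
     */
    static Map<Long, Integer> readIndex(byte[] decoded, int n, int first) {
        // Everything before /First is the index; it only contains integers
        // separated by white space.
        String header = new String(decoded, 0, first, StandardCharsets.ISO_8859_1);
        String[] tokens = header.trim().split("\\s+");
        Map<Long, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            long objectNumber = Long.parseLong(tokens[2 * i]);
            int relativeOffset = Integer.parseInt(tokens[2 * i + 1]);
            // Offsets in the index are relative to the first object, so add /First.
            index.put(objectNumber, first + relativeOffset);
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up decoded content holding three tiny objects: 42, true and (hi).
        byte[] decoded = "54 0 55 3 56 8 42 true (hi)"
                .getBytes(StandardCharsets.ISO_8859_1);
        // In a real file, N and First come from the stream dictionary.
        Map<Long, Integer> index = readIndex(decoded, 3, 15);
        System.out.println(index);
    }
}
```

Note that the resulting offsets point into the decoded stream data, not into the PDF file, which is why they can't be found with a hex editor.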

BR
Andreas


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: cosDocument.getXrefTable() returns negative offsets

Posted by "christopher.begley@outlook.com" <ch...@outlook.com>.
On 2016-10-31 12:44 (-0400), Andreas Lehmkuehler <an...@lehmi.de> wrote: 
> On 31.10.2016 at 16:10, christopher.begley@outlook.com wrote:
> >
> >>> OBJ_NUMBER , OFFSET
> >>> 49,12769
> >>> 50,25217
> >>> 51,25502
> >>> 52,26034
> >>> 53,116
> >>> 54,-36
> >>> 55,-36
> >>> 56,-36
> >
> >> A negative offset indicates that the given object is part of a compressed object
> >> stream. In your case that stream is object 36 0.
> >>
> >>> Thanks in advance for your time.
> >>
> >> BR
> >> Andreas
> >
> > Thanks Andreas. Where can I find more documentation on compressed objects with respect to negative offsets?
> PDF spec, chapter "7.5.7 Object Streams"
> 
> BR
> Andreas
> 
Thanks. I read that section. I also read about cross-reference streams. Nowhere in the entire PDF spec does it mention negative offsets. I'm relatively new to PDF parsing, and where I'm stumped is that I don't know exactly how to handle these negative offsets.

> >>> OBJ_NUMBER , OFFSET
> >>> 49,12769
> >>> 50,25217
> >>> 51,25502
> >>> 52,26034
> >>> 53,116
> >>> 54,-36
> >>> 55,-36
> >>> 56,-36

Let's take object number 54 for example. In your answer you stated that this would be object [36,0] and that it is compressed. How did you know it's object 36,0? Where did you find this information? How, using the PDFBox API, would I retrieve this object and decode it? I wish I could find documentation (somewhere) on how to handle this scenario. Maybe I'm missing something or not searching for the right keywords.

I appreciate your patience in helping me out. I'm more than willing to read or research anything necessary, but I just got through reading 3 sections of the PDF Reference, and while it discussed at length the structure of objects, object streams, and cross-reference streams, it did not help me with how to handle, locate, or parse objects with a negative offset.


Re: cosDocument.getXrefTable() returns negative offsets

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
On 31.10.2016 at 16:10, christopher.begley@outlook.com wrote:
>
>>> OBJ_NUMBER , OFFSET
>>> 49,12769
>>> 50,25217
>>> 51,25502
>>> 52,26034
>>> 53,116
>>> 54,-36
>>> 55,-36
>>> 56,-36
>
>> A negative offset indicates that the given object is part of a compressed object
>> stream. In your case that stream is object 36 0.
>>
>>> Thanks in advance for your time.
>>
>> BR
>> Andreas
>
> Thanks Andreas. Where can I find more documentation on compressed objects with respect to negative offsets?
PDF spec, chapter "7.5.7 Object Streams"

BR
Andreas





Re: cosDocument.getXrefTable() returns negative offsets

Posted by "christopher.begley@outlook.com" <ch...@outlook.com>.
> > OBJ_NUMBER , OFFSET
> > 49,12769
> > 50,25217
> > 51,25502
> > 52,26034
> > 53,116
> > 54,-36
> > 55,-36
> > 56,-36

> A negative offset indicates that the given object is part of a compressed object 
> stream. In your case that stream is object 36 0.
> 
> > Thanks in advance for your time.
> 
> BR
> Andreas

Thanks Andreas. Where can I find more documentation on compressed objects with respect to negative offsets?







Re: cosDocument.getXrefTable() returns negative offsets

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 28.10.2016 at 20:12, Christopher Begley wrote:
> When I retrieve the xreftable from a pdf with cross-reference-streams (as opposed to a cross reference table) some of the objects in the returned table map have a negative offset.  I've looked in the documentation (both pdf reference & pdfbox docs) and haven't found anything referring to this.
>
> For example:
>
> OBJ_NUMBER , OFFSET
> 49,12769
> 50,25217
> 51,25502
> 52,26034
> 53,116
> 54,-36
> 55,-36
> 56,-36
>
> When looking at the PDF in a hex editor, the objects and their offsets are correct when they are positive. But, what do I need to do to resolve object 54? How can I retrieve the byte offset?
A negative offset indicates that the given object is part of a compressed object 
stream. In your case that stream is object 36 0.

> Thanks in advance for your time.

BR
Andreas

