Posted to users@pdfbox.apache.org by Malcolm Vincent <ma...@gmail.com> on 2017/11/09 14:05:26 UTC

Dictionary Issue

Hi,

After more testing I can confirm that the issue occurs when PDFBox
parses a content stream in which a token is split across the boundary
between that stream and the next one.

i.e. the whole token is not contained in the stream being parsed.

Perhaps there is a way to get all the tokens for the page content, with
PDFBox reading across the streams as necessary, rather than parsing each
stream individually the way I am doing at the moment.
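
To illustrate what I mean by reading across the streams, here is a
minimal sketch in plain Java (no PDFBox calls) that joins the decoded
bytes of several content streams into one input before anything is
parsed. The class and method names are made up for illustration, and the
bracket count is only a naive stand-in for a real tokenizer. I believe
PDFBox 2.x already offers something along these lines through
PDPage.getContents(), but I have not confirmed that here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
public class JoinedContentStreams {

    /** Chains the decoded streams in order, with a newline after each one. */
    public static InputStream join(List<byte[]> streams) {
        List<InputStream> parts = new ArrayList<>();
        for (byte[] stream : streams) {
            parts.add(new ByteArrayInputStream(stream));
            // Extra whitespace between tokens is harmless in a content stream.
            parts.add(new ByteArrayInputStream(new byte[] {'\n'}));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    /** Reads the whole input into a String, one char per byte. */
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Counts non-overlapping occurrences of token in text. */
    public static int count(String text, String token) {
        int hits = 0;
        int from = 0;
        while ((from = text.indexOf(token, from)) != -1) {
            hits++;
            from += token.length();
        }
        return hits;
    }

    /** Naive check: every "<<" has a ">>". Ignores brackets inside strings. */
    public static boolean dictionariesBalanced(String content) {
        return count(content, "<<") == count(content, ">>");
    }

    public static void main(String[] args) throws IOException {
        byte[] first = "EMC\n/Span <</Lang\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] second = "(en-GB)/MCID 9 >>BDC\nBT\n".getBytes(StandardCharsets.ISO_8859_1);

        String alone = new String(first, StandardCharsets.ISO_8859_1);
        String joined = readAll(join(Arrays.asList(first, second)));

        System.out.println("first stream alone balanced: " + dictionariesBalanced(alone));
        System.out.println("joined streams balanced:     " + dictionariesBalanced(joined));
    }
}
```

Parsed on its own, the first stream ends in the middle of the
dictionary; once the streams are joined, the dictionary is complete
again.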

In this excerpt you can clearly see where the COSDictionary is split
across the stream boundary:

/Span <</Lang (en-GB)/MCID 8 >>BDC
BT
9 0 0 9 99.3376 555.6879 Tm
(Some text)Tj
ET
EMC
/Span <</Lang
endstream
endobj
19 0 obj
<<
/Length 2852
>>
stream
(en-GB)/MCID 9 >>BDC
BT
9 0 0 9 145.7323 555.6879 Tm
(Some more text)Tj
ET
EMC



Best Wishes,
Malcolm.
