Posted to users@pdfbox.apache.org by giancarlo <gc...@gmail.com> on 2009/01/16 13:03:18 UTC

extracting indirect object stream...

I've been trying to extract an indirect object stream from a PDF file
using several libraries, without success.
I've just discovered PDFBox, and it seems the best fit for what I need.
Here is a snippet from my PDF file:

558 0 obj
<</Contents 583 0 R/CropBox[0 0 595.22 842]/MediaBox[0 0 595.22 
842]/Parent 29 0 R/Resources
  <</ColorSpace <</CS0 563 0 R>>
    /ExtGState <</GS0 568 0 R>>
    /Font<</TT0 559 0 R/TT1 560 0 R/TT2 561 0 R/TT3 562 0 R>>
    /ProcSet[/PDF/Text/ImageC]
    /Properties<</MC0<</*MYKEY* 584 0 R>>/MC1<</SubKey 582 0 R>> >>
    /XObject<</Im0 578 0 R>>>>
  /Rotate 0/StructParents 0/Type/Page>>
endobj

...
...

584 0 obj
<</Length 8>>stream

1_22_4_1    ->>> I NEED THIS!

endstream

So I need to extract the string held in the stream that the MYKEY entry
refers to (object 584 above). How can I do that with PDFBox?
I can get it working in Python, for example with the pypdf library,
but I got stuck when I tried to decrypt PDF files!
The only ones I can decrypt in Python are files encrypted by very old
versions of Acrobat.
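
To make the goal concrete, this is roughly the lookup I mean, as a minimal
Python sketch using only the standard library (not pypdf). It assumes the
file is not encrypted, that the objects are not packed into compressed
object streams, that the stream has no filter, and that the asterisks
around MYKEY in the snippet above are only emphasis. The function names
are my own.

```python
import re


def find_object(pdf, num, gen=0):
    """Return the raw bytes between 'num gen obj' and the next 'endobj'."""
    pattern = rb"(?<!\d)%d\s+%d\s+obj(.*?)endobj" % (num, gen)
    m = re.search(pattern, pdf, re.DOTALL)
    if m is None:
        raise KeyError("object %d %d not found" % (num, gen))
    return m.group(1)


def resolve_key(body, key):
    """Return (num, gen) of the indirect reference stored under /key."""
    m = re.search(rb"/" + re.escape(key) + rb"\s+(\d+)\s+(\d+)\s+R", body)
    if m is None:
        raise KeyError(key)
    return int(m.group(1)), int(m.group(2))


def stream_data(body):
    """Return the bytes between 'stream' and 'endstream' (no filter assumed)."""
    m = re.search(rb"stream\r?\n(.*?)\r?\n?endstream", body, re.DOTALL)
    if m is None:
        raise ValueError("object has no stream")
    return m.group(1)


def extract(pdf, page_obj, key):
    """Follow /key from the page object to its target and return that stream."""
    num, gen = resolve_key(find_object(pdf, page_obj), key)
    return stream_data(find_object(pdf, num, gen))
```

For the snippet above, extract(data, 558, b"MYKEY") should follow the
reference to object 584 and return its stream. It does nothing for the
encrypted files, which is where I am stuck.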

Can anybody help me?
Thanks in advance
Giancarlo F.

P.S.: Sorry for my bad English.