Posted to users@pdfbox.apache.org by giancarlo <gc...@gmail.com> on 2009/01/16 13:03:18 UTC
extracting indirect object stream...
I've been trying to extract the stream of an indirect object from a PDF file
with several libraries, without success.
I recently discovered PDFBox, and it seems to be the best fit for my purpose.
Here is a snippet from my PDF file:
558 0 obj
<</Contents 583 0 R/CropBox[0 0 595.22 842]/MediaBox[0 0 595.22
842]/Parent 29 0 R/Resources
<</ColorSpace <</CS0 563 0 R>>
/ExtGState <</GS0 568 0 R>>
/Font<</TT0 559 0 R/TT1 560 0 R/TT2 561 0 R/TT3 562 0 R>>
/ProcSet[/PDF/Text/ImageC]
/Properties<</MC0<</MYKEY 584 0 R>>/MC1<</SubKey 582 0 R>> >>
/XObject<</Im0 578 0 R>>>>
/Rotate 0/StructParents 0/Type/Page>>
endobj
...
...
584 0 obj
<</Length 8>>stream
1_22_4_1 ->>> I NEED THIS!
endstream
So I need to extract the string stored in the stream of the indirect object
that MYKEY refers to (584 0 obj in the snippet above). How can I do that
with PDFBox?
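
To make the lookup concrete, here is a rough sketch in plain Python of the
chain that has to be followed: find the "N G R" reference stored under the
key, then read the stream of "N G obj". This is only an illustration and not
PDFBox or pypdf code. The function name extract_stream_for_key is made up,
and the sketch assumes an unencrypted file whose objects are not packed into
compressed object streams and whose target stream has no filter.

```python
import re


def extract_stream_for_key(pdf_bytes: bytes, key: bytes) -> bytes:
    """Return the raw stream bytes of the indirect object referenced by /key.

    Hypothetical helper for illustration only: it scans the raw file bytes
    instead of parsing the cross-reference table, so it is not a real parser.
    """
    # Step 1: find the indirect reference, e.g. "/MYKEY 584 0 R".
    ref = re.search(rb"/" + re.escape(key) + rb"\s+(\d+)\s+(\d+)\s+R", pdf_bytes)
    if ref is None:
        raise KeyError(key)
    obj_num, gen_num = ref.group(1), ref.group(2)

    # Step 2: find "584 0 obj <dictionary> stream <data> endstream".
    # The lookbehind stops "584 0 obj" from matching inside "1584 0 obj".
    obj = re.search(
        rb"(?<!\d)" + obj_num + rb"\s+" + gen_num
        + rb"\s+obj\b(.*?)stream\r?\n(.*?)\r?\nendstream",
        pdf_bytes,
        re.DOTALL,
    )
    # If "endobj" appears before the stream keyword, the stream that was
    # found belongs to a later object, so the target has no stream.
    if obj is None or b"endobj" in obj.group(1):
        raise ValueError("referenced object has no stream")
    return obj.group(2)
```

Called as extract_stream_for_key(data, b"MYKEY") on the raw bytes of a file
like the one above, this should return the bytes between "stream" and
"endstream" of object 584. On an encrypted file those bytes would still be
encrypted, which is why I need a library that handles the decryption.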
I can get this working in Python, for example with the pypdf library,
but I got stuck when I tried to decrypt encrypted PDF files.
The only files I can decrypt in Python are those encrypted by very old
versions of Acrobat.
Can anybody help me?
Thanks in advance
Giancarlo F.
P.S. Sorry for my bad English.