Posted to users@pdfbox.apache.org by Axel Rose <ax...@googlemail.com> on 2013/08/14 11:17:56 UTC
dump COSDictionary recursively
Hello all,
does anybody have a working solution, or a guide, for how to achieve this:
Given a COSDictionary object, how would I get a recursive text dump of
all its subobjects?
I tried something like:
PDDocument doc = PDDocument.load(file);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
List<PDPage> allPages = catalog.getAllPages();
for (PDPage page : allPages) {
    COSDictionary pageDict = page.getCOSDictionary();
    for (Entry<COSName, COSBase> entry : pageDict.entrySet()) {
        COSBase value = entry.getValue();
        if (value instanceof COSArray) {
            // process array
        } else if (value instanceof COSString) {
            // process String
        } else if (value instanceof COSObject) {
            // stuck
        }
    }
}
I'm stuck on how to recurse into the COSObject.
pageDict.toString() gives me some overview:
COSDictionary{
(COSName{Annots}:COSArray{[COSObject{19, 0}, COSObject{20, 0}]})
but it doesn't go deeper into the "Annots" object either.
Thanks for your help
Axel
Re: dump COSDictionary recursively
Posted by Axel Rose <ax...@googlemail.com>.
Hi Andreas,
thanks for your nice reply.
> Try something like this to get the encapsulated object:
>
> COSBase baseObject = ((COSObject)value).getObject();
This got me going.
> Do you need the information for further processing, or are you just interested
> in seeing it for debugging purposes or something similar? Maybe you
> should try the PDFDebugger.
I want to provide the structure as an XML output file.
PDFDebugger is nice but interactive. I need to traverse the
tree model programmatically, and I think I can manage the task myself now.
I just thought somebody might have done it already.
Best regards,
Axel.
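
For readers with the same goal, here is a minimal sketch of writing a nested structure out as XML. It is not PDFBox code: plain Map, List and scalar values stand in for COSDictionary, COSArray and simple COS values, and the element names (dict, entry, array, value) as well as the class name StructureToXml are hypothetical choices for illustration only.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: serializes nested Map/List/scalar values as XML.
final class StructureToXml {

    public static String toXml(Object root) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
        write(writer, root);
        writer.flush();
        writer.close();
        return buffer.toString();
    }

    private static void write(XMLStreamWriter writer, Object node)
            throws XMLStreamException {
        if (node instanceof Map) {
            // Stand-in for a dictionary: one <entry key="..."> per key.
            writer.writeStartElement("dict");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                writer.writeStartElement("entry");
                writer.writeAttribute("key", String.valueOf(entry.getKey()));
                write(writer, entry.getValue());
                writer.writeEndElement();
            }
            writer.writeEndElement();
        } else if (node instanceof List) {
            // Stand-in for an array: children in document order.
            writer.writeStartElement("array");
            for (Object item : (List<?>) node) {
                write(writer, item);
            }
            writer.writeEndElement();
        } else {
            // Scalar: the stream writer escapes markup characters itself.
            writer.writeStartElement("value");
            writer.writeCharacters(String.valueOf(node));
            writer.writeEndElement();
        }
    }
}
```

Using XMLStreamWriter rather than string concatenation leaves the escaping of characters such as < and & to the JDK. The sketch does not guard against cyclic structures, so it assumes the input is a tree.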
Re: dump COSDictionary recursively
Posted by Axel Rose <ax...@googlemail.com>.
One small problem I'm still fighting with is endless recursion.
It is reproducible with PDFDebugger: if I open e.g. the "Catalog" node
with Alt-Click, thereby expanding all subnodes at once, the action never
stops. Presumably this is because of the "Parent" nodes.
I thought I could stop the recursion by checking object.getName() for the
string "Parent". Is there a better method?
Thanks for your help
Axel.
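
One common alternative to matching the key name "Parent" is to remember which indirect references have already been followed and to skip them the second time. The sketch below illustrates that idea only; it is not PDFBox code. The Ref class is a hypothetical stand-in for COSObject (its target field plays the role of getObject()), and Map and List stand in for COSDictionary and COSArray. It also assumes that the same indirect object is always represented by the same instance, which is why an identity-based set is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CycleSafeDump {

    /** Hypothetical stand-in for an indirect reference such as COSObject. */
    static final class Ref {
        final int number;
        Object target; // what the reference resolves to
        Ref(int number) { this.number = number; }
    }

    /** Returns an indented text dump, following each Ref at most once. */
    public static String dump(Object root) {
        StringBuilder out = new StringBuilder();
        Set<Object> visited =
                Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        dump(root, 0, visited, out);
        return out.toString();
    }

    private static void dump(Object node, int depth, Set<Object> visited,
                             StringBuilder out) {
        if (node instanceof Ref) {
            Ref ref = (Ref) node;
            if (!visited.add(ref)) {
                // Seen before (e.g. reached again via "Parent"): do not descend.
                line(out, depth, "ref " + ref.number + " (already visited)");
                return;
            }
            line(out, depth, "ref " + ref.number);
            dump(ref.target, depth + 1, visited, out);
        } else if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                line(out, depth, "/" + entry.getKey());
                dump(entry.getValue(), depth + 1, visited, out);
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                dump(item, depth, visited, out);
            }
        } else {
            line(out, depth, String.valueOf(node));
        }
    }

    private static void line(StringBuilder out, int depth, String text) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(text).append('\n');
    }

    /** Builds a tiny Pages/Page pair whose "Parent" entry closes a cycle. */
    public static Ref sampleCycle() {
        Ref pages = new Ref(1);
        Ref page = new Ref(2);
        Map<String, Object> pagesDict = new LinkedHashMap<>();
        List<Object> kids = new ArrayList<>();
        kids.add(page);
        pagesDict.put("Type", "Pages");
        pagesDict.put("Kids", kids);
        pages.target = pagesDict;
        Map<String, Object> pageDict = new LinkedHashMap<>();
        pageDict.put("Type", "Page");
        pageDict.put("Parent", pages);
        page.target = pageDict;
        return pages;
    }

    public static void main(String[] args) {
        System.out.print(dump(sampleCycle()));
    }
}
```

Compared with a name check, this also stops on back-references that are not called "Parent", at the price of printing only a marker where an object is reached a second time. Only references are tracked here, on the assumption that cycles can only be closed through indirect references.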
Re: dump COSDictionary recursively
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 14.08.2013 11:17, Axel Rose wrote:
> Hello all,
>
> does anybody have a working solution, or a guide, for how to achieve this:
>
> Given a COSDictionary object, how would I get a recursive text dump of
> all its subobjects?
>
> I tried something like:
>
> PDDocument doc = PDDocument.load(file);
> PDDocumentCatalog catalog = doc.getDocumentCatalog();
> List<PDPage> allPages = catalog.getAllPages();
> for (PDPage page : allPages) {
>     COSDictionary pageDict = page.getCOSDictionary();
>     for (Entry<COSName, COSBase> entry : pageDict.entrySet()) {
>         COSBase value = entry.getValue();
>         if (value instanceof COSArray) {
>             // process array
>         } else if (value instanceof COSString) {
>             // process String
>         } else if (value instanceof COSObject) {
>             // stuck
>         }
>     }
> }
>
> I'm stuck on how to recurse into the COSObject.
Try something like this to get the encapsulated object:
COSBase baseObject = ((COSObject)value).getObject();
>
> pageDict.toString() gives me some overview:
>
> COSDictionary{
> (COSName{Annots}:COSArray{[COSObject{19, 0}, COSObject{20, 0}]})
>
> but it doesn't go deeper into the "Annots" object either.
>
>
> Thanks for your help
Do you need the information for further processing, or are you just interested
in seeing it for debugging purposes or something similar? Maybe you
should try the PDFDebugger.
>
> Axel
BR
Andreas Lehmkühler