Posted to users@pdfbox.apache.org by Axel Rose <ax...@googlemail.com> on 2013/08/14 11:17:56 UTC

dump COSDictionary recursively

Hello all,

Does anybody have a working solution, or some guidance, on how to achieve this?

Given a COSDictionary object, how would I get a recursive text dump of
all its subobjects?

I tried something like:

  PDDocument doc = PDDocument.load(file);
  PDDocumentCatalog catalog = doc.getDocumentCatalog();
  List<PDPage> allPages = catalog.getAllPages();
  for (PDPage page : allPages) {
    COSDictionary pageDict = page.getCOSDictionary();
    for (Entry<COSName, COSBase> entry : pageDict.entrySet()) {
      COSBase value = entry.getValue();
      if (value instanceof COSArray) {
        // process array
      } else if (value instanceof COSString) {
        // process String
      } else if (value instanceof COSObject) {
        // stuck
      }
    }
  }

I'm stuck on how to recurse into the COSObject.

pageDict.toString() gives me some overview:

COSDictionary{
  (COSName{Annots}:COSArray{[COSObject{19, 0}, COSObject{20, 0}]})

but it doesn't go deeper into the "Annots" object either.


Thanks for your help

Axel


Re: dump COSDictionary recursively

Posted by Axel Rose <ax...@googlemail.com>.
Hi Andreas,

thanks for your nice reply.

> Try something like this to get the encapsulated object:
> 
> COSBase baseObject = ((COSObject)value).getObject();

This got me going.
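
A minimal sketch of that unwrap-and-recurse step, written against plain
Java stand-ins rather than the PDFBox classes so that it runs on its own.
The names here are assumptions for illustration only: the hypothetical
Ref class plays the role of COSObject, a Map stands for COSDictionary and
a List for COSArray. With PDFBox, the unwrap would be the getObject()
call quoted above. Note that this version does not guard against cycles.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DumpSketch {
    /** Hypothetical stand-in for an indirect reference such as COSObject. */
    static final class Ref {
        final int number;
        final Object target;
        Ref(int number, Object target) {
            this.number = number;
            this.target = target;
        }
    }

    /** Appends an indented dump of node and everything below it to out. */
    static void dump(Object node, int depth, StringBuilder out) {
        if (node instanceof Ref) {
            Ref ref = (Ref) node;
            indent(out, depth);
            out.append("ref ").append(ref.number).append('\n');
            dump(ref.target, depth + 1, out); // unwrap, then recurse
        } else if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                indent(out, depth);
                out.append('/').append(e.getKey()).append('\n');
                dump(e.getValue(), depth + 1, out);
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                dump(item, depth + 1, out);
            }
        } else {
            // Anything else is treated as a scalar and printed as-is.
            indent(out, depth);
            out.append(node).append('\n');
        }
    }

    private static void indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        Map<String, Object> annot = new LinkedHashMap<>();
        annot.put("Subtype", "Link");
        Map<String, Object> page = new LinkedHashMap<>();
        page.put("Annots", Arrays.asList(new Ref(19, annot)));

        StringBuilder out = new StringBuilder();
        dump(page, 0, out);
        System.out.print(out);
    }
}
```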


> Do you need the information for further processing, or are you just interested
> in seeing it for debugging purposes or something similar? Maybe you
> should try the PDFDebugger.

I want to provide the structure as an XML output file.
PDFDebugger is nice but interactive. I need to traverse the
tree model programmatically, and I think I can manage the task myself now.

I just thought that somebody might have done it already.
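
A rough sketch of how such an XML export could look, using the JDK's
XMLStreamWriter. It assumes the structure has already been unwrapped into
plain Maps (dictionaries), Lists (arrays) and scalar values. The element
names dict, entry, array and value are made up for this example and are
not a PDFBox format.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class XmlDumpSketch {
    /** Serializes a nested Map/List/scalar structure to an XML string. */
    static String toXml(Object root) {
        try {
            StringWriter buffer = new StringWriter();
            XMLStreamWriter xml =
                XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            write(root, xml);
            xml.flush();
            xml.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }

    private static void write(Object node, XMLStreamWriter xml)
            throws XMLStreamException {
        if (node instanceof Map) {
            xml.writeStartElement("dict");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                xml.writeStartElement("entry");
                xml.writeAttribute("key", String.valueOf(e.getKey()));
                write(e.getValue(), xml);
                xml.writeEndElement();
            }
            xml.writeEndElement();
        } else if (node instanceof List) {
            xml.writeStartElement("array");
            for (Object item : (List<?>) node) {
                write(item, xml);
            }
            xml.writeEndElement();
        } else {
            // Scalars: writeCharacters escapes <, > and & for us.
            xml.writeStartElement("value");
            xml.writeCharacters(String.valueOf(node));
            xml.writeEndElement();
        }
    }
}
```

Calling XmlDumpSketch.toXml(someMap) returns the document as a String;
writing to a file would only need a different Writer.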


Best regards,
Axel.

Re: dump COSDictionary recursively

Posted by Axel Rose <ax...@googlemail.com>.
One tiny problem I'm still fighting with is endless recursion.

It is reproducible with PDFDebugger. If I open e.g. the "Catalog" node
with Alt-Click, thereby opening all subnodes at once, the action never
stops. Presumably this is because of the "Parent" nodes.

I thought I could stop the recursion by checking object.getName() for the
String "Parent". Is there any better method?
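
One alternative to filtering on the key name is to remember which
container objects have already been visited and skip them on the second
encounter. That also covers back references other than "Parent" (for
example /P on annotations, or /Prev and /Next in outlines). The sketch
below uses plain Java Maps and Lists as stand-ins for the COS classes. It
assumes the same object instance is seen each time a reference is
followed, which would need checking against the real PDFBox objects.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CycleSafeWalk {
    /** Counts nodes reachable from root, entering each container once. */
    static int countNodes(Object root) {
        // Identity-based set: compares with ==, so it never calls
        // hashCode()/equals() on a cyclic structure.
        Set<Object> visited =
            Collections.newSetFromMap(new IdentityHashMap<>());
        return walk(root, visited);
    }

    private static int walk(Object node, Set<Object> visited) {
        boolean container = node instanceof Map || node instanceof List;
        if (container && !visited.add(node)) {
            return 0; // already seen: a back reference, do not descend
        }
        int count = 1;
        if (node instanceof Map) {
            for (Object child : ((Map<?, ?>) node).values()) {
                count += walk(child, visited);
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                count += walk(child, visited);
            }
        }
        return count;
    }
}
```

The same visited set could be threaded through a dump routine instead of
a counter; on a repeat visit one would print just the object number
rather than descending again.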


Thanks for your help

Axel.


Re: dump COSDictionary recursively

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 14.08.2013 11:17, Axel Rose wrote:
> Hello all,
>
> has anybody a working solution or a guide for me how to achieve this:
>
> Given a COSDictionary object how would I get a recursive text dump of
> all its subobject?
>
> I tried something like:
>
>    PDDocument doc = PDDocument.load(file);
>    PDDocumentCatalog catalog = doc.getDocumentCatalog();
>    List<PDPage> allPages = catalog.getAllPages();
>    for (PDPage page : allPages) {
>      COSDictionary pageDict = page.getCOSDictionary();
>      for (Entry<COSName, COSBase> entry : pageDict.entrySet()) {
>        if (value instanceof COSArray)
>          // process array
>        else if (value instanceof COSString)
>          // process String
>        else if (value instanceof COSObject)
>          // stuck
>      }
>    }
>
> I'm stuck how to recurse into the COSObject.
Try something like this to get the encapsulated object:

COSBase baseObject = ((COSObject)value).getObject();

>
> pageDict.toString() gives me some overview:
>
> COSDictionary{
>    (COSName{Annots}:COSArray{[COSObject{19, 0}, COSObject{20, 0}]})
>
> but also doesn't go deeper into the "Annots" object.
>
>
> Thanks for your help
Do you need the information for further processing, or are you just interested
in seeing it for debugging purposes or something similar? Maybe you
should try the PDFDebugger.

>
> Axel

BR
Andreas Lehmkühler