Posted to user@poi.apache.org by Chris Bamford <cb...@mimecast.com> on 2015/01/16 18:33:19 UTC

Reading CompObj

Hi folks,

I have an xlsx spreadsheet embedded in a traditional doc file, but I don't know how to get at it programmatically. POIFSLister shows this for the doc file:

Root Entry -
  SummaryInformation <(0x05)SummaryInformation>
  DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
  WordDocument
  1Table
  ObjectPool -
    _1356693908 -
      Package
      CompObj <(0x01)CompObj>         <<<<----  In here
      ObjInfo <(0x03)ObjInfo>
      Ole <(0x01)Ole>
  CompObj <(0x01)CompObj>
  Data

A colleague who works with Windows has examined the file and determined that the embedded file lies in the marked "CompObj" entry.
Does POI have an API for getting hold of it?

Thanks

- Chris

Chris Bamford
Senior Developer
m: +44 7860 405292
p: +44 207 847 8700
w: www.mimecast.com
Address click here: www.mimecast.com/About-us/Contact-us/







Re: Reading CompObj

Posted by Chris Bamford <cb...@mimecast.com>.
Nick,

You're right - it works perfectly. From reading the Tika code, it appears that if an object pool entry contains a DocumentEntry called "Package", it is safe to assume that it is an OOXML document embedded as is, i.e. as a PKZip blob.
So to access it you just need to:

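                if ("Package".equals(name)) {
                    // "Package" holds the raw OOXML (zip) bytes, so copy them out unchanged
                    DocumentEntry doc = (DocumentEntry) entry;

                    // try-with-resources closes the stream even if the copy fails
                    try (InputStream stream = new DocumentInputStream(doc)) {
                        flushStreamToFile(stream, "/tmp/ooxml-file", doc.getSize());
                    }
                }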
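
For anyone finding this in the archive: flushStreamToFile in the snippet above is my own helper, not part of POI. Below is a rough, stdlib-only sketch of what such a helper could look like, plus a quick heuristic for guessing what kind of OOXML file came out. The class name PackageDump, the method guessOoxmlType and its return strings are made up for illustration. The check relies only on the usual OOXML package layout ([Content_Types].xml plus an xl/, word/ or ppt/ part), so treat it as a hint rather than a validation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class PackageDump {

    // Hypothetical stand-in for the flushStreamToFile helper: copies at most
    // `size` bytes from `in` to the file at `path` and returns the byte count.
    static long flushStreamToFile(InputStream in, String path, long size) throws IOException {
        byte[] buf = new byte[8192];
        long copied = 0;
        try (OutputStream out = Files.newOutputStream(Paths.get(path))) {
            while (copied < size) {
                int want = (int) Math.min(buf.length, size - copied);
                int n = in.read(buf, 0, want);
                if (n < 0) {
                    break; // stream ended before `size` bytes were read
                }
                out.write(buf, 0, n);
                copied += n;
            }
        }
        return copied;
    }

    // Heuristic (an assumption, not a POI or Tika API): looks at the ZIP entry
    // names to guess which OOXML flavour the extracted file is.
    static String guessOoxmlType(Path file) throws IOException {
        boolean sawContentTypes = false;
        String kind = null;
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(file))) {
            ZipEntry e;
            while ((e = zip.getNextEntry()) != null) {
                String name = e.getName();
                if (name.equals("[Content_Types].xml")) {
                    sawContentTypes = true;
                } else if (name.startsWith("xl/")) {
                    kind = "spreadsheet";
                } else if (name.startsWith("word/")) {
                    kind = "document";
                } else if (name.startsWith("ppt/")) {
                    kind = "presentation";
                }
            }
        }
        // Not a ZIP, or a ZIP without the OOXML markers
        return (sawContentTypes && kind != null) ? kind : "unknown";
    }
}
```

After the copy, calling PackageDump.guessOoxmlType(Paths.get("/tmp/ooxml-file")) should give "spreadsheet" for an embedded xlsx, which is a cheap way to confirm that the "Package" entry really was a zip-based OOXML file before handing it to anything else.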

Thanks for your help!

- Chris

On 18 Jan 2015, at 18:36, Nick Burch <ap...@gagravarr.org> wrote:

> On Fri, 16 Jan 2015, Chris Bamford wrote:
>> A colleague who works with Windows has examined the file and determined that the embedded file lies in the marked "CompObj" entry. Does POI have an API for getting hold of it?
> 
> Your best example is probably in AbstractPOIFSExtractor from Apache Tika - that has the exact code you need to read a CompObj entry from a given directory within an OLE2 filesystem
> 
> Nick




Re: Reading CompObj

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 16 Jan 2015, Chris Bamford wrote:
> A colleague who works with Windows has examined the file and determined 
> that the embedded file lies in the marked "CompObj" entry. Does POI have 
> an API for getting hold of it?

Your best example is probably in AbstractPOIFSExtractor from Apache Tika - 
that has the exact code you need to read a CompObj entry from a given 
directory within an OLE2 filesystem

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org