Posted to user@poi.apache.org by Chris Bamford <cb...@mimecast.com> on 2015/01/16 18:33:19 UTC
Reading CompObj
Hi folks,
I have an xlsx spreadsheet embedded in a traditional doc file, but I don't know how to get at it programmatically. POIFSLister shows the doc file:
Root Entry -
  SummaryInformation <(0x05)SummaryInformation>
  DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
  WordDocument
  1Table
  ObjectPool -
    _1356693908 -
      Package
      CompObj <(0x01)CompObj> <<<<---- In here
      ObjInfo <(0x03)ObjInfo>
      Ole <(0x01)Ole>
  CompObj <(0x01)CompObj>
  Data
A colleague who works with Windows has examined the file and determined that the embedded file lies in the marked "CompObj" entry.
Does POI have an API for getting hold of it?
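For reference, a tree listing like the one above can be reproduced with POI's POIFS API by recursively walking the OLE2 directories. This is a minimal sketch, assuming POI 4.x or later (where POIFSFileSystem has a File constructor and is Closeable); the class TreeDump and its dump method are hypothetical names, not part of POI, and the output only approximates POIFSLister's format.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

final class TreeDump {
    // Collects one line per entry below dir; directories get a " -" suffix
    // and their children are indented two more spaces.
    static void dump(DirectoryEntry dir, String indent, List<String> out) {
        for (Entry entry : dir) { // DirectoryEntry is Iterable<Entry>
            out.add(indent + entry.getName() + (entry.isDirectoryEntry() ? " -" : ""));
            if (entry.isDirectoryEntry()) {
                dump((DirectoryEntry) entry, indent + "  ", out);
            }
        }
    }

    // Usage: java TreeDump some-file.doc
    public static void main(String[] args) throws IOException {
        try (POIFSFileSystem fs = new POIFSFileSystem(new File(args[0]))) {
            List<String> lines = new ArrayList<>();
            dump(fs.getRoot(), "", lines);
            lines.forEach(System.out::println);
        }
    }
}
```

Entry names such as CompObj are stored with a leading control character (0x01 here), which POIFSLister renders as <(0x01)CompObj>; this sketch prints the raw name instead.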
Thanks
- Chris
Chris Bamford
Senior Developer
m: +44 7860 405292
p: +44 207 847 8700
w: www.mimecast.com
Address: www.mimecast.com/About-us/Contact-us/
Re: Reading CompObj
Posted by Chris Bamford <cb...@mimecast.com>.
Nick,
You're right: it works perfectly. From reading the Tika code, it appears that if an object pool storage contains a DocumentEntry called "Package", it is safe to assume that entry is an OOXML document embedded as-is, i.e. as a PKZip blob.
So to access it you just need to:
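import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.DocumentInputStream;

if (entry.isDocumentEntry() && "Package".equals(entry.getName())) {
    // "Package" holds the embedded OOXML file as-is, i.e. a ZIP stream
    try (InputStream stream = new DocumentInputStream((DocumentEntry) entry)) {
        // Files.copy stands in for the poster's own flushStreamToFile helper
        Files.copy(stream, Paths.get("/tmp/ooxml-file"), StandardCopyOption.REPLACE_EXISTING);
    }
}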
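Extending that idea to the whole file, the following sketch walks every child of the ObjectPool directory, reads any Package stream it finds, and keeps it only if it starts with the ZIP local file header signature PK\3\4, as a sanity check on the "PKZip blob" assumption. It assumes POI 4.x or later and Java 9+ (for InputStream.readAllBytes); PackageExtractor, extractPackages and looksLikeZip are hypothetical names, not POI or Tika API. The .zip output extension is a choice made here because the sketch does not inspect the package's content types to tell an embedded .xlsx from a .docx.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.DocumentInputStream;
import org.apache.poi.poifs.filesystem.Entry;

final class PackageExtractor {
    // True if the bytes start with the ZIP local file header signature "PK\3\4".
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    // Writes every ObjectPool/<storage>/Package stream that looks like a ZIP
    // into outDir as <storage>.zip and returns the paths written.
    static List<Path> extractPackages(DirectoryEntry root, Path outDir) throws IOException {
        List<Path> written = new ArrayList<>();
        if (!root.hasEntry("ObjectPool")) {
            return written; // no embedded objects at all
        }
        Entry poolEntry = root.getEntry("ObjectPool");
        if (!poolEntry.isDirectoryEntry()) {
            return written;
        }
        for (Entry child : (DirectoryEntry) poolEntry) {
            if (!child.isDirectoryEntry()) {
                continue;
            }
            DirectoryEntry storage = (DirectoryEntry) child;
            if (!storage.hasEntry("Package")) {
                continue; // e.g. an embedded object stored some other way
            }
            Entry pkg = storage.getEntry("Package");
            if (!pkg.isDocumentEntry()) {
                continue;
            }
            byte[] data;
            try (InputStream in = new DocumentInputStream((DocumentEntry) pkg)) {
                data = in.readAllBytes();
            }
            if (!looksLikeZip(data)) {
                continue; // "Package" present but not a ZIP: leave it alone
            }
            Path target = outDir.resolve(storage.getName() + ".zip");
            Files.write(target, data);
            written.add(target);
        }
        return written;
    }
}
```

Call it as PackageExtractor.extractPackages(fs.getRoot(), outDir) on a POIFSFileSystem opened over the .doc file; each returned path is named after its ObjectPool storage (for example _1356693908.zip). Reading the whole stream into memory keeps the signature check simple; for very large embedded files, a streaming copy like the snippet above is lighter on memory.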
Thanks for your help!
- Chris
On 18 Jan 2015, at 18:36, Nick Burch <ap...@gagravarr.org> wrote:
> On Fri, 16 Jan 2015, Chris Bamford wrote:
>> A colleague who works with Windows has examined the file and determined that the embedded file lies in the marked "CompObj" entry. Does POI have an API for getting hold of it?
>
> Your best example is probably in AbstractPOIFSExtractor from Apache Tika - that has the exact code you need to read a CompObj entry from a given directory within an OLE2 filesystem
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>
Re: Reading CompObj
Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 16 Jan 2015, Chris Bamford wrote:
> A colleague who works with Windows has examined the file and determined
> that the embedded file lies in the marked "CompObj" entry. Does POI have
> an API for getting hold of it?
Your best example is probably in AbstractPOIFSExtractor from Apache Tika -
that has the exact code you need to read a CompObj entry from a given
directory within an OLE2 filesystem.
Nick
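Nick's pointer concerns reading the CompObj stream itself. As the thread turned out, the payload lived in Package; CompObj describes the embedded object (its user-visible type name and clipboard format, among other fields) rather than holding its contents. The following sketch pulls the first of those fields, AnsiUserType, out of the raw stream bytes. It is based on the CompObjStream layout as described in Microsoft's [MS-OLEDS] specification (a 28-byte header, then a 4-byte little-endian length that includes the terminating NUL, then the string), so treat the offsets as an assumption to verify; CompObjReader and readAnsiUserType are hypothetical names, and decoding as ISO-8859-1 is a simplification of the stream's ANSI code page.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class CompObjReader {
    // Fixed-size CompObjHeader (Reserved1, Version, Reserved2), assumed 28 bytes per MS-OLEDS.
    private static final int HEADER_SIZE = 28;

    // Returns the AnsiUserType string that follows the header, or "" if its length is 0.
    static String readAnsiUserType(byte[] compObj) {
        if (compObj.length < HEADER_SIZE + 4) {
            throw new IllegalArgumentException("CompObj stream too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(compObj).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(HEADER_SIZE);
        int length = buf.getInt(); // byte count, including the terminating NUL
        if (length == 0) {
            return "";
        }
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("bad AnsiUserType length: " + length);
        }
        byte[] raw = new byte[length];
        buf.get(raw);
        int end = raw[length - 1] == 0 ? length - 1 : length; // drop trailing NUL
        // ISO-8859-1 is a simplification; the real code page may differ
        return new String(raw, 0, end, StandardCharsets.ISO_8859_1);
    }
}
```

The bytes themselves can be read with POI, for example via new DocumentInputStream((DocumentEntry) storage.getEntry("\u0001CompObj")) on one of the ObjectPool storages; the 0x01 prefix is the control character POIFSLister shows as (0x01).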