Posted to user@poi.apache.org by Chris Bamford <cb...@mimecast.com> on 2014/07/16 12:16:34 UTC

Trouble extracting embedded bin files from CDF

Hi there,

Apologies for the screenshots, but I think they are the easiest way to explain my problem.
I need to extract embedded OLE10Native files from CDF Word docs. I thought my code was working, but someone recently reported that it breaks when .bin files (PBrush files?) are embedded.
My approach is this:

Open the Word doc as a stream, then:

        npoifsFileSystem = new NPOIFSFileSystem(bis);
        scanForEmbeddedOleDocs(npoifsFileSystem.getRoot());

In scanForEmbeddedOleDocs() I iterate through the structure (recursing into any nested DirectoryNodes), looking for entries named "\u0001Ole10Native". When one is found, I call

        byte[] imageData = Ole10Native.createFromEmbeddedOleObject(dirNode).getDataBuffer();

to get the image data.

Now, this works in some cases (embedded MP3s, for example) but fails in others (BIN files). The two screenshots below, taken from the debugger, show the state of the two DirectoryNodes at the point of extraction.

The first one (Embedded_MP3_OK.png) shows the success case with an MP3:

[Screenshot: Embedded_MP3_OK.png]

The second (Embedded_BIN_Fails.png) shows the problem case with the BIN file:

[Screenshot: Embedded_BIN_Fails.png]
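One way to narrow this down is to bypass Ole10Native's parsing and take the stream's payload raw. Per [MS-OLEDS], an OLE1.0 native stream is a 4-byte little-endian NativeDataSize followed by that many bytes of NativeData. The label/file name/command fields that Ole10Native appears to parse belong to Packager-style objects, so an object such as PBrush may simply not have them (a hypothesis, not confirmed from the screenshots). A minimal sketch, assuming the stream's bytes have already been read (e.g. via dirNode.createDocumentInputStream("\u0001Ole10Native")); the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper: takes the raw bytes of an Ole10Native stream and returns
// the payload without interpreting any Packager label/file name/command fields.
final class RawOle10Native {

    private RawOle10Native() {}

    /**
     * Assumes the [MS-OLEDS] OLENativeStream layout: a 4-byte little-endian
     * NativeDataSize followed by NativeDataSize bytes of NativeData.
     */
    static byte[] payload(byte[] stream) {
        if (stream.length < 4) {
            throw new IllegalArgumentException("stream too short for size prefix");
        }
        // Read the size prefix as an unsigned 32-bit little-endian value.
        long size = ByteBuffer.wrap(stream, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt() & 0xFFFFFFFFL;
        if (size > stream.length - 4) {
            throw new IllegalArgumentException(
                    "declared size " + size + " exceeds available "
                            + (stream.length - 4) + " bytes");
        }
        // Copy exactly the declared number of bytes that follow the prefix.
        return Arrays.copyOfRange(stream, 4, 4 + (int) size);
    }
}
```

If both the MP3 and the BIN objects yield sensible payloads this way, comparing their first few bytes may show which layout each object actually uses.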

For further validation, I converted the doc containing the BIN to .docx and unzipped it; that successfully yields a .bin file (so I know it can be done!):

find . -ls
1807158        0 drwxr-xr-x    6 cbamford         790807719             204 16 Jul 09:28 .
1807163        8 -rw-r--r--    1 cbamford         790807719            1701  1 Jan  1980 ./[Content_Types].xml
…..
1807170        0 drwxr-xr-x    3 cbamford         790807719             102 16 Jul 09:28 ./word/embeddings
1807171        8 -rw-r--r--    1 cbamford         790807719            3072  1 Jan  1980 ./word/embeddings/oleObject1.bin
….
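Note that the oleObject1.bin from the .docx is, as far as I understand OOXML, not just the Ole10Native payload but a complete OLE2 compound file holding the embedded object's whole storage; its size of 3072 bytes (6 × 512-byte sectors) is at least consistent with that. If so, the .doc equivalent would be the entire embedded DirectoryNode rather than the single stream. A quick way to check what a given .bin contains is the 8-byte compound-file signature D0 CF 11 E0 A1 B1 1A E1 at offset 0. A minimal sketch; the class and method names are hypothetical:

```java
import java.util.Arrays;

// Hypothetical helper: reports whether a byte array starts with the OLE2
// Compound File Binary signature D0 CF 11 E0 A1 B1 1A E1.
final class CompoundFileCheck {

    private static final byte[] CFB_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    private CompoundFileCheck() {}

    static boolean looksLikeCompoundFile(byte[] data) {
        if (data.length < CFB_SIGNATURE.length) {
            return false;
        }
        // Compare only the first 8 bytes against the signature.
        return Arrays.equals(Arrays.copyOf(data, CFB_SIGNATURE.length), CFB_SIGNATURE);
    }
}
```

A true result only means the file is a compound file; it says nothing about which streams it contains.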

Can anyone tell me what I am doing wrong in my code?  I am using POI-3.10-FINAL.

Thanks!

- Chris

Chris Bamford
Senior Developer