Posted to user@poi.apache.org by Chris Bamford <cb...@mimecast.com> on 2014/07/16 12:16:34 UTC
Trouble extracting embedded bin files from CDF
Hi there,
Apologies for the screenshots, but I think they are the easiest way to explain my problem.
I need to extract embedded OLE10Native files from CDF Word docs. I thought my code was working, but someone recently reported it broken when .bin files are embedded (PBrush files?).
My approach is this:
Open the Word doc as a stream, then:
npoifsFileSystem = new NPOIFSFileSystem(bis);
scanForEmbeddedOleDocs(npoifsFileSystem.getRoot());
In scanForEmbeddedOleDocs() I iterate through the structure (recursing when other DirectoryNodes are found), looking for entry names of "\u0001Ole10Native". When found, I call
byte[] imageData = Ole10Native.createFromEmbeddedOleObject(dirNode).getDataBuffer();
to get the image data.
Now, this works in some cases (embedded MP3s, for example) but fails for others (BIN files). The two screenshots below, taken from the debugger, show the state of the two DirectoryNodes at the point of extraction.
The first one (Embedded_MP3_OK.png) shows the success case with an MP3:
[cid:CD9D5A3B-665D-45B2-943F-5F8E176972A3]
The second (Embedded_BIN_Fails.png) shows the problem case with the BIN file:
[cid:C5CEF6D2-E1FC-4CC3-9236-899D2426AC3F]
For further validation, I converted the doc containing the BIN to docx and unzipped it, which successfully extracts a bin file (so I know it can be done!):
find . -ls
1807158 0 drwxr-xr-x 6 cbamford 790807719 204 16 Jul 09:28 .
1807163 8 -rw-r--r-- 1 cbamford 790807719 1701 1 Jan 1980 ./[Content_Types].xml
…..
1807170 0 drwxr-xr-x 3 cbamford 790807719 102 16 Jul 09:28 ./word/embeddings
1807171 8 -rw-r--r-- 1 cbamford 790807719 3072 1 Jan 1980 ./word/embeddings/oleObject1.bin
….
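The same docx check can be sketched in Java using only java.util.zip, which may be handy for comparing against the POI result in a test. This is a minimal sketch, not part of the code described above: the class name DocxEmbeddingLister and the method readEmbeddings are hypothetical, and it assumes embedded objects live under word/embeddings/ as in the listing.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: reads every file stored under word/embeddings/ in a docx.
public class DocxEmbeddingLister {

    // Assumption: embedded objects sit under this prefix, as in the find listing.
    private static final String EMBEDDINGS_PREFIX = "word/embeddings/";

    /** Returns a map of zip entry name to raw bytes, in archive order. */
    public static Map<String, byte[]> readEmbeddings(InputStream docx) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<String, byte[]>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Skip directory entries and everything outside word/embeddings/.
                if (entry.isDirectory() || !name.startsWith(EMBEDDINGS_PREFIX)) {
                    continue;
                }
                // Copy the current entry's bytes into memory.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                result.put(name, out.toByteArray());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocxEmbeddingLister <file.docx>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            for (Map.Entry<String, byte[]> e : readEmbeddings(in).entrySet()) {
                System.out.println(e.getKey() + " (" + e.getValue().length + " bytes)");
            }
        }
    }
}
```

Run against the converted docx, this should list ./word/embeddings/oleObject1.bin with its size. Note that an oleObject*.bin part is typically itself an OLE2 container rather than the raw payload, so its bytes may still need to be opened with POI.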
Can anyone tell me what I am doing wrong in my code? I am using POI-3.10-FINAL.
Thanks!
- Chris
Chris Bamford
Senior Developer
m: +44 7860 405292
p: +44 207 847 8700
w: www.mimecast.com
Address click here: www.mimecast.com/About-us/Contact-us/