Posted to user@poi.apache.org by Erik Sundin <su...@gmail.com> on 2008/01/25 08:27:34 UTC

Extracting embedding objects in Word Documents

Hi,

I am using POI to "unpack" a Word document that contains embedded objects,
such as other Word documents and pictures. The embedded objects can be nested
several layers deep: an embedded Word document can itself contain an embedded
document, and so on.

My intention is to extract all these objects to a flat structure. I have
succeeded in doing so by first using POIFSFileSystem to load an image of the
original document. I then get the DirectoryEntry "ObjectPool" and recursively
look for entries like "WordDocument", "Workbook" and "PowerPoint Document".
If I find one of these, I create a new POIFSFileSystem, copy the whole
structure of the embedded object into it, and write it to disk.
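
The recursive search described above can be sketched roughly as follows. This
is only a model of the walk, not POI code: the Dir class is a hypothetical
stand-in for a POIFS directory entry, and the names EmbeddedWalk, collect and
findDocuments are made up for illustration. Stream names are compared without
regard to case, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class EmbeddedWalk {
    // Hypothetical stand-in for a POIFS directory entry: a name,
    // the names of the streams it holds, and its child directories.
    static final class Dir {
        final String name;
        final List<String> streams = new ArrayList<>();
        final List<Dir> children = new ArrayList<>();

        Dir(String name) { this.name = name; }

        Dir stream(String streamName) { streams.add(streamName); return this; }

        Dir child(Dir dir) { children.add(dir); return this; }
    }

    // Stream names that mark a directory as an embedded document (lower-cased).
    static final Set<String> MARKERS = new HashSet<>(
            Arrays.asList("worddocument", "workbook", "powerpoint document"));

    static boolean isDocument(Dir dir) {
        for (String s : dir.streams) {
            if (MARKERS.contains(s.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // Depth-first walk: record the path of every directory that holds a
    // marker stream, then keep descending so deeper layers are found too.
    static void collect(Dir dir, String path, List<String> found) {
        String here = path + "/" + dir.name;
        if (isDocument(dir)) {
            found.add(here);
        }
        for (Dir c : dir.children) {
            collect(c, here, found);
        }
    }

    // Returns the paths of all embedded documents below the given ObjectPool.
    static List<String> findDocuments(Dir objectPool) {
        List<String> found = new ArrayList<>();
        for (Dir c : objectPool.children) {
            collect(c, objectPool.name, found);
        }
        return found;
    }

    public static void main(String[] args) {
        // A Word document embedded in the pool, which embeds a workbook itself.
        Dir inner = new Dir("ObjectPool")
                .child(new Dir("_2").stream("Workbook"));
        Dir pool = new Dir("ObjectPool")
                .child(new Dir("_1").stream("WordDocument").child(inner));
        for (String p : findDocuments(pool)) {
            System.out.println(p);
        }
    }
}
```

Each path found this way corresponds to one directory that would be copied
into its own new file system and written out as a separate file.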

All objects are extracted OK, though it seems that embedded objects which
themselves contain another embedded object get damaged in the process. If I
open an extracted "layer-n" (n>1) document that has another document embedded
in it, I cannot open that embedded document. Word just gives me an error
saying it can't find the file.

Am I missing some records I need to copy from the original document which
are not located in the ObjectPool?

I'm thankful for all responses.
Erik

Re: Extracting embedding objects in Word Documents

Posted by Rainer Schwarze <rs...@admadic.de>.
Erik Sundin wrote:
> Hi,
[...extracting embedded objects...]
> All objects get extracted ok, though it seems that embedded objects with
> another embedded object gets damaged in the process. If I open an extracted
> "layer-n" (n>1) document which has another document embedded I cannot open
> the embedded document. Word just gives me an error saying it can't find the
> file.

Hi Erik,

I can offer to take a look at a file which fails to open. If you send it 
to my email address, I might see what causes the problem.

Best wishes, Rainer