Posted to user@poi.apache.org by stigman <st...@yahoo.com> on 2009/07/24 14:35:14 UTC

read embedded objects in a word doc

I'm trying to read the embedded objects in a word doc and do not see any
methods within hwpf to access these objects like hssf and hslf have. Is
there a way to extract these objects and find out what kind of objects they are?
Will I need to use the poifs directly and scrap my hwpf code? I've looked on
the internet and have come up empty.
-- 
View this message in context: http://www.nabble.com/read-embedded-objects-in-a-word-doc-tp24643756p24643756.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: read embedded objects in a word doc

Posted by stigman <st...@yahoo.com>.
Got my version of the HSSF code working. I had preserveNodes set to false.


MSB wrote:
> 
> 
> PS Have you managed to extract embedded objects from a worksheet using
> HSSF?
> 
> 

-- 
View this message in context: http://www.nabble.com/read-embedded-objects-in-a-word-doc-tp24643756p24651153.html


Re: read embedded objects in a word doc

Posted by stigman <st...@yahoo.com>.
Yes, I was able to get your code to work with HSSF, but it only handles one
level of embedded objects.

Since my data may have more, I created a class that makes an initial
HSSFWorkbook assignment like the one below before getting the embedded object
list. The workbook created by the FileInputStream constructor uses
fileSys.getRoot() for the DirectoryNode, and it then fails with a null value
when it calls getDirectory() on the HSSFObjectData to get the current
directory node.

HSSFWorkbook workbook = new HSSFWorkbook(dirNode, fileSys, false);

Using this constructor to create the initial workbook with your code also
failed when dirNode was assigned fileSys.getRoot(). Instead of creating a
second workbook in the code, as your embeddedWorkbook line does, I am
instantiating another instance of my class, whose constructor takes the
DirectoryNode and the filesystem as inputs, to create the embedded workbook
and get access to its content.
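
The recursive idea described above can be sketched without POI on the
classpath. This is only an illustration under stated assumptions: Storage and
EmbeddedWalker are hypothetical stand-ins, not POI classes, and the tree is
built by hand rather than read from a file. The one POI-related fact it leans
on is that an OLE2 storage can usually be recognised by its well-known stream
names ("Workbook" or "Book" for Excel, "WordDocument" for Word, "PowerPoint
Document" for PowerPoint). The storage names in main are made up for the
example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a directory node: stream names plus nested storages. */
class Storage {
    final String name;
    final List<String> streams = new ArrayList<>();               // document streams held here
    final Map<String, Storage> children = new LinkedHashMap<>();  // nested storages, in insertion order

    Storage(String name, String... streamNames) {
        this.name = name;
        streams.addAll(Arrays.asList(streamNames));
    }

    /** Adds a nested storage and returns this storage so calls can be chained. */
    Storage add(Storage child) {
        children.put(child.name, child);
        return this;
    }
}

class EmbeddedWalker {
    /** Guesses the document kind from the well-known stream names in a storage. */
    static String kindOf(Storage s) {
        if (s.streams.contains("Workbook") || s.streams.contains("Book")) return "excel";
        if (s.streams.contains("WordDocument")) return "word";
        if (s.streams.contains("PowerPoint Document")) return "powerpoint";
        return "unknown";
    }

    /** Depth-first walk: records every recognised storage, then descends into its children. */
    static void walk(Storage dir, String parentPath, List<String> out) {
        String here = parentPath.isEmpty() ? dir.name : parentPath + "/" + dir.name;
        String kind = kindOf(dir);
        if (!kind.equals("unknown")) {
            out.add(here + " -> " + kind);
        }
        for (Storage child : dir.children.values()) {
            walk(child, here, out); // same logic at every level, so nesting depth does not matter
        }
    }

    public static void main(String[] args) {
        // Illustrative tree: a workbook embedding a Word document that embeds another workbook.
        Storage root = new Storage("Root", "Workbook")
            .add(new Storage("MBD0001", "WordDocument")
                .add(new Storage("ObjectPool")
                    .add(new Storage("_1", "Workbook"))));
        List<String> found = new ArrayList<>();
        walk(root, "", found);
        for (String line : found) {
            System.out.println(line);
        }
    }
}
```

With the hand-built tree in main, this lists Root, Root/MBD0001 and
Root/MBD0001/ObjectPool/_1. The point of the design is that one routine takes
"the current directory" as its input and calls itself for each child, so a
second or third level of embedding needs no extra code. In real POI code the
directory would presumably come from the embedded object itself rather than
from fileSys.getRoot().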
 


MSB wrote:
> 
> 
> PS Have you managed to extract embedded objects from a worksheet using
> HSSF?
> 
> 

-- 
View this message in context: http://www.nabble.com/read-embedded-objects-in-a-word-doc-tp24643756p24650714.html


Re: read embedded objects in a word doc

Posted by MSB <ma...@tiscali.co.uk>.
HWPF is still very much in the development phase, and I do not think that it
is possible to extract embedded objects from a Word document using the API,
though I could very well be wrong. You do have a few options, however,
depending upon your platform and requirements.

If you are running standalone on a Windows PC, then OLE could be a viable
option, I believe. OLE allows you to manipulate an 'instance' of Word and
control it by executing VBA commands. Virtually everything you can do with
Word can be accomplished through OLE.

OpenOffice can read and render all but the more complex Word documents. It
has an interface, UNO, that you can use to perform operations similar to
those available through OLE. However, the interface is quite complex, the
learning curve is steep, execution is quite slow, and there are limits
on the type of architecture your application could have.

Yours

Mark B

PS Have you managed to extract embedded objects from a worksheet using HSSF?


stigman wrote:
> 
> I'm trying to read the embedded objects in a word doc and do not see any
> methods within hwpf to access these objects like hssf and hslf have. Is
> there a way to extract these objects and find out what kind of objects they are?
> Will I need to use the poifs directly and scrap my hwpf code? I've looked
> on the internet and have come up empty.
> 

-- 
View this message in context: http://www.nabble.com/read-embedded-objects-in-a-word-doc-tp24643756p24645115.html