Posted to user@poi.apache.org by Mark Beardsley <ma...@tiscali.co.uk> on 2010/12/01 08:53:36 UTC

Re: How to access data from Ole objects named package

Now I am completely confused by what you are attempting to achieve. If you
call the createExtractor() method, you will have to pass it either an
InputStream or a File as the parameter, because you are targeting the
binary file format. Those methods return only POITextExtractors, and I can
see no way to gain access to any embedded OLE documents from those objects.
Having said that, I could very well be wrong, as I have only ever had to
recover references to embedded objects from Excel workbooks, never from Word
documents.

Further, the only packages I can see mentioned are OPCPackages, and these
relate, I believe, only to the newer XML-based file format and not to the
older binary one. Are you talking about PackageParts? These do, I believe,
relate to embedded documents but, again, I have only ever accessed them from
Excel workbooks, never from Word documents.

Off the top of my head, I think that you will have to try this:

1. Create a POIFSFileSystem object from an InputStream connected to the Word
   document.
2. Pass that object to the static ExtractorFactory method
   POIOLE2TextExtractor createExtractor(POIFSFileSystem fs) and capture the
   POIOLE2TextExtractor object it returns.
3. Pass the POIOLE2TextExtractor to the static ExtractorFactory method
   POITextExtractor[] getEmbededDocsTextExtractors(POIOLE2TextExtractor ext).
4. Iterate through the array of POITextExtractor objects this method returns
   and call getText() on each one to recover the contents of the embedded
   files.
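The four steps above might look roughly like the following Java sketch. It
is untested and assumes the POI 3.x API that was current at the time (the
poi, poi-scratchpad and poi-ooxml jars on the classpath, with ExtractorFactory
in org.apache.poi.extractor); later POI releases moved and renamed some of
these classes and methods. The class name EmbeddedTextSketch and the helper
embeddedTexts() are hypothetical, used only for illustration.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.POIOLE2TextExtractor;
import org.apache.poi.POITextExtractor;
import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Sketch assuming the POI 3.x API; class and helper names are hypothetical.
public class EmbeddedTextSketch {

    /** Returns the text of every embedded document POI can build an extractor for. */
    public static List<String> embeddedTexts(POIFSFileSystem fs) throws Exception {
        // Step 2: an OLE2 text extractor for the outer (Word) document
        POIOLE2TextExtractor outer = ExtractorFactory.createExtractor(fs);

        // Step 3: one extractor per embedded document that POI recognises
        POITextExtractor[] embedded = ExtractorFactory.getEmbededDocsTextExtractors(outer);

        // Step 4: recover the text of each embedded document
        List<String> texts = new ArrayList<String>();
        if (embedded != null) {
            for (POITextExtractor ext : embedded) {
                texts.add(ext.getText());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Step 1: a POIFSFileSystem built from an InputStream on the .doc file
        try (InputStream in = new FileInputStream(args[0])) {
            POIFSFileSystem fs = new POIFSFileSystem(in);
            for (String text : embeddedTexts(fs)) {
                System.out.println(text);
            }
        }
    }
}
```

Note that getEmbededDocsTextExtractors is spelled that way in the 3.x API;
it is not a typo in the sketch.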

Without digging around a little more, I cannot determine whether it is
possible to work down into the individual text extractors and uncover the
underlying documents so that more detailed processing can be performed.
Having said that, I am not at all certain that the above approach will work,
but it should be a reasonable starting point.

I do not think that it is possible to get at the POIFSFileSystem from the
HWPFDocument, even though the latter does encapsulate the former. As far as I
can tell from a quick glance, the reference to the file system is held in a
protected field of HWPFDocument, and I cannot see a method that exposes it.
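One possible workaround, offered here as an untested suggestion rather than
something from the thread, is to avoid needing that field at all: open the
POIFSFileSystem yourself, keep the reference, and hand the same object to
both the HWPFDocument constructor and ExtractorFactory. The sketch again
assumes the POI 3.x API; SharedFileSystemSketch and describe() are
hypothetical names.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.POITextExtractor;
import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Sketch assuming the POI 3.x API; class and method names are hypothetical.
public class SharedFileSystemSketch {

    /** Builds both views of the document from the one file system reference we hold. */
    public static String describe(POIFSFileSystem fs) throws Exception {
        // Word-level model built from our own fs reference, so there is no
        // need to recover the file system from HWPFDocument afterwards
        HWPFDocument doc = new HWPFDocument(fs);
        int paragraphs = doc.getRange().numParagraphs();

        // The same fs reference feeds the extractor route to embedded files
        POITextExtractor[] embedded =
            ExtractorFactory.getEmbededDocsTextExtractors(ExtractorFactory.createExtractor(fs));
        int embeddedCount = (embedded == null) ? 0 : embedded.length;

        return "paragraphs=" + paragraphs + ", embedded=" + embeddedCount;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(describe(new POIFSFileSystem(in)));
        }
    }
}
```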

All the best, and could I ask you please to let us know how you get along?

Yours

Mark B
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-access-data-from-Ole-objects-named-package-tp3285867p3287306.html
Sent from the POI - User mailing list archive at Nabble.com.