Posted to user@poi.apache.org by Mark Beardsley <ma...@tiscali.co.uk> on 2010/11/30 14:19:27 UTC

Re: How to access data from Ole objects named

Which file format are you targeting - XML (.docx) or binary (.doc)? Either
way, I admit to being a little confused, as the documentation indicates that
you should get an array of text extractors from this method call and that
all you need to do is call the getText() method on each to recover the
contents of the embedded document(s) as a String.

Yours

Mark B
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-access-data-from-Ole-objects-named-package-tp3285867p3286054.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: How to access data from Ole objects named

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Now I am completely confused about what you are attempting to achieve. If you
call the createExtractor() method, then you must be passing either an
InputStream or a File to it as the parameter, since you are targeting the
binary file format. Those methods return POITextExtractors only, and I can
see no way to gain access to any embedded OLE documents from those objects.
Having said that, I could very well be wrong, as I have never had to recover
references to embedded objects from Word documents, only from Excel workbooks.

Further, the only mentions of Packages that I can see are OPCPackages, and
these relate - I believe - just to the newer XML-based file format and not
the older binary one. Are you talking about PackageParts? These do, I
believe, relate to embedded documents but, again, I have never accessed them
from Word documents, only from Excel workbooks.

Off the top of my head, I think that you will have to try this:

1. Create a POIFSFileSystem object from an InputStream connected to the Word
   document.
2. Pass that object to the static POIOLE2TextExtractor
   createExtractor(POIFSFileSystem fs) method and capture the
   POIOLE2TextExtractor object.
3. Pass the POIOLE2TextExtractor to the static POITextExtractor[]
   getEmbededDocsTextExtractors(POIOLE2TextExtractor ext) method.
4. Iterate through the array of POITextExtractor objects this method will
   return and call the getText() method on each to recover the contents of the
   files.
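
As a rough, untested sketch, those steps might look something like the
following. It assumes a POI 3.x release in which both static methods live on
org.apache.poi.extractor.ExtractorFactory and the extractor classes sit in
the org.apache.poi package; later releases moved and renamed some of these,
so check the javadoc for your version. The class name EmbeddedTextDump and
the helper method extractEmbeddedText are made up for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.POIOLE2TextExtractor;
import org.apache.poi.POITextExtractor;
import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class, not part of POI.
public class EmbeddedTextDump {

    // Returns the text of each embedded document that POI can build an extractor for.
    // Declared "throws Exception" because the checked exceptions differ between POI versions.
    public static List<String> extractEmbeddedText(File wordFile) throws Exception {
        List<String> texts = new ArrayList<String>();
        InputStream in = new FileInputStream(wordFile);
        try {
            // Step 1: wrap the binary (.doc) file in a POIFSFileSystem.
            POIFSFileSystem fs = new POIFSFileSystem(in);
            // Step 2: obtain the extractor for the outer document.
            POIOLE2TextExtractor outer = ExtractorFactory.createExtractor(fs);
            // Step 3: ask for extractors for the embedded documents
            // (the method name really is spelled "Embeded" in POI 3.x).
            POITextExtractor[] embedded = ExtractorFactory.getEmbededDocsTextExtractors(outer);
            // Step 4: collect the text of each embedded document.
            for (POITextExtractor extractor : embedded) {
                texts.add(extractor.getText());
            }
        } finally {
            in.close();
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java EmbeddedTextDump path/to/document.doc
        for (String text : extractEmbeddedText(new File(args[0]))) {
            System.out.println(text);
        }
    }
}
```

Whether this returns anything for embedded objects named "Package" is
exactly the open question in this thread, so an empty list is a possible
outcome.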

Without digging around a little more, I cannot determine whether it is
possible to work down into the individual text extractors and uncover the
underlying document so that more detailed processing can be performed.
Having said that, I am not at all certain that the above approach would work
successfully, but it should be a reasonable starting point, I think.

I do not think that it is possible to get at the POIFSFileSystem from the
HWPFDocument even though the latter does encapsulate the former. As far as I
can tell from a quick glance, the reference to the file system is held
within a protected field within HWPFDocument and I cannot see a method that
exposes it.

All the best, and please let us know how you get along.

Yours

Mark B
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-access-data-from-Ole-objects-named-package-tp3285867p3287306.html
Sent from the POI - User mailing list archive at Nabble.com.



Re: How to access data from Ole objects named

Posted by Maxim Valyanskiy <ma...@jet.msk.su>.
Hello!

30.11.2010 18:53, randeel wimalagunarathne wrote:
> I am targeting binary (.doc) files. If you look inside the createExtractor()
> method, you can see that, from what I understand, it only handles names
> containing Excel, Word, PowerPoint or Visio. It doesn't handle names
> containing "Package". But if I am wrong, please tell me.
>
Can you send me your document?

best wishes, Max



Re: How to access data from Ole objects named

Posted by randeel wimalagunarathne <ra...@gmail.com>.
Hi Mark,

I am targeting binary (.doc) files. If you look inside the createExtractor()
method, you can see that, from what I understand, it only handles names
containing Excel, Word, PowerPoint or Visio. It doesn't handle names
containing "Package". But if I am wrong, please tell me.

Thank you,
Randeel.

On Tue, Nov 30, 2010 at 7:19 PM, Mark Beardsley <ma...@tiscali.co.uk> wrote:

>
> Which file format are you targeting - XML (.docx) or binary (.doc)? Either
> way, I admit to being a little confused, as the documentation indicates that
> you should get an array of text extractors from this method call and that
> all you need to do is call the getText() method on each to recover the
> contents of the embedded document(s) as a String.
>
> Yours
>
> Mark B