Posted to java-user@lucene.apache.org by David Spencer <da...@tropo.com> on 2004/02/17 14:53:11 UTC
ppt text extraction - Re: SearchBlox J2EE Search Component Version 1.2 released
Eric Jain wrote:
>>- Support for PowerPoint documents
>>
>>
>
>May I ask how you extract text from PowerPoint documents? Any open
>source tool, or your own code?
>
>
FYI, I recently discovered "ppthtml" in this package:
http://chicago.sourceforge.net/xlhtml/
"antiword" also seems to work well for Word docs, and I use a utility
from xpdf (http://www.foolabs.com/xpdf/) for PDF text extraction.
When you get down to it, I have found that the "portable C" tools above
work better than the pure Java ones available. To be fair, however, I
have found that POI does work fine for XLS docs.
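For anyone wiring these tools into an indexer, here is a rough sketch of
calling them from Java and capturing their output as a string. It is only
an illustration under a few assumptions: the binaries are on the PATH
under the names "pdftotext" (assumed here to be the xpdf utility meant
above), "antiword" and "ppthtml"; their output is treated as UTF-8; and
the class and method names (ExternalTextExtractor, commandFor, extract,
stripTags) are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene or of any tool named in this thread.
class ExternalTextExtractor {

    /** Picks a command line by file extension. Tool names are assumed to be on the PATH. */
    static List<String> commandFor(String path) {
        String lower = path.toLowerCase();
        if (lower.endsWith(".pdf")) {
            // pdftotext writes to stdout when the output file is given as "-"
            return Arrays.asList("pdftotext", path, "-");
        } else if (lower.endsWith(".doc")) {
            // antiword prints the document text to stdout
            return Arrays.asList("antiword", path);
        } else if (lower.endsWith(".ppt")) {
            // ppthtml prints HTML to stdout, so the markup still has to be stripped
            return Arrays.asList("ppthtml", path);
        }
        throw new IllegalArgumentException("No extractor configured for " + path);
    }

    /** Very crude tag stripper, good enough for a sketch but not for real HTML parsing. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Runs the external tool and returns whatever it wrote to stdout. */
    static String extract(String path) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(commandFor(path));
        // Let the tool's error messages go to our stderr instead of mixing into the text
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream in = p.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("Extractor exited with code " + exit + " for " + path);
        }
        // Assumption: the tool's output is UTF-8; adjust for your platform or tool flags
        String text = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        return path.toLowerCase().endsWith(".ppt") ? stripTags(text) : text;
    }
}
```

The string returned by extract("slides.ppt") could then be added to a
Lucene Document as an unstored, tokenized field. Spawning a process per
file is slow for big batches, but it keeps a crashing extractor from
taking the JVM down with it.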
- Dave
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: ppt text extraction - Re: SearchBlox J2EE Search Component Version 1.2 released
Posted by Ryan Ackley <sa...@cfl.rr.com>.
> When you get down to it, I have found that "portable c" tools (above)
> work better than the pure java ones avail. To be fair however I have
> found that POI does work fine for XLS docs.
Gee thanks, you're so generous with your praise.
I would recommend the OpenOffice SDK if you don't mind "portable c". It
supports all the possible MS Office formats going back to the dark ages. It
has a built-in Java programming interface, you don't have to compile it
yourself using cygwin, and it has a huge team of developers working 40+
hours a week to squash any bugs.
-Ryan