Posted to java-user@lucene.apache.org by Bill Janssen <ja...@parc.com> on 2004/10/17 20:09:18 UTC

Re: Google Desktop Could be Better

Bill Tschumy writes:
> I've looked at pdfBox, but the jar file is so big that I 
> hate to burden my users by incorporating it.

Bill,

My system (see http://www.parc.com/janssen/pubs/TR-03-16.pdf) uses
pdftotext underneath.  I've been very satisfied with that.  Another
Java solution would be to use Multivalent
(multivalent.sourceforge.net).  Multivalent, by the way, advertises
the following:

"Extract text from all formats. Full-text search with Lucene."
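
For the pdftotext route, a minimal sketch of calling the tool from Java
might look like the following. This is only an illustration, not the code
from the system described in the report above. It assumes the pdftotext
binary is installed and on the PATH, and that Java 9 or later is available.
The class and method names (PdfToTextSketch, buildCommand, extractText) are
made up for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper class, not part of Lucene or of any existing system.
class PdfToTextSketch {

    // Builds the pdftotext argument list. Passing "-" as the output file
    // makes pdftotext write the extracted text to standard output.
    static List<String> buildCommand(String pdfPath) {
        return List.of("pdftotext", "-enc", "UTF-8", pdfPath, "-");
    }

    // Runs pdftotext as a child process and returns the extracted text.
    // Assumption: the pdftotext binary can be found on the PATH.
    static String extractText(String pdfPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdfPath))
                // Discard stderr so an unread error stream cannot block the child.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with status " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java PdfToTextSketch some-file.pdf
        System.out.print(extractText(args[0]));
    }
}
```

The string returned by extractText could then be handed to whatever
indexing code you already have. Since the work happens in an external
process, nothing large needs to be bundled into the application's jar,
though users do need pdftotext installed.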

Bill
