Posted to java-user@lucene.apache.org by Bill Janssen <ja...@parc.com> on 2004/10/17 20:09:18 UTC
Re: Google Desktop Could be Better
Bill Tschumy writes:
> I've looked at pdfBox, but the jar file is so big that I
> hate to burden my users by incorporating it.
Bill,
My system (see http://www.parc.com/janssen/pubs/TR-03-16.pdf) uses
pdftotext underneath. I've been very satisfied with that. Another
Java solution would be to use Multivalent
(multivalent.sourceforge.net). Multivalent, by the way, advertises
the following:
"Extract text from all formats. Full-text search with Lucene."
Bill
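
For readers wondering what "uses pdftotext underneath" could look
like from Java, here is a minimal sketch. It is not the code from the
system described above. It assumes the pdftotext command-line tool
(from xpdf or poppler) is installed and on the PATH; the class and
method names (PdfToTextRunner, buildCommand, extractText) are
hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: shells out to the external pdftotext tool.
final class PdfToTextRunner {

    // Builds the command line "pdftotext -enc UTF-8 <file> -".
    // The trailing "-" asks pdftotext to write the text to stdout
    // instead of creating a .txt file next to the PDF.
    static List<String> buildCommand(Path pdf) {
        return List.of("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Runs pdftotext and returns the extracted text as one string.
    // Assumption: the pdftotext binary is installed and on the PATH.
    static String extractText(Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf))
                // Discard stderr so a chatty tool cannot fill the pipe and block.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();

        String text;
        try (InputStream in = process.getInputStream()) {
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with status " + exitCode);
        }
        return text;
    }
}
```

A call such as PdfToTextRunner.extractText(Path.of("paper.pdf"))
would return the document text, which could then be placed in a
Lucene document field for indexing. Shelling out this way avoids
shipping a large jar, at the cost of requiring pdftotext on each
user's machine.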