Posted to java-user@lucene.apache.org by Da...@sybase.com on 2006/01/11 01:27:43 UTC

Lucene and Nutch

I am using Lucene to index local HTML files.  The requirement just changed
to index remote HTML files.  Can I use Nutch to crawl for the remote HTML
files and use the index for the Lucene code I have already written?  Or do
I have to redo the whole thing using the Nutch API?  I am using boosting
during the indexing.  I hope Nutch can boost fields, too.  Any help would
be appreciated.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniel Clark, Senior Consultant
Sybase Federal Professional Services
6550 Rock Spring Drive, Suite 800
Bethesda, MD  20817
Office - (301) 896-1103
Office Fax - (301) 896-1604
Mobile - (703) 403-0340
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene and Nutch

Posted by Chris Hostetter <ho...@fucit.org>.
: to index remote HTML files.  Can I use Nutch to crawl for the remote HTML
: files and use the index for the Lucene code I have already written?  Or do
: I have to redo the whole thing using the Nutch API?  I am using boosting
: during the indexing.  I hope Nutch can boost fields, too.  Any help would
: be appreciated.

The best place to start with a question like this is the Nutch
documentation and user community -- between those two information sources,
you should be able to determine what constraints Nutch puts on the
fields of the index it creates, and what flexibility you have to affect
field/document boosts at index time.

With that information in hand, you can make an informed choice about using
Nutch in conjunction with your direct Lucene access code, rewriting your
code to use whatever API Nutch has, or using a third-party crawler to
fetch documents for your Lucene-based code and ignoring Nutch.
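
As a rough illustration of the third option (fetching the documents yourself
and handing them to existing Lucene-based code), here is a minimal Java
sketch.  It is not Nutch and not a real crawler: it follows no links and
ignores robots.txt, politeness delays, and declared character sets.  The
class name HtmlSource and its methods read and extractTitle are hypothetical
names made up for this example, and the UTF-8 decoding is an assumption.  The
only point is that java.net.URL can open both file: and http: locations, so
code that already indexes local HTML may be able to take remote content
through the same path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: loads HTML from a local or remote location so that
// existing indexing code can treat both sources the same way.
final class HtmlSource {
    // Naive <title> matcher; a real HTML parser would be more robust.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private HtmlSource() {}

    // Reads the whole document at a location such as
    // "file:///tmp/page.html" or "http://example.com/".
    static String read(String location) throws IOException {
        try (InputStream in = URI.create(location).toURL().openStream()) {
            // Assumption: content is UTF-8. A real fetcher would honor the
            // charset declared by the server or the document.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Returns the text of the first <title> element, or "" if none is found.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }
}
```

The returned string (and, say, the extracted title) would then be passed to
whatever code already builds the Lucene documents, where any field or
document boosts are applied exactly as before.  Whether this is enough, or
whether a full crawler is needed, depends on how the remote files are
discovered.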




-Hoss




RE: Lucene and Nutch

Posted by Koji Sekiguchi <ko...@m4.dion.ne.jp>.
FYI:

Open source web crawlers:
http://java-source.net/open-source/crawlers


Thanks,

Koji

