Posted to dev@nutch.apache.org by Camilo Abel Monreal <km...@matcom.uh.cu> on 2005/09/05 13:15:37 UTC
separate Crawler from nutch
Hi,
I am trying to separate the Nutch crawler from the rest of the project. I need to
download a page to a file. If someone has already done this, please help me.
Thanks, kmilo
link analysis in OC
Posted by Michael Ji <fj...@yahoo.com>.
Hi Kelvin,
Does OC compute the page score the same way Nutch crawling does?
I found that Nutch/index computes the document boost value based
on the score/anchor data in the segment/fetchlist data
structures.
I guess OC won't generate this boost score by itself
or use its own data structures. So if we want to have
this score saved in the Lucene index, we need to use
nutch/generate.. to get the fetchlist and generate the
webdb.
That means OC will have to live with Nutch's webdb and other
data structures.
Is my thinking right?
Thanks,
Michael Ji
Re: separate Crawler from nutch
Posted by Stefan Groschupf <sg...@media-style.com>.
Hi,
There are a number of standalone crawlers available;
the coolest one, from my point of view, is crawler.archive.org
Stefan
On 05.09.2005 at 13:15, Camilo Abel Monreal wrote:
> Hi :
>
> I try to separate the nutch crawler from entire project. I need to
> download the page to a file.Please if someone have that please
> help me.
>
> thanks kmilo
>
>
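For the narrower goal in the original question (downloading one page to a file), a crawler may not be needed at all. The following is a minimal sketch in plain Java that does not use Nutch or any other crawler; the class name PageSaver and the method savePage are hypothetical names chosen for illustration. It assumes a single URL, and does no link following, robots.txt handling, or politeness delays, which a real crawler would provide.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration, not part of Nutch: saves one page to one file.
public class PageSaver {

    // Copies the raw bytes served at `url` into `target`, replacing any
    // existing file, and returns the number of bytes written.
    static long savePage(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PageSaver <url> <output-file>");
            return;
        }
        long written = savePage(args[0], Paths.get(args[1]));
        System.out.println("wrote " + written + " bytes to " + args[1]);
    }
}
```

Run it as "java PageSaver <url> <output-file>". The bytes are stored exactly as served, so no character-set conversion takes place.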