Posted to user@nutch.apache.org by "tamanjit.bindra@yahoo.co.in" <ta...@yahoo.co.in> on 2011/06/15 13:36:06 UTC

Crawl algo

Perhaps a naive question:

During a crawl, if I set topN to, say, 100, does that mean Nutch fetches the
first 100 links it finds on a particular page? Or does it choose which links to
fetch according to page rank?

Either way, does it mean that it would always fetch the same links from a
page?

--
View this message in context: http://lucene.472066.n3.nabble.com/Crawl-algo-tp3066930p3066930.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Crawl algo

Posted by "tamanjit.bindra@yahoo.co.in" <ta...@yahoo.co.in>.
Thanks for the link. Things are much clearer now.


Re: Crawl algo

Posted by Markus Jelsma <ma...@openindex.io>.
Maybe this thread can help you out:
http://search.lucidimagination.com/search/document/63f7e3e24a6106ee/topn_value_in_crawl#2a4b71e4d876fa72
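
As far as I understand the generate step, topN is not a per-page limit: it
caps how many URLs are selected for one fetch round out of the whole crawl
database, taking the highest-scoring ones that are due for fetching. The
sketch below only illustrates that idea. It is not Nutch code, and every name
in it (CrawlDbEntry, generate_fetch_list, run_round) is hypothetical, as is
the simplification that a fetched URL is never due again.

```python
from dataclasses import dataclass


@dataclass
class CrawlDbEntry:
    """One known URL. Hypothetical stand-in for a crawl database record."""
    url: str
    score: float          # assumed to come from whatever scoring is configured
    fetched: bool = False


def generate_fetch_list(crawldb, top_n):
    """Pick at most top_n unfetched entries, highest score first.

    The selection runs over the whole database, not over the links of
    any single page.
    """
    due = [entry for entry in crawldb if not entry.fetched]
    due.sort(key=lambda entry: entry.score, reverse=True)
    return due[:top_n]


def run_round(crawldb, top_n):
    """Simulate one generate/fetch round and return the URLs it fetched."""
    batch = generate_fetch_list(crawldb, top_n)
    for entry in batch:
        entry.fetched = True   # assumption: fetched URLs are not due again
    return [entry.url for entry in batch]


if __name__ == "__main__":
    db = [
        CrawlDbEntry("http://example.com/a", 0.1),
        CrawlDbEntry("http://example.com/b", 0.9),
        CrawlDbEntry("http://example.com/c", 0.5),
        CrawlDbEntry("http://example.com/d", 0.7),
    ]
    print(run_round(db, 2))  # first round: the two highest-scoring URLs
    print(run_round(db, 2))  # second round: the remaining two
```

Under these assumptions a later round does not pick the same links again,
because already-fetched URLs drop out of the candidate set until they become
due for refetching.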

> Perhaps a naive question:
> 
> During a crawl, if I set topN to, say, 100, does that mean Nutch fetches the
> first 100 links it finds on a particular page? Or does it choose which links
> to fetch according to page rank?
> 
> Either way, does it mean that it would always fetch the same links from a
> page?