Posted to dev@nutch.apache.org by Rod Taylor <rb...@sitesell.com> on 2005/11/10 19:03:33 UTC

Max Per Host and topN

It seems maxPerHost could cause us not to fill each segment to topN even
when there are more than enough URLs for this job.

We should only count URLs we keep instead of all URLs considered.

There were also two variables named count, which is probably bad form
(not a Java person, but it certainly looked odd).
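The patch itself is not reproduced in this thread. The following is a minimal Java sketch of the counting problem, not Nutch's actual Generator code. It assumes the candidate URLs arrive already sorted by score, and the class and method names (`TopNSelector`, `selectCountingConsidered`, `selectCountingKept`, `hostOf`) are hypothetical. It contrasts counting every URL considered toward topN with counting only the URLs that survive the maxPerHost check. Each variant uses a single, distinctly named counter, which also avoids the two-variables-named-`count` confusion.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Nutch Generator implementation.
class TopNSelector {

    // Hypothetical helper: extract the host part of a URL.
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host;
    }

    // Problematic variant: every URL considered counts toward topN,
    // including URLs that are then dropped by the per-host limit.
    static List<String> selectCountingConsidered(List<String> urls, int topN, int maxPerHost) {
        List<String> selected = new ArrayList<>();
        Map<String, Integer> perHost = new HashMap<>();
        int considered = 0;
        for (String url : urls) {
            if (considered >= topN) break;
            considered++; // counted even if skipped below
            int n = perHost.merge(hostOf(url), 1, Integer::sum);
            if (maxPerHost > 0 && n > maxPerHost) continue;
            selected.add(url);
        }
        return selected;
    }

    // Fixed variant: only URLs actually kept count toward topN.
    static List<String> selectCountingKept(List<String> urls, int topN, int maxPerHost) {
        List<String> selected = new ArrayList<>();
        Map<String, Integer> perHost = new HashMap<>();
        for (String url : urls) {
            if (selected.size() >= topN) break;
            String host = hostOf(url);
            int n = perHost.getOrDefault(host, 0);
            if (maxPerHost > 0 && n >= maxPerHost) continue; // over the limit: skip without counting
            perHost.put(host, n + 1);
            selected.add(url);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Candidates assumed to be sorted by descending score.
        List<String> urls = List.of(
            "http://a.example/1", "http://a.example/2", "http://a.example/3",
            "http://b.example/1", "http://c.example/1");
        // prints [http://a.example/1]
        System.out.println(selectCountingConsidered(urls, 3, 1));
        // prints [http://a.example/1, http://b.example/1, http://c.example/1]
        System.out.println(selectCountingKept(urls, 3, 1));
    }
}
```

With topN=3 and maxPerHost=1, the first variant uses up its budget on three a.example URLs and returns only one, even though two more eligible hosts remain. The second keeps scanning until it has actually selected three URLs.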

-- 
Rod Taylor <rb...@sitesell.com>

Re: Max Per Host and topN

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> It seems maxPerHost could cause us not to fill each segment to topN even
> when there are more than enough URLs for this job.
> 
> We should only count URLs we keep instead of all URLs considered.
> 
> There were also two variables named count, which is probably bad form
> (not a Java person, but it certainly looked odd).

I just committed this patch.  Thanks!

Doug

Re: Max Per Host and topN

Posted by Stefan Groschupf <sg...@media-style.com>.
+1

On 10.11.2005 at 19:03, Rod Taylor wrote:

> <Generator.java.patch>

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net