Posted to dev@nutch.apache.org by Bill Goffe <go...@Oswego.EDU> on 2005/04/22 23:19:31 UTC
Re: [Nutch-dev] Re: How to manage fetching?
This brings to mind a minor suggestion -- rather than topN, why not have
the top percentage? Each time I use topN I think in terms of a
percentage of sites. Seems easier to have the machine do such a simple
calculation...
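
As a rough illustration of the calculation in question, here is a minimal
Python sketch. The function name and its arguments are made up for this
example and are not part of Nutch; it only assumes the total number of
candidate URLs is known before the fetchlist is generated.

```python
import math


def top_n_from_percent(total_urls: int, percent: float) -> int:
    """Convert a percentage of the candidate URLs into an absolute topN count.

    Hypothetical helper for illustration only; not a Nutch API.
    """
    if total_urls < 0:
        raise ValueError("total_urls must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Round up so a non-zero percentage of a non-empty pool
    # always selects at least one URL.
    return math.ceil(total_urls * percent / 100)
```

The result could then be passed along as the usual topN value.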
- Bill
Tim said:
> Thanks. I made the changes you suggested but the problem persisted.
> After about 5 rounds of 1000 URLs one site would "take over." I made
> the attached small change to get around this problem. It allows you to
> specify the maximum number of URLs you want from any single host. I
> now use -topN 1000 -maxSite 500 and things are going as I had hoped.
>
> Thanks,
> Tim
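
The per-host cap Tim describes could look roughly like the Python sketch
below. This is not his attached patch (which is not reproduced here) and
not Nutch code; the function and parameter names are assumptions chosen
to mirror his -topN and -maxSite options, and the input is assumed to be
a list of (url, score) pairs.

```python
from collections import Counter
from urllib.parse import urlsplit


def select_with_host_cap(scored_urls, top_n, max_per_host):
    """Pick up to top_n URLs by descending score, with a per-host limit.

    Hypothetical sketch: scored_urls is an iterable of (url, score) pairs.
    """
    per_host = Counter()  # how many URLs have been taken from each host
    selected = []
    # Visit the highest-scoring URLs first.
    for url, _score in sorted(scored_urls, key=lambda pair: pair[1], reverse=True):
        if len(selected) >= top_n:
            break
        host = urlsplit(url).hostname or ""
        if per_host[host] >= max_per_host:
            continue  # this host has reached its cap; skip the URL
        per_host[host] += 1
        selected.append(url)
    return selected
```

With top_n=1000 and max_per_host=500, no single host can supply more
than half of a round, which is the effect Tim reports.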
--
*------------------------------------------------------*
| Bill Goffe goffe@oswego.edu |
| Department of Economics voice: (315) 312-3444 |
| SUNY Oswego fax: (315) 312-5444 |
| 416 Mahar Hall <wuecon.wustl.edu/~goffe> |
| Oswego, NY 13126 |
*--------*------------------------------------------------------*-----------*
| "Two physics majors, Justin Kasper and Fred Niell, gathered up some |
| spare junk from their physics labs and dorm rooms and built a |
| plutonium-producing reactor. |
| "`It's kind of scary how easy it was to do,' said Niell, assuring |
| onlookers that there was only a trace of plutonium -- nothing harmful. |
| `It only took us about a day to build it. We've been thinking about it |
| for a few days and we gathered the parts, and last night we assembled |
| it. In Justin's room -- he lost the coin toss.'" |
| -- A description of part of the University of Chicago Scavenger Hunt, |
| where making a reactor was one of the possible projects. New York |
| Times, May 19, 1999. |
*---------------------------------------------------------------------------*