Posted to user@nutch.apache.org by Enzo Michelangeli <en...@gmail.com> on 2007/06/03 18:17:54 UTC

Is fetcher.throttle.bandwidth known to work?

In my case (with Nutch 0.8), it seems not: I set it to 500, and the fetcher 
still saturates the 1.5 Mbit/s link... Is it supposed to work for the total 
bandwidth, or for each thread?

Enzo


Re: Is fetcher.throttle.bandwidth known to work?

Posted by Matthias Jaekle <ja...@eventax.com>.
Hello Enzo,

We never developed a patch for this issue.

I believe that back in 2004, in Nutch 0.4, there was another fetcher 
module, which was replaced in version 0.5.

That fetcher was able to throttle bandwidth, but it was also very buggy.

So the wiki description appears to be obsolete.

I am not familiar with all the changes since version 0.7, so it would be 
good if somebody could update the wiki.

If you are interested in seeing how this option was implemented, you may 
be able to find the old version in CVS.

Regards,

Matthias




Enzo Michelangeli wrote:
 > Hi Matthias,
 >
 > I'm writing to you about the Nutch config file option
 > "fetcher.throttle.bandwidth", referenced by you at
 > http://wiki.apache.org/nutch/FetchOptions . According to Andrzej
 > Bialecki in the thread
 > http://www.nabble.com/Is--fetcher.throttle.bandwidth-known-to-work--t3861057.html ,
 > that refers to a private patch that is not part of Nutch's mainline code
 > base. Is that patch available from you for submission to the Nutch team?
 >
 > Thanks,
 >
 > Enzo
 >
 >


Enzo Michelangeli wrote:
> ----- Original Message ----- From: "Andrzej Bialecki" <ab...@getopt.org>
> Sent: Tuesday, June 05, 2007 4:56 PM
> 
> [...]
>> You can achieve a somewhat similar effect by controlling the number of 
>> fetcher threads. I realize this is not as accurate as a specific 
>> control mechanism, but so far it has been sufficient for most users.
>>
>> If this feature is important to you, please provide a patch that 
>> implements it, and we'll consider it for inclusion.
> 
> I think that for the time being I'll just channel the traffic through a 
> Squid proxy, and use its "delay pools" feature to throttle the bandwidth 
> (and also its DNS caching, which, as I mentioned a few days ago, I also 
> need...). For Nutch, it might make sense to find the original patch. 
> I'll try to get in touch with Matthias Jaekle, who authored that wiki 
> page where fetcher.throttle.bandwidth was referenced.
> 
> Thanks anyway,
> 
> Enzo
> 
> 
> 

Re: Is fetcher.throttle.bandwidth known to work?

Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message ----- 
From: "Andrzej Bialecki" <ab...@getopt.org>
Sent: Tuesday, June 05, 2007 4:56 PM

[...]
> You can achieve a somewhat similar effect by controlling the number of 
> fetcher threads. I realize this is not as accurate as a specific control 
> mechanism, but so far it has been sufficient for most users.
>
> If this feature is important to you, please provide a patch that 
> implements it, and we'll consider it for inclusion.

I think that for the time being I'll just channel the traffic through a 
Squid proxy, and use its "delay pools" feature to throttle the bandwidth 
(and also its DNS caching, which, as I mentioned a few days ago, I also 
need...). For Nutch, it might make sense to find the original patch. I'll 
try to get in touch with Matthias Jaekle, who authored that wiki page where 
fetcher.throttle.bandwidth was referenced.

Thanks anyway,

Enzo
 


Re: Is fetcher.throttle.bandwidth known to work?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Enzo Michelangeli wrote:
> ----- Original Message ----- From: "Andrzej Bialecki" <ab...@getopt.org>
> Sent: Monday, June 04, 2007 2:05 PM
> 
>>> Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions 
>>> , so I thought it was for real...
>>
>> Sorry, this page is wrong and should be corrected - some of the 
>> options listed there were either part of an older version of the Fetcher 
>> (and have been replaced), or they were a part of a private patch (as 
>> was the case with throttling).
> 
> Don't you think that throttling would be a valuable feature to retain? 
> Is there anything to prevent saturation of the link to the Internet, 
> either in release 0.9 or in the code of the current nightly builds?
>

You can achieve a somewhat similar effect by controlling the number of 
fetcher threads. I realize this is not as accurate as a specific control 
mechanism, but so far it has been sufficient for most users.

If this feature is important to you, please provide a patch that 
implements it, and we'll consider it for inclusion.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
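
For anyone considering the patch Andrzej asks for above, here is a minimal sketch of one common way to cap total fetch bandwidth: a token bucket shared by all fetcher threads. Everything in it is an assumption made for illustration: the class name `BandwidthThrottle`, the `reserve` method, and the choice of bytes per second as the unit are hypothetical, and none of this is taken from Nutch or from the private patch discussed in this thread.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical token-bucket limiter. Illustration only: this class is not
 * part of Nutch and is not the private patch mentioned in the thread.
 */
public class BandwidthThrottle {
    private final double bytesPerSecond;   // sustained rate shared by all threads
    private final LongSupplier nanoClock;  // injectable clock, e.g. System::nanoTime
    private double available;              // byte budget; goes negative when over the limit
    private long lastRefillNanos;          // clock reading at the previous call

    public BandwidthThrottle(long bytesPerSecond, LongSupplier nanoClock) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.nanoClock = nanoClock;
        this.available = bytesPerSecond;   // allow a burst of at most one second
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /**
     * Records that the caller is about to read {@code bytes} bytes and returns
     * how many milliseconds it should sleep first to stay under the limit.
     */
    public synchronized long reserve(long bytes) {
        long now = nanoClock.getAsLong();
        // Credit the budget for the time elapsed since the previous call,
        // capped at one second's worth of bytes.
        double refill = (now - lastRefillNanos) * bytesPerSecond / 1e9;
        available = Math.min(bytesPerSecond, available + refill);
        lastRefillNanos = now;
        available -= bytes;
        if (available >= 0) {
            return 0L;
        }
        // The deficit has to be paid back by waiting at the sustained rate.
        return (long) Math.ceil(-available * 1000.0 / bytesPerSecond);
    }
}
```

In this sketch a fetcher thread would call `Thread.sleep(throttle.reserve(n))` before reading `n` bytes. Because one instance is shared and `reserve` is synchronized, the cap would apply to the total bandwidth rather than to each thread, which is the distinction raised in the original question.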


Re: Is fetcher.throttle.bandwidth known to work?

Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message ----- 
From: "Andrzej Bialecki" <ab...@getopt.org>
Sent: Monday, June 04, 2007 2:05 PM

>> Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions , 
>> so I thought it was for real...
>
> Sorry, this page is wrong and should be corrected - some of the options 
> listed there were either part of an older version of the Fetcher (and have been 
> replaced), or they were a part of a private patch (as was the case with 
> throttling).

Don't you think that throttling would be a valuable feature to retain? Is 
there anything to prevent saturation of the link to the Internet, either in 
release 0.9 or in the code of the current nightly builds?

Enzo


Re: Is fetcher.throttle.bandwidth known to work?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Enzo Michelangeli wrote:
> ----- Original Message ----- From: "Andrzej Bialecki" <ab...@getopt.org>
> Sent: Monday, June 04, 2007 1:31 AM
> 
>> Enzo Michelangeli wrote:
>>> In my case (with Nutch 0.8), it seems not: I set it to 500, and the
>>> fetcher still saturates the 1.5 Mbit/s link... Is it supposed to work
>>> for the total bandwidth, or for each thread?
>>
>> There's nothing in the current code base to support this, nor is there
>> a config property with that name ... Is this perhaps a part of your local
>> code base?
> 
> Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions ,
> so I thought it was for real...

Sorry, this page is wrong and should be corrected - some of the options 
listed there were either part of an older version of the Fetcher (and have 
been replaced), or they were a part of a private patch (as was the case 
with throttling).




Re: Is fetcher.throttle.bandwidth known to work?

Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message ----- 
From: "Andrzej Bialecki" <ab...@getopt.org>
Sent: Monday, June 04, 2007 1:31 AM

> Enzo Michelangeli wrote:
>> In my case (with Nutch 0.8), it seems not: I set it to 500, and the
>> fetcher still saturates the 1.5 Mbit/s link... Is it supposed to work for
>> the total bandwidth, or for each thread?
>
> There's nothing in the current code base to support this, nor is there
> a config property with that name ... Is this perhaps a part of your local
> code base?

Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions , so I
thought it was for real...

Enzo


Re: Is fetcher.throttle.bandwidth known to work?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Enzo Michelangeli wrote:
> In my case (with Nutch 0.8), it seems not: I set it to 500, and the 
> fetcher still saturates the 1.5 Mbit/s link... Is it supposed to work 
> for the total bandwidth, or for each thread?

There's nothing in the current code base to support this, nor is there 
a config property with that name ... Is this perhaps a part of your 
local code base?


