Posted to user@nutch.apache.org by Enzo Michelangeli <en...@gmail.com> on 2007/06/03 18:17:54 UTC
Is fetcher.throttle.bandwidth known to work?
In my case (with Nutch 0.8), it seems not: I set it to 500, and the fetcher
still saturates the 1.5 Mbit/s link... Is it supposed to work for the total
bandwidth, or for each thread?
Enzo
Re: Is fetcher.throttle.bandwidth known to work?
Posted by Matthias Jaekle <ja...@eventax.com>.
Hello Enzo,
we never developed a patch for this issue.
I believe that back in 2004, in Nutch 0.4, there was another fetcher
module, which was replaced in version 0.5.
That fetcher was able to throttle bandwidth, but it was also very buggy.
So the wiki description would be obsolete.
I am not familiar with all the changes since version 0.7,
so it would be good if somebody could update the wiki.
If you are interested in seeing how this option was implemented, maybe you
could find the old version in CVS.
Regards,
Matthias
Enzo Michelangeli wrote:
> Hi Matthias,
>
> I'm writing to you about the Nutch config file option
> "fetcher.throttle.bandwidth", referenced by you at
> http://wiki.apache.org/nutch/FetchOptions . According to Andrzej
> Bialecki in the thread
> http://www.nabble.com/Is--fetcher.throttle.bandwidth-known-to-work--t3861057.html ,
> that refers to a private patch that is not part of Nutch's mainline code
> base. Is that patch available from you for submission to the Nutch team?
>
> Thanks,
>
> Enzo
>
>
Enzo Michelangeli wrote:
> ----- Original Message ----- From: "Andrzej Bialecki" <ab...@getopt.org>
> Sent: Tuesday, June 05, 2007 4:56 PM
>
> [...]
>> You can achieve a somewhat similar effect by controlling the number of
>> fetcher threads. I realize this is not as accurate as a specific
>> control mechanism, but so far it was sufficient for most users.
>>
>> If this feature is important to you, please provide a patch that
>> implements it, and we'll consider it for inclusion.
>
> I think that for the time being I'll just channel the traffic through a
> Squid proxy, and use its "delay pools" feature to throttle the bandwidth
> (and also its DNS caching, which, as I mentioned a few days ago, I also
> need...). For Nutch, it might make sense to find the original patch.
> I'll try to get in touch with Matthias Jaekle, who authored that wiki
> page where fetcher.throttle.bandwidth was referenced.
>
> Thanks anyway,
>
> Enzo
>
>
>
Re: Is fetcher.throttle.bandwidth known to work?
Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message -----
From: "Andrzej Bialecki" <ab...@getopt.org>
Sent: Tuesday, June 05, 2007 4:56 PM
[...]
> You can achieve a somewhat similar effect by controlling the number of
> fetcher threads. I realize this is not as accurate as a specific control
> mechanism, but so far it was sufficient for most users.
>
> If this feature is important to you, please provide a patch that
> implements it, and we'll consider it for inclusion.
I think that for the time being I'll just channel the traffic through a
Squid proxy, and use its "delay pools" feature to throttle the bandwidth
(and also its DNS caching, which, as I mentioned a few days ago, I also
need...). For Nutch, it might make sense to find the original patch. I'll
try to get in touch with Matthias Jaekle, who authored that wiki page where
fetcher.throttle.bandwidth was referenced.
Thanks anyway,
Enzo
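Note on the "delay pools" idea above: Squid's delay pools are bucket-based, and the same idea bears on the "total bandwidth or each thread" question. One bucket shared by all fetcher threads caps the total; one bucket per thread caps each thread separately. What follows is only a minimal Python sketch of a shared bucket, not Nutch or Squid code. The TokenBucket class, its parameters, and the numbers in the usage note are all hypothetical.

```python
import threading
import time


class TokenBucket:
    """Aggregate byte-rate limiter meant to be shared by all fetcher threads.

    Hypothetical helper for illustration; not part of Nutch or Squid.
    """

    def __init__(self, rate_bytes_per_s, capacity_bytes,
                 clock=time.monotonic, sleep=time.sleep):
        if rate_bytes_per_s <= 0 or capacity_bytes <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = float(rate_bytes_per_s)      # refill speed
        self.capacity = float(capacity_bytes)    # largest allowed burst
        self.tokens = float(capacity_bytes)      # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()
        self.lock = threading.Lock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes):
        """Block until nbytes may be transferred; return the seconds waited."""
        if nbytes > self.capacity:
            raise ValueError("request larger than bucket capacity")
        waited = 0.0
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return waited
                # Not enough tokens: work out how long the deficit takes to refill.
                delay = (nbytes - self.tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked meanwhile.
            self.sleep(delay)
            waited += delay
```

For instance, TokenBucket(rate_bytes_per_s=62_500, capacity_bytes=64_000) would hold the aggregate to roughly 500 kbit/s if every thread called consume(len(chunk)) before reading each chunk of at most 64,000 bytes. Whether the "500" in the original question was meant as kbit/s is not stated anywhere in the thread.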
Re: Is fetcher.throttle.bandwidth known to work?
Posted by Andrzej Bialecki <ab...@getopt.org>.
Enzo Michelangeli wrote:
> ----- Original Message ----- From: "Andrzej Bialecki" <ab...@getopt.org>
> Sent: Monday, June 04, 2007 2:05 PM
>
>>> Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions
>>> , so I thought it was for real...
>>
>> Sorry, this page is wrong and should be corrected - some of the
>> options listed there were either part of an older version of Fetcher
>> (and have been replaced), or they were part of a private patch (as
>> was the case with throttling).
>
> Don't you think that throttling would be a valuable feature to retain?
> Is there anything to prevent saturation of the link to the Internet,
> either in the release 0.9 or in the current nightly builds code?
>
You can achieve a somewhat similar effect by controlling the number of
fetcher threads. I realize this is not as accurate as a specific control
mechanism, but so far it was sufficient for most users.
If this feature is important to you, please provide a patch that
implements it, and we'll consider it for inclusion.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
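Note on the thread-count suggestion above: it can be turned into rough arithmetic. The sketch below is only an estimate under stated assumptions. The function name is hypothetical, and the average per-thread throughput is a figure you would have to measure on your own crawl. The numbers are purely illustrative.

```python
import math


def max_fetcher_threads(link_bits_per_s, target_fraction, per_thread_bits_per_s):
    """Estimate the largest thread count whose combined average throughput
    stays within the chosen share of the link (hypothetical helper)."""
    if per_thread_bits_per_s <= 0:
        raise ValueError("per-thread throughput must be positive")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    budget = link_bits_per_s * target_fraction
    # Always allow at least one thread, otherwise nothing is fetched at all.
    return max(1, math.floor(budget / per_thread_bits_per_s))


# Illustrative numbers only: the 1.5 Mbit/s link from this thread, half of it
# reserved for the crawl, and an ASSUMED average of 100 kbit/s per thread.
threads = max_fetcher_threads(1_500_000, 0.5, 100_000)
print(threads)
```

As far as I know, the resulting number would go into the fetcher thread-count setting (fetcher.threads.fetch in nutch-site.xml for Nutch releases of that era). Because per-thread throughput varies with the remote servers, this only bounds the average rate, which matches the "not as accurate" caveat in the message.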
Re: Is fetcher.throttle.bandwidth known to work?
Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message -----
From: "Andrzej Bialecki" <ab...@getopt.org>
Sent: Monday, June 04, 2007 2:05 PM
>> Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions ,
>> so I thought it was for real...
>
> Sorry, this page is wrong and should be corrected - some of the options
> listed there were either part of an older version of Fetcher (and have been
> replaced), or they were part of a private patch (as was the case with
> throttling).
Don't you think that throttling would be a valuable feature to retain? Is
there anything to prevent saturation of the link to the Internet, either in
the release 0.9 or in the current nightly builds code?
Enzo
Re: Is fetcher.throttle.bandwidth known to work?
Posted by Andrzej Bialecki <ab...@getopt.org>.
Enzo Michelangeli wrote:
> ----- Original Message ----- From: "Andrzej Bialecki" <ab...@getopt.org>
> Sent: Monday, June 04, 2007 1:31 AM
>
>> Enzo Michelangeli wrote:
>>> In my case (with Nutch 0.8), it seems not: I set it to 500, and the
>>> fetcher still saturates the 1.5 Mbit/s link... Is it supposed to work
>>> for the total bandwidth, or for each thread?
>>
>> There's nothing in the current code base to support this, nor is
>> there a config property with such a name ... Is this perhaps a part of
>> your local code base?
>
> Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions ,
> so I thought it was for real...
Sorry, this page is wrong and should be corrected - some of the options
listed there were either part of an older version of Fetcher (and have
been replaced), or they were part of a private patch (as was the case
with throttling).
--
Best regards,
Andrzej Bialecki <><
Re: Is fetcher.throttle.bandwidth known to work?
Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message -----
From: "Andrzej Bialecki" <ab...@getopt.org>
Sent: Monday, June 04, 2007 1:31 AM
> Enzo Michelangeli wrote:
>> In my case (with Nutch 0.8), it seems not: I set it to 500, and the
>> fetcher still saturates the 1.5 Mbit/s link... Is it supposed to work for
>> the total bandwidth, or for each thread?
>
> There's nothing in the current code base to support this, nor is there
> a config property with such a name ... Is this perhaps a part of your local
> code base?
Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions , so I
thought it was for real...
Enzo
Re: Is fetcher.throttle.bandwidth known to work?
Posted by Andrzej Bialecki <ab...@getopt.org>.
Enzo Michelangeli wrote:
> In my case (with Nutch 0.8), it seems not: I set it to 500, and the
> fetcher still saturates the 1.5 Mbit/s link... Is it supposed to work
> for the total bandwidth, or for each thread?
There's nothing in the current code base to support this, nor is there
a config property with such a name ... Is this perhaps a part of your
local code base?
--
Best regards,
Andrzej Bialecki <><