Posted to user@nutch.apache.org by Dan Kinder <dk...@turnitin.com> on 2014/11/25 00:09:16 UTC

fetcher.throttle.bandwidth

Hi, I'm having trouble finding documentation about how bandwidth throttling
is actually implemented in Nutch. Is it implemented, and if so how? Or do
most people just use squid proxies, etc.?

-dan

Re: fetcher.throttle.bandwidth

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Dan,

we'll (hopefully soon) move the wiki to a new system (see NUTCH-1858).
That's "the" opportunity to update outdated stuff (or remove it).

Thanks,
Sebastian

2014-11-25 23:40 GMT+01:00 Dan Kinder <dk...@turnitin.com>:

> Thanks Sebastian. Sounds like somebody should update or remove that page...

Re: fetcher.throttle.bandwidth

Posted by Dan Kinder <dk...@turnitin.com>.
Thanks Sebastian. Sounds like somebody should update or remove that page...

On Tue, Nov 25, 2014 at 12:08 PM, Sebastian Nagel <
wastl.nagel@googlemail.com> wrote:

> Hi,
>
> if it's about a recent Nutch version: there is no such property.
> (sorry if it's taken from http://wiki.apache.org/nutch/FetchOptions:
>  this information is really outdated)

-- 
Dan Kinder
Senior Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re: fetcher.throttle.bandwidth

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

if it's about a recent Nutch version: there is no such property.
(sorry if it's taken from http://wiki.apache.org/nutch/FetchOptions:
 this information is really outdated)

With Nutch 1.9, the following properties are available. They cause fetch
threads to be started and stopped so that the fetcher comes close to the
configured bandwidth:

<property>
  <name>fetcher.bandwidth.target</name>
  <value>-1</value>
  <description>Target bandwidth in kilobits per sec for each mapper instance. This is used to adjust
  the number of fetching threads automatically (up to fetcher.maxNum.threads). A value of -1
  deactivates the functionality, in which case the number of fetching threads is fixed (see
  fetcher.threads.fetch).</description>
</property>

<property>
  <name>fetcher.maxNum.threads</name>
  <value>25</value>
  <description>Max number of fetch threads allowed when using fetcher.bandwidth.target. Defaults to
  fetcher.threads.fetch if unspecified or set to a value lower than it.</description>
</property>

<property>
  <name>fetcher.bandwidth.target.check.everyNSecs</name>
  <value>30</value>
  <description>(EXPERT) Value in seconds which determines how frequently we should reassess the
  optimal number of fetch threads when using fetcher.bandwidth.target. Defaults to 30 and must be
  at least 1.</description>
</property>
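For illustration only, a site-level override that switches the adaptive mode on might look like the sketch below. It assumes the usual practice of overriding defaults in conf/nutch-site.xml; all numbers are hypothetical example values, not tuned recommendations, and only the property names come from the list above.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical value: aim for roughly 20,000 kbit/s (about 20 Mbit/s) per fetcher mapper.
       Any value other than -1 activates the automatic thread adjustment. -->
  <property>
    <name>fetcher.bandwidth.target</name>
    <value>20000</value>
  </property>

  <!-- Hypothetical value: the fixed thread count when the target is -1; assumed here
       to also be the number of threads the fetcher starts with before adjusting. -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
  </property>

  <!-- Hypothetical value: upper bound for the adjustment. If this were lower than
       fetcher.threads.fetch, it would default to fetcher.threads.fetch. -->
  <property>
    <name>fetcher.maxNum.threads</name>
    <value>50</value>
  </property>

  <!-- Re-assess the thread count every 30 seconds (the documented default; must be at least 1). -->
  <property>
    <name>fetcher.bandwidth.target.check.everyNSecs</name>
    <value>30</value>
  </property>
</configuration>
```

With values like these, the fetcher would presumably re-check its measured bandwidth every 30 seconds and start or stop threads, never exceeding 50 per mapper, to approach the target. Setting fetcher.bandwidth.target back to -1 returns to a fixed pool of fetcher.threads.fetch threads. Because the target applies per mapper instance, the aggregate bandwidth of a job grows with the number of fetcher map tasks.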


Best,
Sebastian

On 11/25/2014 12:09 AM, Dan Kinder wrote:
> Hi, I'm having trouble finding documentation about how bandwidth throttling
> is actually implemented in Nutch. Is it implemented, and if so how? Or do
> most people just use squid proxies, etc.?
> 
> -dan
>