Posted to user@nutch.apache.org by qi wu <ch...@gmail.com> on 2007/04/02 18:15:20 UTC

Fetcher2 too many spinWaiting, How to tune?

Hi,
  I am running Fetcher2 with 200 threads. The speed is satisfactory at first (about 20 pages/s), but within an hour most of the threads are spin-waiting:
    "2007-04-02 15:32:07,350 WARN  fetcher.Fetcher2 - -activeThreads=199, spinWaiting=198, fetchQueues.totalSize=10000"
My configuration:
Seed URLs: 2000, one URL per website
OS: Linux 2.4.21
CPU: 2 x Intel(R) Xeon(TM) CPU 3.00GHz
Memory: 2 GB

CPU usage is low, never more than 50%. Memory usage is high: only about 10% of memory is free during the fetch. I start the crawl with "-Xmx1000M". A thread dump shows that the threads are all waiting at "Thread.sleep(500)":
    // count this thread as spin-waiting while it sleeps
    spinWaiting.incrementAndGet();
    try {
      Thread.sleep(500);
    } catch (Exception e) {
    }
    spinWaiting.decrementAndGet();
Where might the bottleneck be: network, memory, or somewhere else? Could you also give me some hints on how to get more detailed debug information?

Thanks
-Qi

Re: Fetcher2 too many spinWaiting, How to tune?

Posted by qi wu <ch...@gmail.com>.
Currently I use the default setting for generate.max.per.host, so there is no limit on the fetchlist size per host. This might be the cause; I'll run a test. I have one question about Fetcher2:
I began my fetch with 200 threads. After some time, 199 active threads were left, 198 of them spin-waiting. What is the difference between the one thread that died and the 198 spin-waiting threads?

As I understand it, if a thread A starts by fetching website A, then thread A can only be used to fetch website A for its whole life cycle. If thread A takes too long (more than 0.5 s) to finish fetching and parsing a page, it is set to spin-waiting. Thread A dies when no pages are left for website A. Please refer to the code below:
    if (feeder.isAlive() || fetchQueues.getTotalSize() > 0) {
      LOG.debug(getName() + " spin-waiting ...");
      // spin-wait.
      spinWaiting.incrementAndGet();
      try {
        Thread.sleep(500);
      } catch (Exception e) {
      }
      spinWaiting.decrementAndGet();
      continue;
    } else {
      // all done, finish this thread
      return;
    }

Is my understanding right?
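
The branch quoted above can be tried out in isolation with a small, self-contained sketch. This is not the Nutch source: the class name SpinWaitSketch, the single shared queue, the toy feeder thread and the shortened sleep times are all hypothetical. It also assumes that the quoted branch is reached when a thread could not obtain an item from the shared queues. Under that assumption a thread is not tied to one website: it is counted as spin-waiting whenever nothing is available to it, and it exits only once the feeder has finished and the queue is empty.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class SpinWaitSketch {

    /** Returns {items fetched, threads still counted as spin-waiting at the end}. */
    static int[] run(int workers, int items) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger spinWaiting = new AtomicInteger();
        AtomicInteger fetched = new AtomicInteger();

        // Hypothetical feeder: adds one item every 20 ms, slower than the workers drain them.
        Thread feeder = new Thread(() -> {
            for (int i = 0; i < items; i++) {
                queue.add("url-" + i);
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });

        Runnable worker = () -> {
            while (true) {
                String item = queue.poll();
                if (item != null) {
                    fetched.incrementAndGet(); // stand-in for fetching the page
                    continue;
                }
                if (feeder.isAlive() || !queue.isEmpty()) {
                    // Nothing available right now, but more may come: spin-wait.
                    spinWaiting.incrementAndGet();
                    try {
                        Thread.sleep(5);
                    } catch (InterruptedException e) {
                        return;
                    } finally {
                        spinWaiting.decrementAndGet();
                    }
                } else {
                    return; // feeder finished and queue drained: the thread exits
                }
            }
        };

        feeder.start(); // start first so early workers see feeder.isAlive() == true
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(worker);
            pool[i].start();
        }
        try {
            feeder.join();
            for (Thread t : pool) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] {fetched.get(), spinWaiting.get()};
    }

    public static void main(String[] args) {
        int[] result = run(4, 10);
        System.out.println("fetched=" + result[0] + " spinWaiting=" + result[1]);
    }
}
```

In this sketch the decrement sits in a finally block, so the counter returns to zero even if a sleeping thread is interrupted.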


----- Original Message ----- 
From: "Sami Siren" <ss...@gmail.com>
To: <nu...@lucene.apache.org>
Sent: Tuesday, April 03, 2007 12:29 AM
Subject: Re: Fetcher2 too many spinWaiting, How to tune?


> hi,
> 
> 
> qi wu wrote:
>> Hi, I am using Fetcher2 with 200 threads started. I get a satisfactory
>> speed (about 20 pages/s) at the beginning stage, but after no more
>> than one hour there are many spinWaiting threads. Where might be the
>> bottleneck? network, memory or anyplace else? Could you also give me
>> some hints on how to get more detailed debug info?
> 
> Not specific to Fetcher2, but how are the pages distributed among
> different hosts in the fetchlist? Have you configured a reasonable setting for
> generate.max.per.host in the Nutch conf?
> 
> If you generate too many pages for too few hosts, there is no way
> Fetcher or Fetcher2 can fetch them fast unless you make it non-polite.
> 
> --
> Sami Siren
> 
>

Re: Fetcher2 too many spinWaiting, How to tune?

Posted by Sami Siren <ss...@gmail.com>.
hi,


qi wu wrote:
> Hi, I am using Fetcher2 with 200 threads started. I get a satisfactory
> speed (about 20 pages/s) at the beginning stage, but after no more
> than one hour there are many spinWaiting threads. Where might be the
> bottleneck? network, memory or anyplace else? Could you also give me
> some hints on how to get more detailed debug info?

Not specific to Fetcher2, but how are the pages distributed among
different hosts in the fetchlist? Have you configured a reasonable setting for
generate.max.per.host in the Nutch conf?

If you generate too many pages for too few hosts, there is no way
Fetcher or Fetcher2 can fetch them fast unless you make it non-polite.

--
 Sami Siren
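
The point above about too many pages for too few hosts can be made concrete with a little arithmetic. The sketch below is illustrative only. The 200 threads and 2000 hosts come from the original post. The 5-second politeness delay and the limit of one thread per host are assumptions, so check fetcher.server.delay and fetcher.threads.per.host in your own configuration. The class and method names are hypothetical.

```java
public class PolitenessBound {

    /** Upper bound on pages/s when each host is contacted at most once per delaySeconds. */
    static double maxPagesPerSecond(int hostsWithQueuedUrls, double delaySeconds) {
        return hostsWithQueuedUrls / delaySeconds;
    }

    /** Threads that can be busy at once, assuming at most one thread per host. */
    static int maxBusyThreads(int threads, int hostsWithQueuedUrls) {
        return Math.min(threads, hostsWithQueuedUrls);
    }

    public static void main(String[] args) {
        int threads = 200;         // from the original post
        double delaySeconds = 5.0; // assumed politeness delay between requests to one host

        // As hosts run out of queued URLs, the bound shrinks and more threads go idle.
        for (int hosts : new int[] {2000, 200, 10, 1}) {
            int busy = maxBusyThreads(threads, hosts);
            System.out.printf("hosts=%d bound=%.1f pages/s busy<=%d idle>=%d%n",
                    hosts, maxPagesPerSecond(hosts, delaySeconds), busy, threads - busy);
        }
    }
}
```

With these assumed values, once only 10 hosts still have queued URLs the bound drops to 2 pages/s and at least 190 of the 200 threads have nothing to do, however large the total queue is.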