Posted to dev@nutch.apache.org by Matt Zytaruk <ma...@wavefire.com> on 2005/10/06 17:59:15 UTC

Fetcher Speed Issues

Hi there, I just started working on a search engine based on the Nutch
project, but we are finding that the fetcher crawls extremely slowly.
I've seen posts from people maxing out their 5Mb lines with the
fetcher, but we can't seem to get any more than about 20k/s or 1.5
pages/second, which isn't even a smidgen of our capacity, even with
-threads set to 200. This is using the mapred branch on FreeBSD 4.
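A quick back-of-the-envelope check of these figures, assuming "20k/s" means 20 KB/s and that all 200 threads are occupied the whole time (Little's law, L = λW, with L the number of busy threads and λ the page rate):

```latex
W = \frac{L}{\lambda} = \frac{200\ \text{threads}}{1.5\ \text{pages/s}} \approx 133\ \text{s per page per thread},
\qquad
\frac{20\ \text{KB/s}}{1.5\ \text{pages/s}} \approx 13\ \text{KB per page}
```

If those assumptions hold, each thread spends over two minutes per page, which would suggest the threads are mostly waiting rather than transferring data.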

Are there any settings we might be missing that would cause this
slowdown? Or are there certain network configurations that could be
causing it?

Also, is the -numFetchers option in 'nutch generate' broken in the
mapred branch? It worked fine in 0.7, but doesn't seem to do anything in
0.8-dev.

Thanks a lot for your help.

Matt Zytaruk

RE: Fetcher Speed Issues

Posted by Fuad Efendi <fu...@efendi.ca>.
>We are finding that the fetcher is crawling extremely slow. 

I am going to run some performance tests over this long weekend on my
home network, using Apache HTTPD to serve a browsable copy of
www.apache.org:


1. Nutch-0.7.1 with the protocol-http plugin

2. Nutch-0.7.1 with protocol-httpclient

3. A modified protocol-http plugin with an added "Connection: close"
header before Socket.close() (I suspect we are overloading web servers)

4. A new plugin based on
http://www.innovation.ch/java/HTTPClient/advanced_info.html#pers_con,
with Keep-Alive
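To make the difference between tests 3 and 4 concrete, here is a minimal sketch of the raw request text an HTTP client writes to the socket. The class and method names (HttpRequestSketch, buildGetRequest) are hypothetical and not part of Nutch or its plugins; the only thing that changes between the two modes is the Connection header.

```java
public class HttpRequestSketch {

    // Hypothetical helper (not a Nutch API): builds a minimal HTTP/1.1 GET request.
    // keepAlive == false sends "Connection: close", asking the server to close the
    // connection after this response instead of keeping it open for reuse.
    static String buildGetRequest(String host, String path, boolean keepAlive) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n"); // Host is mandatory in HTTP/1.1
        sb.append("Connection: ").append(keepAlive ? "keep-alive" : "close").append("\r\n");
        sb.append("\r\n"); // an empty line terminates the header block
        return sb.toString();
    }

    public static void main(String[] args) {
        // Request as sent in the "Connection: close" variant
        System.out.print(buildGetRequest("www.apache.org", "/", false));
        // Request as sent in the Keep-Alive variant
        System.out.print(buildGetRequest("www.apache.org", "/", true));
    }
}
```

In HTTP/1.1 connections are persistent by default, so the explicit "keep-alive" value is redundant but harmless; it is shown only to contrast the two modes.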

I'll publish the results.

-Fuad