Posted to agent@nutch.apache.org by WebExpertsAmerica <ex...@WebExpertsAmerica.com> on 2005/09/23 21:25:49 UTC

Your Nutch Crawler is Out of Control - Apache Notified

Your crawler is ignoring our robots.txt file.

http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
128.95.1.189

You are eating bandwidth on our domain in incredible amounts. This is
rude.

Please stop or we will be forced to block your IP and the crawler you
are using.

Best Regards,

Web Experts America

>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<
WebExpertsAmerica.com
Whole Lot More for a Whole Lot Less
$6/hr Professional Web Services
http://www.WebExpertsAmerica.com

Testimonials:
http://www.WebExpertsAmerica.com/testimonials.htm

Website Solutions:
http://www.WebExpertsAmerica.com/services.htm

Chat:
WebExpertsNOW
AOL, MSN (Hotmail), and Yahoo
*Contact us anytime via chat. However, we DENY, BLOCK, and BAN anyone
that adds us to their Friend/Buddy list. Nothing personal, a security
policy to protect our chat connectivity from competitor abuse.

Terms of Service:
http://www.WebExpertsAmerica.com/tos.htm

Confidential:
The information contained in this message is privileged and confidential
and protected from disclosure. If the reader of this message is not the
intended recipient, you are hereby notified that any dissemination,
distribution or copying of this communication is strictly prohibited.
If you have received this communication in error, please notify us
immediately by replying to this message and then delete it from your
computer.
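The complaint centers on robots.txt compliance. Honoring robots.txt means a crawler parses the file once, then checks every URL against the rules for its own agent name before fetching it, and respects any Crawl-delay. The sketch below assumes Python's standard urllib.robotparser. It is not Nutch's own implementation, which is written in Java. The robots.txt content, the host example.com and the agent names "Nutch" and "OtherBot" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (not the complaining site's real file): it blocks an
# agent named "Nutch" entirely and asks every other crawler to wait 10 seconds
# between requests and to skip /private/.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def make_parser(text: str) -> RobotFileParser:
    """Build a robots.txt parser from text, without any network access."""
    rp = RobotFileParser()
    # parse() also records the parse time, which can_fetch() requires;
    # before that, can_fetch() conservatively answers False for every URL.
    rp.parse(text.splitlines())
    return rp


def plan_fetch(rp: RobotFileParser, agent: str, url: str,
               default_delay: float = 1.0):
    """Return (allowed, delay_seconds) for one URL under the parsed rules."""
    allowed = rp.can_fetch(agent, url)
    delay = rp.crawl_delay(agent)  # None when no Crawl-delay applies to agent
    return allowed, (float(delay) if delay is not None else default_delay)


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for agent in ("Nutch", "OtherBot"):  # hypothetical agent names
        for url in ("http://example.com/index.html",
                    "http://example.com/private/a.html"):
            print(agent, url, plan_fetch(rp, agent, url))
```

A group naming a specific agent takes precedence over the `*` group. In this sketch, "Nutch" is refused every URL and inherits no Crawl-delay from the file, so the caller's default applies.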



RE: Your Nutch Crawler is Out of Control - Apache Notified

Posted by Wild Dancer <wi...@rogers.com>.
Obviously, Web Experts have very bad upload bandwidth.

Frankly, a classic Apache installation with 150 "connections" will fail
against 15 Nutch threads. This has nothing to do with bandwidth, even on
an 8Mbps/800kbps connection for a home-based site.

Maybe Web Experts need to tune their Apache web server and use the
"worker" model instead of "prefork"? It can handle 6000 concurrent users
(with 1024 MB of RAM), because it saves memory by using threads instead
of processes.


-----Original Message-----
From: WebExpertsAmerica [mailto:expert@WebExpertsAmerica.com] 
Sent: Friday, September 23, 2005 3:26 PM
To: abuse@cac.washington.edu; noc@cac.washington.edu
Cc: nutch-agent@lucene.apache.org
Subject: Your Nutch Crawler is Out of Control - Apache Notified
Importance: High