Posted to agent@nutch.apache.org by Nathan Zipfel <nz...@atlanticbb.net> on 2007/08/30 02:51:24 UTC

Nutch not obeying robots.txt

Hi,

I was looking at my log files. I blocked the Nutch spider many months
ago, but it doesn't appear to be obeying the block.

My robots file is at http://www.pa-roots.org/robots.txt

Here are examples from my log files:

132.178.248.36 - - [29/Aug/2007:06:11:21 -0400] "GET
/~randolph/history/history.html HTTP/1.0" 200 5340 "-" "Kishore Sajja
Research Crawler/Nutch-0.9 (Masters Project Research Crawler; Masters
Project; kishoresajja@mail.boisestate.edu)"

132.178.248.36 - - [29/Aug/2007:06:36:55 -0400] "GET
/~randolph/civilwar/index.html HTTP/1.0" 200 1551 "-" "Kishore Sajja
Research Crawler/Nutch-0.9 (Masters Project Research Crawler; Masters
Project; kishoresajja@mail.boisestate.edu)"
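
One way to check how a set of robots.txt rules applies to the agent string in the log entries above is to run them through a parser offline. The following is only a minimal sketch using Python's standard urllib.robotparser. The contents of the real robots.txt are not quoted in this message, so SAMPLE_RULES is a hypothetical stand-in, and is_allowed is an illustrative helper name. It shows how Python's parser matches agent names, which is not necessarily how Nutch 0.9 matches them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the real robots.txt
# from pa-roots.org is not quoted in this message.
SAMPLE_RULES = """\
User-agent: Nutch
Disallow: /
"""

# Agent name and URL path taken from the log entries in this message.
LOGGED_AGENT = "Kishore Sajja Research Crawler/Nutch-0.9"
LOGGED_URL = "http://www.pa-roots.org/~randolph/history/history.html"


def is_allowed(rules_text, user_agent, url):
    """Return True if rules_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A crawler announcing itself simply as "Nutch".
    print("Nutch:", is_allowed(SAMPLE_RULES, "Nutch", LOGGED_URL))
    # The customised agent name seen in the log.
    print("Logged agent:", is_allowed(SAMPLE_RULES, LOGGED_AGENT, LOGGED_URL))
```

Python's parser compares only the part of the agent string before the first "/", so under these assumed rules a renamed crawler may not match a "User-agent: Nutch" record at all. Whether that is what is happening here depends on the actual rules in the file and on how the crawler was configured.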


Nathan Zipfel
PA-Roots.com
http://www.pa-roots.com/