Posted to agent@nutch.apache.org by "georgiosi ..." <ow...@georgiosi.com> on 2008/01/17 14:26:25 UTC

stop spider

please can you STOP Sitesell from leeching and crawling all over my site
www.georgiosi.com, I am receiving false statistics and this is NOT good.
just take it off my site.  : (

Re: stop spider

Posted by Dennis Kubes <ku...@apache.org>.
You would need to contact them directly.  Nutch is an open source
project and does NOT run crawlers of its own.  You would need to contact
the organization that is running the crawlers and/or modify your
robots.txt file to block (well-behaved) robots.

Dennis Kubes

georgiosi ... wrote:
> please can you STOP Sitesell from leeching and crawling all over my site
> www.georgiosi.com, I am receiving false statistics and this is NOT good.
> just take it off my site.  : (
> 

Re: stop spider

Posted by Martin Kuen <ma...@gmail.com>.
Hi,

Nutch is a software project and does not host/store a search index.
Furthermore, the project itself does not crawl any websites.
You are observing somebody USING Nutch to crawl your site. The people
using/maintaining/developing the software called Nutch are indeed interested
in hearing about misbehaving crawlers.

However, I just tried to access http://www.georgiosi.com/robots.txt and
could not find anything. If you don't want web spiders to crawl your site you
have to maintain a "robots.txt" file. The Nutch spider obeys the robots
exclusion protocol by default.

adding:
User-agent: Nutch
Disallow: /
to robots.txt blocks the Nutch spider (as long as it identifies itself with
the default agent name)
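
To check such a rule before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the second agent name "SomeOtherBot" and the test URL path are assumptions made only for illustration. The rule is written as "Disallow: /", the plain-prefix form from the original robots exclusion standard, because parsers that do not support wildcards may treat "/*" as a literal path.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt content: block the agent named "Nutch" from the whole site.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for illustration; it parses the text locally and
    never contacts the site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.georgiosi.com/index.html"  # example path, assumed
    for agent in ("Nutch", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

This only tells you how a compliant parser reads the file. A crawler whose operator changed the agent name, or one that ignores robots.txt altogether, is not stopped by it.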


Best Regards,

Martin

On Jan 17, 2008 2:26 PM, georgiosi ... <ow...@georgiosi.com> wrote:

> please can you STOP Sitesell from leeching and crawling all over my site
> www.georgiosi.com, I am receiving false statistics and this is NOT good.
> just take it off my site.  : (
>

Re: stop spider

Posted by Andrzej Bialecki <ab...@getopt.org>.
georgiosi ... wrote:
> please can you STOP Sitesell from leeching and crawling all over my site
> www.georgiosi.com, I am receiving false statistics and this is NOT good.
> just take it off my site.  : (
> 

Please contact the admins at Sitesell. This mailing list concerns the
Nutch software project - we are not doing any crawling, we just develop
the software. The user agent string that they report is a generic value
in the default Nutch configuration.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com