Posted to agent@nutch.apache.org by "georgiosi ..." <ow...@georgiosi.com> on 2008/01/17 14:26:25 UTC
stop spider
please can you STOP sitesell from leaching and crawling all over my site
www.georgiosi.com , i am receiving false statistics and this is NOT good.
just take it off my site. : (
Re: stop spider
Posted by Dennis Kubes <ku...@apache.org>.
You would need to contact them directly. Nutch is an open source
project and does NOT run crawlers of its own. You would need to contact
the organization that is running the crawlers and/or modify your
robots.txt file to block (well behaved) robots.
Dennis Kubes
georgiosi ... wrote:
> please can you STOP sitesell from leaching and crawling all over my site
> www.georgiosi.com , i am receiving false statistics and this is NOT good.
> just take it off my site. : (
>
Re: stop spider
Posted by Martin Kuen <ma...@gmail.com>.
Hi,
Nutch is a software project and does not host or store a search index.
Furthermore, the software project itself does not crawl any websites.
You are observing somebody USING Nutch to crawl your site. The people
using, maintaining, and developing Nutch are indeed interested
in hearing about misbehaving crawlers.
However, I just tried to access http://www.georgiosi.com/robots.txt and
could not find anything. If you don't want web spiders to crawl your site, you
need to maintain a "robots.txt" file. By default, the Nutch spider
obeys the robots exclusion protocol.
Adding:
User-agent: Nutch
Disallow: /
to robots.txt blocks the Nutch spider.
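A rule of this kind can be checked locally before it is published. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_blocked, the sample page URL, and the second user agent "SomeOtherBot" are assumptions made for illustration and are not part of Nutch. A crawler's actual matching logic may differ from this parser's.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # access is needed to test a draft robots.txt.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Draft robots.txt that shuts out crawlers identifying themselves as Nutch.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /
"""

# Hypothetical page on the site, used only for the check.
PAGE = "http://www.georgiosi.com/index.html"

print("Nutch blocked:", is_blocked(ROBOTS_TXT, "Nutch", PAGE))
print("SomeOtherBot blocked:", is_blocked(ROBOTS_TXT, "SomeOtherBot", PAGE))
```

Note that this only affects crawlers that honour robots.txt and that announce themselves with a matching user agent; an operator who has changed the agent name would need a separate entry.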
Best Regards,
Martin
On Jan 17, 2008 2:26 PM, georgiosi ... <ow...@georgiosi.com> wrote:
> please can you STOP sitesell from leaching and crawling all over my site
> www.georgiosi.com , i am receiving false statistics and this is NOT good.
> just take it off my site. : (
>
Re: stop spider
Posted by Andrzej Bialecki <ab...@getopt.org>.
georgiosi ... wrote:
> please can you STOP sitesell from leaching and crawling all over my site
> www.georgiosi.com , i am receiving false statistics and this is NOT good.
> just take it off my site. : (
>
Please contact the admins at Sitesell. This mailing list concerns the
Nutch software project - we are not doing any crawling, we just develop
the software. The user agent string that they report is a generic value
in the default Nutch configuration.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com