Posted to user@nutch.apache.org by Bozhao Tan <bo...@gmail.com> on 2008/07/02 16:32:04 UTC

Question about Nutch crawling

Hello, I do not know why Nutch cannot crawl anything from some internet
sites.
Has anyone run into this problem?
Thanks!

NewGuyInNutch

Re: Question about Nutch crawling

Posted by John Martyniak <jo...@beforedawn.com>.
Are you using the crawl command? You might need to change the crawl-urlfilter.txt file
in the conf directory.
-John
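
To illustrate how a URL filter file of this kind can silently drop a whole site, here is a minimal Python sketch. It is not Nutch code. It assumes the filter works as an ordered list of rules where each line is "+" (accept) or "-" (reject) followed by a regular expression, the first matching rule decides, and a URL that matches no rule is skipped. The rules in SAMPLE_RULES and the example.com host are hypothetical, so check them against your own conf directory.

```python
import re
from typing import List, Tuple

# Hypothetical rules, written in the assumed "+regex" / "-regex" style.
SAMPLE_RULES = r"""
# skip file:, ftp:, and mailto: URLs
-^(file|ftp|mailto):
# accept hosts under example.com (placeholder domain)
+^http://([a-z0-9]*\.)*example\.com/
# skip everything else
-.
"""


def parse_rules(text: str) -> List[Tuple[bool, re.Pattern]]:
    """Turn filter text into (accept?, compiled regex) pairs, in file order."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules: List[Tuple[bool, re.Pattern]], url: str) -> bool:
    """Return the verdict of the first rule whose regex is found in the URL."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: URLs matching no rule are skipped


if __name__ == "__main__":
    rules = parse_rules(SAMPLE_RULES)
    for url in ("http://www.example.com/page.html", "http://other.org/"):
        print(url, "accepted" if accepts(rules, url) else "filtered out")
```

If a site you expect to crawl comes back as filtered out under rules like these, the accept pattern probably does not cover its host name.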

On Wed, Jul 2, 2008 at 8:32 AM, Bozhao Tan <bo...@gmail.com> wrote:

> Hello, I do not know why Nutch can not crawl anything from some internet
> sites?
> Has anyone met this problem?
> Thanks!
>
> NewGuyInNutch
>



-- 
John Martyniak
Before Dawn Solutions, Inc.
9457 S. University Blvd. #266
Highlands Ranch, CO 80126
o: 1-877-499-1562 x707 (Toll Free)
c: 303-522-1756
e: john@beforedawn.com
w: http://www.beforedawn.com

Re: Question about Nutch crawling

Posted by kevin chen <ke...@bdsing.com>.
There can be any number of reasons:
- Crawling is disallowed by the site's robots.txt; this is probably the most common.
- The site's pages are session controlled.
- The site requires authentication.
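
The robots.txt case is easy to check by hand. Below is a minimal sketch using Python's standard urllib.robotparser module. It parses robots.txt text you have already downloaded, so it runs without network access. The agent name "Nutch", the example.com URLs, and the sample rules are assumptions for illustration; use the agent name your crawler actually sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste the real file from the site here.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text instead of fetching it
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/report.html"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "Nutch", url) else "disallowed"
        print(url, verdict)
```

A robots.txt containing "Disallow: /" for your agent (or for "*") means a polite crawler will fetch nothing from that site, which matches the symptom described.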

On Wed, 2008-07-02 at 10:32 -0400, Bozhao Tan wrote:
> Hello, I do not know why Nutch can not crawl anything from some internet
> sites?
> Has anyone met this problem?
> Thanks!
> 
> NewGuyInNutch


Re: Question about Nutch crawling

Posted by Kunthar <ku...@gmail.com>.
Hello,

I don't know what I am living for. Sometimes, I get nothing from someone.

Has anyone met this problem?

Thank you
OldPalinLivingRoom



On Wed, Jul 2, 2008 at 5:32 PM, Bozhao Tan <bo...@gmail.com> wrote:

> Hello, I do not know why Nutch can not crawl anything from some internet
> sites?
> Has anyone met this problem?
> Thanks!
>
> NewGuyInNutch
>