Posted to user@nutch.apache.org by martin <ma...@gmail.com> on 2006/09/23 10:20:46 UTC

Why i can't get any data?

I followed part 1 of the tutorial, Intranet Crawling
(http://lucene.apache.org/nutch/tutorial8.html), step by step, and finally I
entered this command:

bin/nutch crawl urls -dir crawl -depth 3 -topN 50

but I can't get any data. I saw some errors in logs/hadoop.log,
and I don't know why I got them:

2006-09-23 15:22:05,010 INFO  net.UrlNormalizerFactory - Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
2006-09-23 15:22:05,011 INFO  fetcher.Fetcher - fetching http://lucene.apache.org/nutch/
2006-09-23 15:22:05,050 FATAL api.HttpBase - No User-Agent string set (http.agent.name)!
2006-09-23 15:22:05,050 FATAL api.RobotRulesParser - Agent we advertise (null) not listed first in 'http.robots.agents' property!
2006-09-23 15:22:05,050 INFO  fetcher.Fetcher - fetch of http://lucene.apache.org/nutch/ failed with: java.lang.NullPointerException

Re: Why i can't get any data?

Posted by martin <ma...@gmail.com>.
Thanks for your help.

I saw that nutch-default.xml lists http.agent.name as a required
property, so I set it in nutch-site.xml, and it works!
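
For anyone who hits the same error, the override in conf/nutch-site.xml
looks roughly like this. It is only a minimal sketch: the value
MyTestCrawler is a made-up placeholder, so pick your own crawler name, and
keep any other properties already present in your file.

```xml
<configuration>
  <!-- Overrides the empty default from nutch-default.xml.
       "MyTestCrawler" is a hypothetical placeholder, not a required value. -->
  <property>
    <name>http.agent.name</name>
    <value>MyTestCrawler</value>
  </property>
</configuration>
```

After saving the file, rerunning the same bin/nutch crawl command should no
longer log the "No User-Agent string set" error.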

Thanks again:)

On 9/23/06, Frank Kempf <fl...@2112portals.com> wrote:
>
> Have a look at the nutch-site.xml and nutch-default.xml files
> in the conf directory and set the properties.
> You will find some information in the files themselves.
>
>   Kind Regards
>
>     Frank
>

Re: Why i can't get any data?

Posted by Frank Kempf <fl...@2112portals.com>.
Have a look at the nutch-site.xml and nutch-default.xml files
in the conf directory and set the properties.
You will find some information in the files themselves.

  Kind Regards

    Frank