Posted to user@nutch.apache.org by Phạm Hải Thanh <ph...@vasc.com.vn> on 2007/01/10 08:54:01 UTC
fetcher fails with NullPointerException
Hi everybody,
I have been working with Nutch for some days but cannot get it to work. Below is the output I get when crawling with "nutch crawl". I have no idea why the fetcher fails with a NullPointerException. I have searched but found no answer for this failure. Can anyone help me?
Thanks for reading.
I'm using Solaris 10 on SPARC, running on a SF V440. Here is my configuration:
The URL directory (/export/home/nutch/urls) has 2 files:
Netmode: contains: http://netmode.vietnamnet.vn
Localhost: contains: http://localhost:8080
The crawl-urlfilter.txt contains this:
# accept hosts in MY.DOMAIN.NAME
+^http://netmode.vietnamnet.vn
+^http://localhost:8080/nutch
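To see which URLs these two rules would let through, here is a rough sketch in Python. It assumes the filter works on a first-match basis (the first rule whose pattern is found in the URL decides, "+" accepts, and a URL matching no rule is dropped), and it uses Python's re module as a stand-in for the Java regex engine, so treat it as an approximation and not as Nutch's actual code. The names RULES and accepts are made up for this illustration.

```python
import re

# The two rules from crawl-urlfilter.txt above, as (sign, pattern) pairs.
# Note that the dots are unescaped, so they match any character.
RULES = [
    ("+", r"^http://netmode.vietnamnet.vn"),
    ("+", r"^http://localhost:8080/nutch"),
]

def accepts(url, rules=RULES):
    """Return True if the first matching rule is a '+' rule.

    Assumption: a URL that matches no rule at all is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False

if __name__ == "__main__":
    for url in ("http://netmode.vietnamnet.vn/",
                "http://localhost:8080/nutch",
                "http://localhost:8080/"):
        print(url, "accepted" if accepts(url) else "rejected")
```

Under these assumptions a seed must start with one of the two prefixes to survive filtering, so a bare http://localhost:8080 would not pass the second rule.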
When running with this shell script:
crawls=/export/home/nutch/crawls
urldir=/export/home/nutch/urls
rm -r $crawls
nutch crawl $urldir -dir $crawls -depth 1
It shows:
crawl started in: /export/home/nutch/crawls
rootUrlDir = /export/home/nutch/urls
threads = 10
depth = 1
Injector: starting
Injector: crawlDb: /export/home/nutch/crawls/crawldb
Injector: urlDir: /export/home/nutch/urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: /export/home/nutch/crawls/segments/20070110144113
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: /export/home/nutch/crawls/segments/20070110144113
Fetcher: threads: 10
fetching http://localhost:8080/nutch
fetching http://netmode.vietnamnet.vn/
fetch of http://localhost:8080/nutch failed with: java.lang.NullPointerException
fetch of http://netmode.vietnamnet.vn/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: /export/home/nutch/crawls/crawldb
CrawlDb update: segment: /export/home/nutch/crawls/segments/20070110144113
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: /export/home/nutch/crawls/linkdb
LinkDb: adding segment: /export/home/nutch/crawls/segments/20070110144113
LinkDb: done
Indexer: starting
Indexer: linkdb: /export/home/nutch/crawls/linkdb
Indexer: adding segment: /export/home/nutch/crawls/segments/20070110144113
Optimizing index.
Indexer: done
Dedup: starting
Dedup: adding indexes in: /export/home/nutch/crawls/indexes
Dedup: done
Adding /export/home/nutch/crawls/indexes/part-00000
crawl finished: /export/home/nutch/crawls
Re: fetcher fails with NullPointerException
Posted by shrinivas patwardhan <sh...@gmail.com>.
Please check whether you have entered a value for the property http.agent.name in your nutch-default.xml or nutch-site.xml. A missing value might be one of the reasons if you are using Nutch version 0.8.x.
Example:
<!-- HTTP properties -->
<property>
<name>http.agent.name</name>
<value>NutchCVS</value>
<description>Our HTTP 'User-Agent' request header.</description>
</property>
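If you want to check this without reading the whole file by eye, something like the following Python sketch might help. It only parses the XML text and looks up the property; it assumes the file uses the usual <property>/<name>/<value> layout shown above, and the helper names (get_property, agent_name_is_set) and the two sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def get_property(xml_text, name):
    """Return the stripped <value> of the named <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None

def agent_name_is_set(xml_text):
    """True only if http.agent.name exists and has a non-empty value."""
    return bool(get_property(xml_text, "http.agent.name"))

# Hypothetical sample contents, standing in for a real nutch-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>http.agent.name</name>
    <value>NutchCVS</value>
  </property>
</configuration>"""

EMPTY = """<configuration>
  <property>
    <name>http.agent.name</name>
    <value></value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(agent_name_is_set(SAMPLE))
    print(agent_name_is_set(EMPTY))
```

To run it against a real file, read the file contents into a string and pass that to agent_name_is_set. Keep in mind that a value in nutch-site.xml overrides the one in nutch-default.xml, so check both.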
Thanks & Regards
Shrinivas Patwardhan