Posted to user@nutch.apache.org by Phạm Hải Thanh <ph...@vasc.com.vn> on 2007/01/10 08:54:01 UTC

fetcher fails with NullPointerException

Hi everybody, 

I have been working with Nutch for several days but cannot get it to work. Below is the output from crawling with nutch crawl. I have no idea why the fetcher fails with a NullPointerException. I have searched but found no answer for this failure. Can anyone help me?

 

Thanks for reading.

 

I’m using Solaris 10 on SPARC, running on a SF V440. Here are my configs:

 

The url dir (/export/home/nutch/urls) has 2 files:

 

      Netmode: contains: http://netmode.vietnamnet.vn

      Localhost: contains: http://localhost:8080

 

The crawl-urlfilter.txt contains this:

 

# accept hosts in MY.DOMAIN.NAME

+^http://netmode.vietnamnet.vn

+^http://localhost:8080/nutch

 

When running with this shell script:

 

crawls=/export/home/nutch/crawls

urldir=/export/home/nutch/urls

 

rm -r $crawls

nutch crawl $urldir -dir $crawls -depth 1

 

It shows:

 

crawl started in: /export/home/nutch/crawls

rootUrlDir = /export/home/nutch/urls

threads = 10

depth = 1

Injector: starting

Injector: crawlDb: /export/home/nutch/crawls/crawldb

Injector: urlDir: /export/home/nutch/urls

Injector: Converting injected urls to crawl db entries.

Injector: Merging injected urls into crawl db.

Injector: done

Generator: starting

Generator: segment: /export/home/nutch/crawls/segments/20070110144113

Generator: Selecting best-scoring urls due for fetch.

Generator: Partitioning selected urls by host, for politeness.

Generator: done.

Fetcher: starting

Fetcher: segment: /export/home/nutch/crawls/segments/20070110144113

Fetcher: threads: 10

fetching http://localhost:8080/nutch

fetching http://netmode.vietnamnet.vn/

fetch of http://localhost:8080/nutch failed with: java.lang.NullPointerException

fetch of http://netmode.vietnamnet.vn/ failed with: java.lang.NullPointerException

Fetcher: done

CrawlDb update: starting

CrawlDb update: db: /export/home/nutch/crawls/crawldb

CrawlDb update: segment: /export/home/nutch/crawls/segments/20070110144113

CrawlDb update: Merging segment data into db.

CrawlDb update: done

LinkDb: starting

LinkDb: linkdb: /export/home/nutch/crawls/linkdb

LinkDb: adding segment: /export/home/nutch/crawls/segments/20070110144113

LinkDb: done

Indexer: starting

Indexer: linkdb: /export/home/nutch/crawls/linkdb

Indexer: adding segment: /export/home/nutch/crawls/segments/20070110144113

Optimizing index.

Indexer: done

Dedup: starting

Dedup: adding indexes in: /export/home/nutch/crawls/indexes

Dedup: done

Adding /export/home/nutch/crawls/indexes/part-00000

crawl finished: /export/home/nutch/crawls


Re: fetcher fails with NullPointerException

Posted by shrinivas patwardhan <sh...@gmail.com>.
Please check whether you have entered a value for the property http.agent.name. A missing value might be one of the reasons if you are using Nutch version 0.8.x.

Set it in your nutch-default.xml or nutch-site.xml, for example:
<!-- HTTP properties -->
<property>
  <name>http.agent.name</name>
  <value>NutchCVS</value>
  <description>Our HTTP 'User-Agent' request header.</description>
</property>
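
To confirm quickly whether the property really has a value, something like the following shell sketch may help. The function names agent_name_value and agent_name_is_set are hypothetical helpers, not part of Nutch, and the sketch assumes the <name> and <value> elements sit on adjacent lines as in the example above. It is a rough text match, not a real XML parser.

```shell
# Hypothetical helper (not part of Nutch): print the <value> that follows
# <name>http.agent.name</name> in the given config file.
# Assumes <name> and <value> are on adjacent lines, as in the example above.
agent_name_value() {
    sed -n '/<name>http\.agent\.name<\/name>/{
        n
        s/.*<value>\(.*\)<\/value>.*/\1/p
    }' "$1"
}

# Succeed only when the property is present with a non-empty value.
agent_name_is_set() {
    [ -n "$(agent_name_value "$1")" ]
}
```

For example, agent_name_is_set conf/nutch-site.xml || echo "http.agent.name is empty or missing" would flag a file where the value was left blank. The path conf/nutch-site.xml is an assumption here; point it at wherever your configuration actually lives.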

Thanks & Regards
Shrinivas Patwardhan