Posted to dev@nutch.apache.org by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/12 23:16:27 UTC
[jira] Resolved: (NUTCH-428) NullPointerException
[ https://issues.apache.org/jira/browse/NUTCH-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sami Siren resolved NUTCH-428.
------------------------------
Resolution: Fixed
Fix Version/s: 0.9.0
Most probably you don't have an agent name configured in nutch-site.xml. In trunk, I changed this case to throw a RuntimeException instead, so it's easier to diagnose.
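For reference, a minimal sketch of what the missing setting might look like in conf/nutch-site.xml. It assumes the property is http.agent.name, as listed in the stock nutch-default.xml; the value shown is only a placeholder, so substitute a name that identifies your own crawler.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific overrides of nutch-default.xml -->
<configuration>
  <property>
    <!-- Assumed property name; check nutch-default.xml for your version -->
    <name>http.agent.name</name>
    <!-- Placeholder value, not a recommended name -->
    <value>MyTestCrawler</value>
  </property>
</configuration>
```

After saving the file, rerun the same crawl command and check whether the fetch of the seed URL still fails.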
> NullPointerException
> --------------------
>
> Key: NUTCH-428
> URL: https://issues.apache.org/jira/browse/NUTCH-428
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.8.1
> Environment: Windows XP
> Reporter: Piyush
> Fix For: 0.9.0
>
>
> I am using the NUTCH.Bat provided in one of the threads (I am not using Cygwin). Whenever I try to fetch the item, the fetch fails with a NullPointerException.
> I have a URL directory that contains a urls.txt file. The only entry in the file is http://www.winzip.com/land_about.htm
> I have updated the crawl-urlfilter.txt with +^http://www.winzip.com/.
> Are there any other settings I am missing? Any help is greatly appreciated.
> The command I used to start the crawl is:
> nutch crawl urls -dir crawlResults -depth 1
> Here is my log:
> crawl started in: crawlResult
> rootUrlDir = urls
> threads = 10
> depth = 1
> Injector: starting
> Injector: crawlDb: crawlResult/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: starting
> Generator: segment: crawlResult/segments/20070110085314
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: crawlResult/segments/20070110085314
> Fetcher: threads: 10
> fetching http://www.winzip.com/land_about.htm
> fetch of http://www.winzip.com/land_about.htm failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: crawlResult/crawldb
> CrawlDb update: segment: crawlResult/segments/20070110085314
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> LinkDb: starting
> LinkDb: linkdb: crawlResult/linkdb
> LinkDb: adding segment: crawlResult/segments/20070110085314
> LinkDb: done
> Indexer: starting
> Indexer: linkdb: crawlResult/linkdb
> Indexer: adding segment: crawlResult/segments/20070110085314
> Optimizing index.
> Indexer: done
> Dedup: starting
> Dedup: adding indexes in: crawlResult/indexes
> Dedup: done
> Adding crawlResult/indexes/part-00000
> crawl finished: crawlResult
>
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira