Posted to dev@nutch.apache.org by "Mengying Wang (JIRA)" <ji...@apache.org> on 2014/10/31 05:57:33 UTC

[jira] [Created] (NUTCH-1884) Java.lang.NullPointerException when using the parsechecker and indexchecker

Mengying Wang created NUTCH-1884:
------------------------------------

             Summary: Java.lang.NullPointerException when using the parsechecker and indexchecker
                 Key: NUTCH-1884
                 URL: https://issues.apache.org/jira/browse/NUTCH-1884
             Project: Nutch
          Issue Type: Bug
          Components: indexer, parser
    Affects Versions: 1.9
         Environment: Mac OS X 10.9.2
                      Apache Maven 2.2.1
                      Java version: 1.7.0_51
            Reporter: Mengying Wang
            Priority: Minor


I downloaded the Nutch source code from GitHub (https://github.com/apache/nutch), applied the patches from NUTCH-1879 and NUTCH-1880, and reinstalled Nutch. The good news is that all URLs now contain only one slash. Unfortunately, a java.lang.NullPointerException still occurs with both the parsechecker and indexchecker commands: parsechecker reports it in a "Couldn't pass score" message and continues, while indexchecker aborts with a stack trace.

Below is the running log:

(1) $ ./nutch parsechecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
contentType: text/html
signature: 17bdb44990391c96bb8d48d1802ff11c
Couldn't pass score, url file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/ (java.lang.NullPointerException)
---------
Url
---------------

file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
---------
ParseData
---------

Version: 5
Status: success(1,0)
Title: Index of /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
Outlinks: 2
  outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/ anchor: ../
  outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml anchor: monitor.xml
Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue, 14 Oct 2014 20:05:50 GMT Content-Type=text/html 
Parse Metadata: CharEncodingForConversion=windows-1252 OriginalCharEncoding=windows-1252 

(2) $ ./nutch indexchecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
contentType: text/html
Exception in thread "main" java.lang.NullPointerException
	at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)
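
The stack trace alone does not show which reference is null at IndexingFiltersChecker.java:139. As a rough sketch of the kind of guard that would turn such a failure into a readable message, assuming (unverified) that some lookup returns null for this URL, something like the following could be used. The class and method names here are hypothetical and are not Nutch APIs:

```java
import java.util.HashMap;
import java.util.Map;

public class NullGuardSketch {
    // Hypothetical stand-in for a lookup that may return null (not a Nutch API).
    public static String lookup(Map<String, String> results, String url) {
        return results.get(url);
    }

    // Returns a readable message instead of dereferencing a null result.
    public static String describe(Map<String, String> results, String url) {
        String text = lookup(results, url);
        if (text == null) {
            return "No result for " + url;
        }
        return "Result length: " + text.length();
    }

    public static void main(String[] args) {
        Map<String, String> results = new HashMap<>();
        results.put("file:/tmp/a.xml", "hello");
        // One URL with a result, one without; neither call throws.
        System.out.println(describe(results, "file:/tmp/a.xml"));
        System.out.println(describe(results, "file:/tmp/missing/"));
    }
}
```

This only illustrates the pattern; the actual fix depends on which object is null at that line.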



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)