Posted to dev@nutch.apache.org by "Stefan Neufeind (JIRA)" <ji...@apache.org> on 2006/05/24 21:31:29 UTC
[jira] Created: (NUTCH-284) NullPointerException during index
NullPointerException during index
---------------------------------
Key: NUTCH-284
URL: http://issues.apache.org/jira/browse/NUTCH-284
Project: Nutch
Type: Bug
Components: indexer
Versions: 0.8-dev
Reporter: Stefan Neufeind
For quite a while this "reduce > sort" phase has been going on. Then it fails. What could be wrong with this?
060524 212613 reduce > sort
060524 212614 reduce > sort
060524 212615 reduce > sort
060524 212615 found resource common-terms.utf8 at file:/home/mm/nutch-nightly-prod/conf/common-terms.utf8
060524 212615 found resource common-terms.utf8 at file:/home/mm/nutch-nightly-prod/conf/common-terms.utf8
060524 212619 Optimizing index.
060524 212619 job_jlbhhm
java.lang.NullPointerException
at org.apache.nutch.indexer.Indexer$OutputFormat$1.write(Indexer.java:111)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:269)
at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:253)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:282)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:114)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:287)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:304)
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
[jira] Commented: (NUTCH-284) NullPointerException during index
Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-284?page=comments#action_12414453 ]
Stefan Groschupf commented on NUTCH-284:
----------------------------------------
Please try to discuss such things on the user mailing list first, before opening an issue.
Maintaining the issue tracker is very time-consuming. But if there is a bug, please continue to open bug reports. :)
Thanks.
[jira] Commented: (NUTCH-284) NullPointerException during index
Posted by "Marko Bauhardt (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-284?page=comments#action_12413227 ]
Marko Bauhardt commented on NUTCH-284:
--------------------------------------
I think the index-basic plugin is not included, because
Line 111: .... doc.getField("url").stringValue() ....
The BasicIndexingFilter indexes the field "url".
Check your log file or the nutch-default.xml (or nutch-site.xml).
Marko
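The failure mode described above can be illustrated with a minimal sketch. It is not the Nutch source: the document is modelled as a plain Map instead of a Lucene Document, and the class and method names (UrlFieldCheck, unguardedUrlLength, requireUrl) are hypothetical. It only shows why reading a "url" field that no indexing filter has added ends in a NullPointerException, and how a guard could report the likely cause instead.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the indexer's write step. The document is modelled
// as a plain Map here (an assumption); the real code works on a Lucene Document.
class UrlFieldCheck {

    // Mirrors the pattern quoted from line 111: look up "url" and use it directly.
    // With no "url" entry, get() returns null and length() throws NullPointerException.
    static int unguardedUrlLength(Map<String, String> doc) {
        return doc.get("url").length();
    }

    // Guarded variant: fail with a message that names the likely cause.
    static String requireUrl(Map<String, String> doc) {
        String url = doc.get("url");
        if (url == null) {
            throw new IllegalStateException(
                "document has no \"url\" field; is index-basic enabled in plugin.includes?");
        }
        return url;
    }

    public static void main(String[] args) {
        // A document that only carries fields from some other indexing filter.
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "no url field was added");

        try {
            unguardedUrlLength(doc);
        } catch (NullPointerException e) {
            System.out.println("unguarded access: NullPointerException");
        }

        try {
            requireUrl(doc);
        } catch (IllegalStateException e) {
            System.out.println("guarded access: " + e.getMessage());
        }
    }
}
```

The guard does not fix the configuration; it only turns an anonymous NullPointerException into a message that points at the missing plugin.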
[jira] Commented: (NUTCH-284) NullPointerException during index
Posted by "Stefan Neufeind (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-284?page=comments#action_12413240 ]
Stefan Neufeind commented on NUTCH-284:
---------------------------------------
Yes, I was missing index-basic. My apologies. I needed the extra fields of index-more and thought it would index the basic fields as well.
The same thing occurred in NUTCH-51.
Would it be possible to require that index-basic is loaded (similar to "well, you need a scoring plugin" etc.)? If somebody writes his own index-basic2 plugin, he would have to be able to put a "provides index-basic" declaration into his plugin to signal that it indexes the basic fields. Something like this could save people like me some trouble and searching :-)
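As a rough illustration of the "require index-basic" idea, here is a sketch of a fail-fast check run before indexing starts. Everything in it is an assumption made for illustration: the class RequiredPluginCheck, its methods, and the two example plugin.includes values are hypothetical, and it assumes a plugin id counts as included when it matches the whole regular expression. The "provides index-basic" declaration for replacement plugins is not modelled.

```java
import java.util.regex.Pattern;

// Hypothetical startup check: refuse to index when a required plugin id
// is not selected by the plugin.includes regular expression.
class RequiredPluginCheck {

    // True if the plugin id is matched by the plugin.includes expression.
    // Assumption: the id has to match the entire expression.
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    // Throws early, with a readable message, instead of failing later in the reduce step.
    static void requirePlugin(String pluginIncludes, String pluginId) {
        if (!isIncluded(pluginIncludes, pluginId)) {
            throw new IllegalStateException(
                "required plugin '" + pluginId + "' is not matched by plugin.includes: "
                + pluginIncludes);
        }
    }

    public static void main(String[] args) {
        // Illustrative values only, not copied from any real configuration.
        String withBasic = "protocol-http|parse-(text|html)|index-(basic|more)";
        String withoutBasic = "protocol-http|parse-(text|html)|index-more";

        requirePlugin(withBasic, "index-basic"); // passes silently

        try {
            requirePlugin(withoutBasic, "index-basic");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this would surface the misconfiguration at job start rather than after a long "reduce > sort" phase.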
[jira] Commented: (NUTCH-284) NullPointerException during index
Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-284?page=comments#action_12413231 ]
Gal Nitzan commented on NUTCH-284:
----------------------------------
I just had something similar.
Try the following:
run ant on each of your tasktracker machines:
% ant
then restart your Nutch and try again.
I think there is a problem with the classpath.
[jira] Closed: (NUTCH-284) NullPointerException during index
Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-284?page=all ]
Stefan Groschupf closed NUTCH-284:
----------------------------------
Resolution: Won't Fix
>Yes, I was missing index-basic.