Posted to user@nutch.apache.org by Binoy Dalal <bi...@gmail.com> on 2016/02/13 17:01:37 UTC

runtime exception during nutch generate

Hello everyone,
I'm trying to run Nutch for the first time, and while executing
*/bin/nutch generate -topN 5*
I get the following exception:
GeneratorJob: starting at 2016-02-13 21:01:42
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 5
GeneratorJob: java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local1061440919_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:256)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:322)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:330)

Here is the stacktrace from *hadoop.log*:
2016-02-13 21:01:44,541 ERROR mapreduce.GoraRecordReader - Error reading Gora records: null
2016-02-13 21:01:44,557 WARN  mapred.LocalJobRunner - job_local1061440919_0001
java.lang.Exception: java.lang.RuntimeException: java.util.NoSuchElementException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.util.NoSuchElementException
        at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException
        at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
        at org.apache.gora.memory.store.MemStore.execute(MemStore.java:128)
        at org.apache.gora.query.impl.QueryBase.execute(QueryBase.java:73)
        at org.apache.gora.mapreduce.GoraRecordReader.executeQuery(GoraRecordReader.java:67)
        at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:109)
        ... 12 more

I've been following this tutorial to set up Nutch:
https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup
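
For reference, the steps I ran before generate were roughly these (I'm
paraphrasing the tutorial from memory, so the exact commands and the name
of my seed directory may differ):

    bin/nutch inject urls          # load the seed URLs into the datastore
    bin/nutch generate -topN 5     # select the best-scoring URLs due for fetch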

I've seen a few posts on Stack Overflow and in the Nutch mailing list
archives with similar exceptions. They suggest that I might be running out
of disk space in my /tmp directory, but /tmp only holds about 8 MB of data.
Other than that, I'm clueless about what is causing this exception.
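
For reference, this is roughly how I checked /tmp (from memory, so the
exact invocations may have differed); du measures the data already sitting
in /tmp, while df shows the free space left on the filesystem that holds it:

    du -sh /tmp    # total size of what is currently in /tmp (about 8 MB)
    df -h /tmp     # free space on the filesystem backing /tmp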

What could be the cause of this exception?

I'm using Nutch 2.3.1 with HBase 1.1.3 as the datastore, running on
Ubuntu 15.10.
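
For reference, my storage configuration is meant to look roughly like the
standard Nutch 2.x HBase setup below (this is a sketch from memory; my
actual files may differ):

    # conf/gora.properties -- default Gora datastore
    gora.datastore.default=org.apache.gora.hbase.store.HBaseStore

    <!-- conf/nutch-site.xml -- storage backend used by the Nutch jobs -->
    <property>
      <name>storage.data.store.class</name>
      <value>org.apache.gora.hbase.store.HBaseStore</value>
    </property>

One thing I notice is that the stack trace above mentions
org.apache.gora.memory.store.MemStore rather than the HBase store; I'm not
sure whether that is expected.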

Thanks
-- 
Regards,
Binoy Dalal