Posted to user@nutch.apache.org by Yossi Tamari <yo...@pipl.com> on 2017/05/16 11:48:58 UTC
IllegalStateException in CleaningJob on ElasticSearch 2.3.3
Hi,
When running 'crawl -i', I get the following exception in the second
iteration, during the CleaningJob:
Cleaning up index if possible
/data/apache-nutch-1.13/runtime/deploy/bin/nutch clean crawl-inbar/crawldb
17/05/16 05:40:32 INFO indexer.CleaningJob: CleaningJob: starting at 2017-05-16 05:40:32
17/05/16 05:40:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/05/16 05:40:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/05/16 05:40:34 INFO mapred.FileInputFormat: Total input paths to process : 1
17/05/16 05:40:34 INFO mapreduce.JobSubmitter: number of splits:2
17/05/16 05:40:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1493910246747_0030
17/05/16 05:40:34 INFO impl.YarnClientImpl: Submitted application application_1493910246747_0030
17/05/16 05:40:34 INFO mapreduce.Job: The url to track the job: http://crawler001.pipl.com:8088/proxy/application_1493910246747_0030/
17/05/16 05:40:34 INFO mapreduce.Job: Running job: job_1493910246747_0030
17/05/16 05:40:43 INFO mapreduce.Job: Job job_1493910246747_0030 running in uber mode : false
17/05/16 05:40:43 INFO mapreduce.Job: map 0% reduce 0%
17/05/16 05:40:48 INFO mapreduce.Job: map 50% reduce 0%
17/05/16 05:40:52 INFO mapreduce.Job: map 100% reduce 0%
17/05/16 05:40:53 INFO mapreduce.Job: Task Id : attempt_1493910246747_0030_r_000000_0, Status : FAILED
Error: java.lang.IllegalStateException: bulk process already closed
    at org.elasticsearch.action.bulk.BulkProcessor.ensureOpen(BulkProcessor.java:278)
    at org.elasticsearch.action.bulk.BulkProcessor.flush(BulkProcessor.java:329)
    at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.commit(ElasticIndexWriter.java:200)
    at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:127)
    at org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:125)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
This happens in all of the reduce tasks for this job. In the first iteration, the CleaningJob finished successfully.
Any ideas what might be causing this?
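If it helps, here is how I read the trace. It looks as if the BulkProcessor was already closed by the time commit() tried to flush it, so flush() trips the ensureOpen() check. The following is only a minimal self-contained sketch of that call ordering (FakeBulkProcessor and CloseOrderDemo are hypothetical classes I made up for illustration, not actual Nutch or Elasticsearch code):

```java
// Hypothetical simplification of the shutdown path in the trace:
// a processor that rejects work after close(), mirroring what
// BulkProcessor.ensureOpen() appears to do.
class FakeBulkProcessor {
    private boolean closed = false;

    void flush() {
        if (closed) {
            throw new IllegalStateException("bulk process already closed");
        }
        // ... would send any buffered delete requests here ...
    }

    void close() {
        closed = true;
    }
}

public class CloseOrderDemo {
    public static void main(String[] args) {
        FakeBulkProcessor bulk = new FakeBulkProcessor();
        bulk.close();     // processor closed first
        try {
            bulk.flush(); // a later commit/flush then fails
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
            // prints: caught: bulk process already closed
        }
    }
}
```

So my guess is that something in the reducer teardown (IOUtils.cleanup calling DeleterReducer.close) ends up committing after the writer was already closed, but I don't know the code well enough to say whether that ordering is the actual bug.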
Thanks,
Yossi.