Posted to user@nutch.apache.org by Yossi Tamari <yo...@pipl.com> on 2017/05/16 11:48:58 UTC
IllegalStateException in CleaningJob on ElasticSearch 2.3.3
Hi,
When running 'crawl -i', I get the following exception in the second
iteration, during the CleaningJob:
Cleaning up index if possible
/data/apache-nutch-1.13/runtime/deploy/bin/nutch clean crawl-inbar/crawldb
17/05/16 05:40:32 INFO indexer.CleaningJob: CleaningJob: starting at 2017-05-16 05:40:32
17/05/16 05:40:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/05/16 05:40:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/05/16 05:40:34 INFO mapred.FileInputFormat: Total input paths to process : 1
17/05/16 05:40:34 INFO mapreduce.JobSubmitter: number of splits:2
17/05/16 05:40:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1493910246747_0030
17/05/16 05:40:34 INFO impl.YarnClientImpl: Submitted application application_1493910246747_0030
17/05/16 05:40:34 INFO mapreduce.Job: The url to track the job: http://crawler001.pipl.com:8088/proxy/application_1493910246747_0030/
17/05/16 05:40:34 INFO mapreduce.Job: Running job: job_1493910246747_0030
17/05/16 05:40:43 INFO mapreduce.Job: Job job_1493910246747_0030 running in uber mode : false
17/05/16 05:40:43 INFO mapreduce.Job: map 0% reduce 0%
17/05/16 05:40:48 INFO mapreduce.Job: map 50% reduce 0%
17/05/16 05:40:52 INFO mapreduce.Job: map 100% reduce 0%
17/05/16 05:40:53 INFO mapreduce.Job: Task Id : attempt_1493910246747_0030_r_000000_0, Status : FAILED
Error: java.lang.IllegalStateException: bulk process already closed
    at org.elasticsearch.action.bulk.BulkProcessor.ensureOpen(BulkProcessor.java:278)
    at org.elasticsearch.action.bulk.BulkProcessor.flush(BulkProcessor.java:329)
    at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.commit(ElasticIndexWriter.java:200)
    at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:127)
    at org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:125)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
This happens in all of the reduce tasks for this job. In the first iteration, the CleaningJob finished successfully.
Any ideas what might be causing this?
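If it helps, here is how I read the trace. It looks as if the BulkProcessor was already closed by the time commit() tried to flush it, so flush() trips the ensureOpen() check. The following is only a minimal self-contained sketch of that call ordering (FakeBulkProcessor and CloseOrderDemo are hypothetical classes I made up for illustration, not actual Nutch or Elasticsearch code):

```java
// Hypothetical simplification of the shutdown path in the trace:
// a processor that rejects work after close(), mirroring what
// BulkProcessor.ensureOpen() appears to do.
class FakeBulkProcessor {
    private boolean closed = false;

    void flush() {
        if (closed) {
            throw new IllegalStateException("bulk process already closed");
        }
        // ... would send any buffered delete requests here ...
    }

    void close() {
        closed = true;
    }
}

public class CloseOrderDemo {
    public static void main(String[] args) {
        FakeBulkProcessor bulk = new FakeBulkProcessor();
        bulk.close();     // processor closed first
        try {
            bulk.flush(); // a later commit/flush then fails
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
            // prints: caught: bulk process already closed
        }
    }
}
```

So my guess is that something in the reducer teardown (IOUtils.cleanup calling DeleterReducer.close) ends up committing after the writer was already closed, but I don't know the code well enough to say whether that ordering is the actual bug.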
Thanks,
Yossi.