Posted to user@nutch.apache.org by Abhishek Ramachandran <ab...@mstack.com> on 2017/11/16 09:39:49 UTC

Nutch indexing fails with java.lang.NoSuchFieldError: INSTANCE

Hello,

I'm using Nutch 1.13 to crawl data and store it in Elasticsearch. I have
also created some custom parse-filter and index-filter plugins. Everything
was working fine.

I then updated Elasticsearch to version 5, and the indexer-elastic plugin
stopped working due to a version mismatch. I also read in some
documentation that Elasticsearch 5 is only supported in Nutch 2.x.

But I stuck with this Nutch version and found a plugin that indexes to
Elasticsearch over REST, here
<https://github.com/apache/nutch/tree/master/src/plugin/indexer-elastic-rest>.
I made changes in Nutch to include this plugin.
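
For context, enabling an extra indexer plugin in Nutch is normally done by
adding it to the plugin.includes property in conf/nutch-site.xml. This is
only an illustration of that change, not my exact configuration (the other
plugin names in the value are placeholders for whatever your setup uses):

```xml
<property>
  <name>plugin.includes</name>
  <!-- indexer-elastic-rest added to the existing list of plugins -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic-rest|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```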

Crawling and indexing worked in Nutch's local mode. But when I tried the
same in deploy mode, I got the following exception during the indexing
phase:

17/11/16 10:53:37 INFO mapreduce.Job: Running job: job_1510809462003_0010
17/11/16 10:53:44 INFO mapreduce.Job: Job job_1510809462003_0010 running in uber mode : false
17/11/16 10:53:44 INFO mapreduce.Job:  map 0% reduce 0%
17/11/16 10:53:48 INFO mapreduce.Job:  map 20% reduce 0%
17/11/16 10:53:52 INFO mapreduce.Job:  map 40% reduce 0%
17/11/16 10:53:56 INFO mapreduce.Job:  map 60% reduce 0%
17/11/16 10:53:59 INFO mapreduce.Job:  map 80% reduce 20%
17/11/16 10:54:02 INFO mapreduce.Job:  map 100% reduce 100%
17/11/16 10:54:02 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_0, Status : FAILED
Error: INSTANCE
17/11/16 10:54:03 INFO mapreduce.Job:  map 100% reduce 0%
17/11/16 10:54:06 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_1, Status : FAILED
Error: INSTANCE
17/11/16 10:54:10 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_2, Status : FAILED
Error: INSTANCE
17/11/16 10:54:15 INFO mapreduce.Job:  map 100% reduce 100%
17/11/16 10:54:15 INFO mapreduce.Job: Job job_1510809462003_0010 failed with state FAILED due to: Task failed task_1510809462003_0010_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1
17/11/16 10:54:15 INFO mapreduce.Job: Counters: 38
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=804602
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=44204
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=20
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters
		Failed reduce tasks=4
		Killed map tasks=1
		Launched map tasks=5
		Launched reduce tasks=4
		Data-local map tasks=5
		Total time spent by all maps in occupied slots (ms)=39484
		Total time spent by all reduces in occupied slots (ms)=16866
		Total time spent by all map tasks (ms)=9871
		Total time spent by all reduce tasks (ms)=16866
		Total vcore-milliseconds taken by all map tasks=9871
		Total vcore-milliseconds taken by all reduce tasks=16866
		Total megabyte-milliseconds taken by all map tasks=40431616
		Total megabyte-milliseconds taken by all reduce tasks=17270784
	Map-Reduce Framework
		Map input records=436
		Map output records=436
		Map output bytes=55396
		Map output materialized bytes=56302
		Input split bytes=698
		Combine input records=0
		Spilled Records=436
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=246
		CPU time spent (ms)=3840
		Physical memory (bytes) snapshot=1559916544
		Virtual memory (bytes) snapshot=25255698432
		Total committed heap usage (bytes)=1503657984
	File Input Format Counters
		Bytes Read=43506
17/11/16 10:54:15 ERROR impl.JobWorker: Cannot run job worker!
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:94)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:87)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:352)
at org.apache.nutch.service.impl.JobWorker.run(JobWorker.java:71)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

The Hadoop task log is:

2017-11-16 10:54:13,731 INFO [main] org.apache.nutch.indexer.IndexWriters: Adding org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter
2017-11-16 10:54:13,801 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
    at org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter.open(ElasticRestIndexWriter.java:133)
    at org.apache.nutch.indexer.IndexWriters.open(IndexWriters.java:75)
    at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:39)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:484)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

From searching around, I gather that this is caused by a version conflict
in the Apache HttpClient jars on the classpath. The Hadoop version I'm
using is 2.7.2; I tried the same with Hadoop 2.8.2 and got the same
result.
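
To see which jar actually supplies the conflicting class at runtime, I've
been using a small diagnostic along these lines (WhichJar is my own helper,
not part of Nutch or Hadoop; on the cluster I pass it
org.apache.http.conn.ssl.SSLConnectionSocketFactory, the class from the
stack trace):

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar (or directory) a class was loaded from, which helps
    // spot classpath conflicts such as two httpclient versions.
    public static String locationOf(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap/JDK class path report no code source.
        URL loc = (src == null) ? null : src.getLocation();
        return (loc == null) ? "(no code source: JDK class)" : loc.toString();
    }

    public static void main(String[] args) throws Exception {
        // e.g. java WhichJar org.apache.http.conn.ssl.SSLConnectionSocketFactory
        String name = (args.length > 0) ? args[0] : "java.util.ArrayList";
        System.out.println(name + " loaded from " + locationOf(name));
    }
}
```

If this reports an older httpclient jar from the Hadoop installation rather
than the one bundled in the Nutch job file, some threads also suggest
setting mapreduce.job.user.classpath.first=true so the job's own jars win,
though I haven't confirmed that fixes this case.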

Looking for solutions.

-- 
Regards,
*Abhishek Ramachandran*
*abhishek.r@mstack.com <ab...@mstack.com>*
* <http://www.mstack.com/>*
