Posted to user@nutch.apache.org by Sachin Shaju <sa...@mstack.com> on 2016/10/07 10:44:04 UTC

Unknown issue in Nutch indexer with REST api

Hi,
    I was trying to expose Nutch through its REST endpoints and ran into an
issue in the indexer phase. I'm using the Elasticsearch index writer to index
documents into ES. I started the server with the
$NUTCH_HOME/runtime/deploy/bin/nutch startserver command. During indexing, the
following exception is thrown.

Error:
com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
16/10/07 16:01:47 INFO mapreduce.Job:  map 100% reduce 0%
16/10/07 16:01:49 INFO mapreduce.Job: Task Id :
attempt_1475748314769_0107_r_000000_1, Status : FAILED
Error:
com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
16/10/07 16:01:53 INFO mapreduce.Job: Task Id :
attempt_1475748314769_0107_r_000000_2, Status : FAILED
Error:
com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
16/10/07 16:01:58 INFO mapreduce.Job:  map 100% reduce 100%
16/10/07 16:01:59 INFO mapreduce.Job: Job job_1475748314769_0107 failed
with state FAILED due to: Task failed task_1475748314769_0107_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

ERROR indexer.IndexingJob: Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

Failed with exit code 255.

Any help would be appreciated.

PS: After debugging with the stack trace, I think the issue is due to a
mismatch in Guava versions. I've tried changing the build.xml of the plugins
(parse-tika and parsefilter-naivebayes), but it didn't work.


Regards,
Sachin Shaju

sachin.s@mstack.com

-- 
 

The information contained in this electronic message and any attachments to 
this message are intended for the exclusive use of the addressee(s) and may 
contain proprietary, confidential or privileged information. If you are not 
the intended recipient, you should not disseminate, distribute or copy this 
e-mail. Please notify the sender immediately and destroy all copies of this 
message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient 
should check this email and any attachments for the presence of viruses. 
The company accepts no liability for any damage caused by any virus 
transmitted by this email.

www.mStack.com

Re: Unknown issue in Nutch indexer with REST api

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

In distributed mode, Guava is probably taken from the Hadoop installation, e.g. from
 /usr/lib/hadoop-mapreduce/
(the same applies to other Hadoop libraries and their dependencies).
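
One way to confirm which Guava actually wins at runtime is to ask the JVM where it loaded MoreExecutors from and whether directExecutor() exists there (that method was added in Guava 18.0). The following is only a minimal sketch using plain reflection; the class name GuavaCheck and the helper describe are made up for illustration, and it has to be run with the same classpath as the failing task to be meaningful.

```java
import java.security.CodeSource;
import java.util.Arrays;

// Hypothetical diagnostic helper; not part of Nutch or Hadoop.
public class GuavaCheck {

    /** Reports where a class was loaded from and whether it has a public no-arg method. */
    static String describe(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            // Classes from the JDK itself usually have no code source.
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "bootstrap/unknown"
                    : src.getLocation().toString();
            boolean present = Arrays.stream(cls.getMethods())
                    .anyMatch(m -> m.getName().equals(methodName)
                            && m.getParameterCount() == 0);
            return className + " loaded from " + location + "; "
                    + methodName + "() " + (present ? "present" : "MISSING");
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "com.google.common.util.concurrent.MoreExecutors",
                "directExecutor"));
    }
}
```

If the reported location is a jar under the Hadoop installation and the method is reported as MISSING, that would be consistent with an older Guava shadowing the one shipped with Nutch.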

You may need to change the Hadoop version used to compile Nutch (in ivy/ivy.xml) to that of your
Hadoop distribution. Then run
 ant clean runtime test
Getting everything to compile properly may take a few attempts if other conflicts appear.

But first you should test in local mode, as MrSrivastavaRK recommended. Note that in the recent
master of Nutch there is no mismatch:
 lib/guava-18.0.jar
 plugins/indexer-elastic/guava-18.0.jar
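
To repeat this comparison on your own build (or on the Hadoop installation directory), the jar file names can be scanned for their Guava versions. This is a rough sketch only: the class GuavaJarScan and its methods are hypothetical names, and it assumes the version is encoded in the file name as guava-<version>.jar, as in the two paths above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper; not part of Nutch or Hadoop.
public class GuavaJarScan {

    // Assumes jars are named guava-<version>.jar, e.g. guava-18.0.jar.
    private static final Pattern GUAVA_JAR =
            Pattern.compile("guava-(\\d[\\w.\\-]*)\\.jar");

    /** Returns the version embedded in a Guava jar file name, or null for other files. */
    static String versionOf(String fileName) {
        Matcher m = GUAVA_JAR.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    /** Maps each Guava jar found under root (relative path) to its version. */
    static Map<String, String> findGuavaJars(Path root) throws IOException {
        Map<String, String> found = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                String version = versionOf(p.getFileName().toString());
                if (version != null) {
                    found.put(root.relativize(p).toString(), version);
                }
            });
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan: first argument, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, String> jars = findGuavaJars(root);
        jars.forEach((path, version) -> System.out.println(version + "  " + path));
        long distinct = jars.values().stream().distinct().count();
        System.out.println(distinct > 1
                ? "Different Guava versions found"
                : "No mismatch among the jars found");
    }
}
```

Running it once against the Nutch runtime directory and once against the Hadoop library directory should show whether the two sides agree.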

Please don't forget to report the versions of Nutch and Hadoop you are using. Thanks!

Best,
Sebastian

On 10/09/2016 05:54 PM, MrSrivastavaRK . wrote:
> Just a thought: try running in local mode as well, to test the dependencies.


Re: Unknown issue in Nutch indexer with REST api

Posted by "MrSrivastavaRK ." <sr...@gmail.com>.
Just a thought: try running in local mode as well, to test the dependencies.
