Posted to user@nutch.apache.org by "Meraj A. Khan" <me...@gmail.com> on 2014/09/28 21:41:11 UTC

bin/crawl script going out of synch with the Hadoop job.

Hi All,

I am running the bin/crawl script on Apache Nutch 1.7 with Apache Hadoop
YARN 2.3.0. However, I see that the status of the submitted Fetch job is
not updated on the console, and as a result the script does not execute any
of the subsequent steps, i.e. updatedb, generate, etc.

Essentially, it seems that the heartbeat of the submitted Hadoop job is not
reaching the bin/crawl script, and the script terminates immediately after
the first fetch. Does anyone know what this is symptomatic of?

This is what I see in the console; after this, the script loses contact
with the Hadoop job.

14/08/28 08:36:19 INFO mapreduce.Job:  map 54% reduce 0%
14/08/28 08:44:13 INFO mapreduce.Job:  map 55% reduce 0%
14/08/28 08:52:16 INFO mapreduce.Job:  map 56% reduce 0%
14/08/28 08:59:22 INFO mapreduce.Job:  map 57% reduce 0%
14/08/28 09:07:33 INFO mapreduce.Job:  map 58% reduce 0%



Thanks.