Posted to user@nutch.apache.org by "Meraj A. Khan" <me...@gmail.com> on 2014/09/28 21:41:11 UTC
bin/crawl script going out of synch with the Hadoop job.
Hi All,
I am running the bin/crawl script on Apache Nutch 1.7 and Apache Hadoop
YARN 2.3.0. However, the status of the submitted Fetch job is not updated
on the console, and as a result the script does not execute any of the
subsequent steps, i.e. updatedb, generate, etc.
Essentially, it seems that the heartbeat of the submitted Hadoop job is not
reaching the bin/crawl script, and the script is terminating immediately
after the first fetch. Does anyone know what this is symptomatic of?
This is what I see in the console; after this, the script loses contact
with the Hadoop job.
14/08/28 08:36:19 INFO mapreduce.Job: map 54% reduce 0%
14/08/28 08:44:13 INFO mapreduce.Job: map 55% reduce 0%
14/08/28 08:52:16 INFO mapreduce.Job: map 56% reduce 0%
14/08/28 08:59:22 INFO mapreduce.Job: map 57% reduce 0%
14/08/28 09:07:33 INFO mapreduce.Job: map 58% reduce 0%
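To narrow down where the script stops, one option might be to wrap each step so that its exit status is printed. The following is only a minimal sketch: `run_step` is a hypothetical helper, not part of bin/crawl, and the segment variable in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of Nutch's bin/crawl): run one crawl step,
# print when it starts and what exit status it returned, then pass that
# status back to the caller.
run_step() {
  local name="$1"
  shift
  echo "$(date '+%F %T') starting step: $name"
  "$@"
  local status=$?
  echo "$(date '+%F %T') step '$name' exited with status $status"
  return "$status"
}

# Example usage (SEGMENT is a placeholder for the segment path):
# run_step fetch bin/nutch fetch "$SEGMENT" || exit 1
```

If the last line printed is a "starting step" line with no matching "exited" line, the client process itself was probably killed or disconnected; a non-zero status would instead suggest that the job submission returned a failure.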
Thanks.