Posted to common-user@hadoop.apache.org by Jinchun Kim <ci...@gmail.com> on 2013/03/22 13:02:48 UTC

MapReduce Failed and Killed

Hi, All.

I'm trying to create category-based splits of the Wikipedia dataset (41GB)
and the training data set (5GB) using Mahout.
I'm using the following command.

$MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks \
  -o wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt

I had no problem with the training data set, but Hadoop showed the following
messages when I tried to run the same job on the Wikipedia dataset:

.........
13/03/21 22:31:00 INFO mapred.JobClient:  map 27% reduce 1%
13/03/21 22:40:31 INFO mapred.JobClient:  map 27% reduce 2%
13/03/21 22:58:49 INFO mapred.JobClient:  map 27% reduce 3%
13/03/21 23:22:57 INFO mapred.JobClient:  map 27% reduce 4%
13/03/21 23:46:32 INFO mapred.JobClient:  map 27% reduce 5%
13/03/22 00:27:14 INFO mapred.JobClient:  map 27% reduce 6%
13/03/22 01:06:55 INFO mapred.JobClient:  map 27% reduce 7%
13/03/22 01:14:06 INFO mapred.JobClient:  map 27% reduce 3%
13/03/22 01:15:35 INFO mapred.JobClient: Task Id :
attempt_201303211339_0002_r_000000_1, Status : FAILED
Task attempt_201303211339_0002_r_000000_1 failed to report status for 1200
seconds. Killing!
13/03/22 01:20:09 INFO mapred.JobClient:  map 27% reduce 4%
13/03/22 01:33:35 INFO mapred.JobClient: Task Id :
attempt_201303211339_0002_m_000037_1, Status : FAILED
Task attempt_201303211339_0002_m_000037_1 failed to report status for 1228
seconds. Killing!
13/03/22 01:35:12 INFO mapred.JobClient:  map 27% reduce 5%
13/03/22 01:40:38 INFO mapred.JobClient:  map 27% reduce 6%
13/03/22 01:52:28 INFO mapred.JobClient:  map 27% reduce 7%
13/03/22 02:16:27 INFO mapred.JobClient:  map 27% reduce 8%
13/03/22 02:19:02 INFO mapred.JobClient: Task Id :
attempt_201303211339_0002_m_000018_1, Status : FAILED
Task attempt_201303211339_0002_m_000018_1 failed to report status for 1204
seconds. Killing!
13/03/22 02:49:03 INFO mapred.JobClient:  map 27% reduce 9%
13/03/22 02:52:04 INFO mapred.JobClient:  map 28% reduce 9%
........

Because I have only just started learning how to run Hadoop, I have no idea
how to solve this problem.
Does anyone have an idea how to handle this?

-- 
*Jinchun Kim*

Re: MapReduce Failed and Killed

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Every MapReduce task needs to report periodically to the tasktracker that
launched it, so that the tasktracker knows the task is still alive and
active. How long a task may stay silent before it is killed is controlled
by the configuration property mapred.task.timeout.
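
To make the timeout rule concrete, here is a toy model in Python. It is only a sketch of the idea described above, not Hadoop's actual code: the function name kill_time and its parameters are made up for illustration, and it works in seconds, whereas mapred.task.timeout itself is given in milliseconds.

```python
def kill_time(report_times, timeout_s, end_s):
    """Return the earliest time (seconds) at which a task would be killed
    for staying silent longer than timeout_s, or None if it never is.

    report_times: sorted times at which the task reported status.
    timeout_s:    tolerated silence (plays the role of mapred.task.timeout).
    end_s:        time at which we stop observing the task.

    Hypothetical helper for illustration only; the real check is done
    periodically by the tasktracker, so actual kills happen a bit later.
    """
    last_report = 0.0
    for t in list(report_times) + [end_s]:
        if t - last_report > timeout_s:
            # Silence exceeded the limit before the next report arrived.
            return last_report + timeout_s
        last_report = t
    return None


# A task that reports once at 300 s and then goes quiet, with a 20-minute limit.
print(kill_time([300], timeout_s=1200, end_s=2000))
```

The point of the model is that raising the timeout only moves the kill time later; a task that never reports again is still killed eventually.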

It looks like in your case this has already been bumped up to 20 minutes
(from the default of 10 minutes), since the log reports about 1200 seconds
of silence before each kill. It also looks like this is not sufficient.
You could raise the value even further. However, the better approach would
be to find out what the reducer is actually doing while it is silent. Can
you look at the reducer attempt's logs (which you can access from the web
UI of the Jobtracker) and post them here?
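
As a rough aid for reading the console output, the following Python sketch pulls the killed attempts out of the log text and converts the reported silence to minutes. The three "failed to report status" messages are copied from the log quoted in this thread; the names LOG, PATTERN and timed_out_attempts are hypothetical, and the regular expression assumes the message format shown here.

```python
import re

# The three timeout messages from the quoted JobClient output
# (each message is wrapped over two lines, as in the mail).
LOG = """\
Task attempt_201303211339_0002_r_000000_1 failed to report status for 1200
seconds. Killing!
Task attempt_201303211339_0002_m_000037_1 failed to report status for 1228
seconds. Killing!
Task attempt_201303211339_0002_m_000018_1 failed to report status for 1204
seconds. Killing!
"""

# Attempt ids carry "_m_" for map tasks and "_r_" for reduce tasks.
PATTERN = re.compile(
    r"Task (attempt_\d+_\d+_([mr])_\d+_\d+) "
    r"failed to report status for (\d+)\s+seconds"
)


def timed_out_attempts(log_text):
    """Return (attempt_id, task_kind, silent_seconds) for each killed attempt."""
    kinds = {"m": "map", "r": "reduce"}
    return [
        (m.group(1), kinds[m.group(2)], int(m.group(3)))
        for m in PATTERN.finditer(log_text)
    ]


for attempt, kind, seconds in timed_out_attempts(LOG):
    print(f"{attempt}: {kind} task, silent for {seconds} s "
          f"(about {seconds / 60:.1f} min)")
```

Note that two of the three killed attempts in the quoted log are map attempts (`_m_`), so the logs of those map attempts may be worth checking alongside the reducer's.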

Thanks
hemanth


On Fri, Mar 22, 2013 at 5:32 PM, Jinchun Kim <ci...@gmail.com> wrote:

> Hi, All.
>
> I'm trying to create category-based splits of Wikipedia dataset(41GB) and
> the training data set(5GB) using Mahout.
> I'm using following command.
>
> $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o
> wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt
>
> I had no problem with the training data set, but Hadoop showed following
> messages
> when I tried to do a same job with Wikipedia dataset,
>
> .........
> 13/03/21 22:31:00 INFO mapred.JobClient:  map 27% reduce 1%
> 13/03/21 22:40:31 INFO mapred.JobClient:  map 27% reduce 2%
> 13/03/21 22:58:49 INFO mapred.JobClient:  map 27% reduce 3%
> 13/03/21 23:22:57 INFO mapred.JobClient:  map 27% reduce 4%
> 13/03/21 23:46:32 INFO mapred.JobClient:  map 27% reduce 5%
> 13/03/22 00:27:14 INFO mapred.JobClient:  map 27% reduce 6%
> 13/03/22 01:06:55 INFO mapred.JobClient:  map 27% reduce 7%
> 13/03/22 01:14:06 INFO mapred.JobClient:  map 27% reduce 3%
> 13/03/22 01:15:35 INFO mapred.JobClient: Task Id :
> attempt_201303211339_0002_r_000000_1, Status : FAILED
> Task attempt_201303211339_0002_r_000000_1 failed to report status for 1200
> seconds. Killing!
> 13/03/22 01:20:09 INFO mapred.JobClient:  map 27% reduce 4%
> 13/03/22 01:33:35 INFO mapred.JobClient: Task Id :
> attempt_201303211339_0002_m_000037_1, Status : FAILED
> Task attempt_201303211339_0002_m_000037_1 failed to report status for 1228
> seconds. Killing!
> 13/03/22 01:35:12 INFO mapred.JobClient:  map 27% reduce 5%
> 13/03/22 01:40:38 INFO mapred.JobClient:  map 27% reduce 6%
> 13/03/22 01:52:28 INFO mapred.JobClient:  map 27% reduce 7%
> 13/03/22 02:16:27 INFO mapred.JobClient:  map 27% reduce 8%
> 13/03/22 02:19:02 INFO mapred.JobClient: Task Id :
> attempt_201303211339_0002_m_000018_1, Status : FAILED
> Task attempt_201303211339_0002_m_000018_1 failed to report status for 1204
> seconds. Killing!
> 13/03/22 02:49:03 INFO mapred.JobClient:  map 27% reduce 9%
> 13/03/22 02:52:04 INFO mapred.JobClient:  map 28% reduce 9%
> ........
>
> Because I just started to learn how to run Hadoop, I have no idea how to
> solve
> this problem...
> Does anyone have an idea how to handle this weird thing?
>
> --
> *Jinchun Kim*
>
