Posted to mapreduce-user@hadoop.apache.org by Arko Provo Mukherjee <ar...@gmail.com> on 2011/10/27 10:22:26 UTC

Mappers getting killed

Hi,

I have a situation where I have to read a large file into every mapper.

Since it is a large HDFS file that is needed to work on each input to the
mapper, it takes a lot of time to read the data into memory from HDFS.

Thus the system is killing all my Mappers with the following message:

11/10/26 22:54:52 INFO mapred.JobClient: Task Id :
attempt_201106271322_12504_m_000000_0, Status : FAILED
Task attempt_201106271322_12504_m_000000_0 failed to report status for 601
seconds. Killing!

The cluster is not entirely owned by me, so I cannot change
mapred.task.timeout to allow enough time to read the entire file.

Any suggestions?

Also, is there a way for a Mapper instance to read the file only once for
all the inputs it receives?
Currently, since the file-reading code is in the map method, I guess it is
reading the entire file for each and every input, leading to a lot of
overhead.

Please help!

Many thanks in advance!!

Warm regards
Arko

Re: Mappers getting killed

Posted by Arko Provo Mukherjee <ar...@gmail.com>.
Hi,

I used the setStatus method and now my mappers are not getting killed
anymore.

Thanks a lot!

Warm regards
Arko

On Thu, Oct 27, 2011 at 4:31 AM, Lucian Iordache <
lucian.george.iordache@gmail.com> wrote:

> Hi,
>
> Probably your map method takes too long to process the data. You could add
> some context.progress() or context.setStatus("status") calls in your map method
> from time to time (at least once every 600 seconds, so you do not hit the timeout).
>
> Regards,
> Lucian
>
>
> On Thu, Oct 27, 2011 at 11:22 AM, Arko Provo Mukherjee <
> arkoprovomukherjee@gmail.com> wrote:
>
>> Hi,
>>
>> I have a situation where I have to read a large file into every mapper.
>>
>> Since it is a large HDFS file that is needed to work on each input to the
>> mapper, it takes a lot of time to read the data into memory from HDFS.
>>
>> Thus the system is killing all my Mappers with the following message:
>>
>> 11/10/26 22:54:52 INFO mapred.JobClient: Task Id :
>> attempt_201106271322_12504_m_000000_0, Status : FAILED
>> Task attempt_201106271322_12504_m_000000_0 failed to report status for
>> 601 seconds. Killing!
>>
>> The cluster is not entirely owned by me, so I cannot change
>> mapred.task.timeout to allow enough time to read the entire file.
>>
>> Any suggestions?
>>
>> Also, is there a way for a Mapper instance to read the file only once for
>> all the inputs it receives?
>> Currently, since the file-reading code is in the map method, I guess it is
>> reading the entire file for each and every input, leading to a lot of
>> overhead.
>>
>> Please help!
>>
>> Many thanks in advance!!
>>
>> Warm regards
>> Arko
>>
>
>
>
> --
> Numai bine,
> Lucian
>

Re: Mappers getting killed

Posted by Lucian Iordache <lu...@gmail.com>.
Hi,

Probably your map method takes too long to process the data. You could add
some context.progress() or context.setStatus("status") calls in your map method
from time to time (at least once every 600 seconds, so you do not hit the timeout).
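
To illustrate, here is a minimal sketch of that pattern with the new
org.apache.hadoop.mapreduce API; the class name, the per-token work, and the
reporting interval are illustrative assumptions, not code from this thread:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LongRunningMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    long processed = 0;
    for (String token : value.toString().split("\\s+")) {
      // ... expensive per-token work goes here ...
      processed++;
      if (processed % 10000 == 0) {
        // Tell the framework the task is still alive so it is not killed
        // after mapred.task.timeout (600 seconds by default). Tune the
        // interval so this fires well within the timeout.
        context.progress();
        context.setStatus("processed " + processed + " tokens");
      }
    }
    context.write(value, new LongWritable(processed));
  }
}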

Regards,
Lucian

On Thu, Oct 27, 2011 at 11:22 AM, Arko Provo Mukherjee <
arkoprovomukherjee@gmail.com> wrote:

> Hi,
>
> I have a situation where I have to read a large file into every mapper.
>
> Since it is a large HDFS file that is needed to work on each input to the
> mapper, it takes a lot of time to read the data into memory from HDFS.
>
> Thus the system is killing all my Mappers with the following message:
>
> 11/10/26 22:54:52 INFO mapred.JobClient: Task Id :
> attempt_201106271322_12504_m_000000_0, Status : FAILED
> Task attempt_201106271322_12504_m_000000_0 failed to report status for 601
> seconds. Killing!
>
> The cluster is not entirely owned by me, so I cannot change
> mapred.task.timeout to allow enough time to read the entire file.
>
> Any suggestions?
>
> Also, is there a way for a Mapper instance to read the file only once for
> all the inputs it receives?
> Currently, since the file-reading code is in the map method, I guess it is
> reading the entire file for each and every input, leading to a lot of
> overhead.
>
> Please help!
>
> Many thanks in advance!!
>
> Warm regards
> Arko
>



-- 
Numai bine,
Lucian

Re: Mappers getting killed

Posted by Arko Provo Mukherjee <ar...@gmail.com>.
Thanks!

I will try it and let you know.

Warm regards
Arko

On Oct 27, 2011, at 8:19 AM, Brock Noland <br...@cloudera.com> wrote:

> Hi,
> 
> On Thu, Oct 27, 2011 at 3:22 AM, Arko Provo Mukherjee
> <ar...@gmail.com> wrote:
>> Hi,
>> 
>> I have a situation where I have to read a large file into every mapper.
>> 
>> Since it is a large HDFS file that is needed to work on each input to the
>> mapper, it takes a lot of time to read the data into memory from HDFS.
>> 
>> Thus the system is killing all my Mappers with the following message:
>> 
>> 11/10/26 22:54:52 INFO mapred.JobClient: Task Id :
>> attempt_201106271322_12504_m_000000_0, Status : FAILED
>> Task attempt_201106271322_12504_m_000000_0 failed to report status for 601
>> seconds. Killing!
>> 
>> The cluster is not entirely owned by me, so I cannot change
>> mapred.task.timeout to allow enough time to read the entire file.
>> Any suggestions?
>> Also, is there a way for a Mapper instance to read the file only once for
>> all the inputs it receives?
>> Currently, since the file-reading code is in the map method, I guess it is
>> reading the entire file for each and every input, leading to a lot of
>> overhead.
> 
> 
> The file should be read in the configure() (old API) or setup()
> (new API) method.
> 
> Brock

Re: Mappers getting killed

Posted by Brock Noland <br...@cloudera.com>.
Hi,

On Thu, Oct 27, 2011 at 3:22 AM, Arko Provo Mukherjee
<ar...@gmail.com> wrote:
> Hi,
>
> I have a situation where I have to read a large file into every mapper.
>
> Since it is a large HDFS file that is needed to work on each input to the
> mapper, it takes a lot of time to read the data into memory from HDFS.
>
> Thus the system is killing all my Mappers with the following message:
>
> 11/10/26 22:54:52 INFO mapred.JobClient: Task Id :
> attempt_201106271322_12504_m_000000_0, Status : FAILED
> Task attempt_201106271322_12504_m_000000_0 failed to report status for 601
> seconds. Killing!
>
> The cluster is not entirely owned by me, so I cannot change
> mapred.task.timeout to allow enough time to read the entire file.
> Any suggestions?
> Also, is there a way for a Mapper instance to read the file only once for
> all the inputs it receives?
> Currently, since the file-reading code is in the map method, I guess it is
> reading the entire file for each and every input, leading to a lot of
> overhead.


The file should be read in the configure() (old API) or setup()
(new API) method.
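
To make that concrete, here is a minimal sketch with the new
org.apache.hadoop.mapreduce API, where setup() loads the side file once per
mapper instance rather than once per record; the job property name
("side.file.path"), the in-memory set, and the lookup done in map() are
illustrative assumptions, not code from this thread:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SideFileMapper extends Mapper<LongWritable, Text, Text, Text> {

  private final Set<String> sideData = new HashSet<String>();

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    // Runs once per mapper instance, before any call to map().
    // "side.file.path" is a hypothetical job property set by the driver.
    Path sideFile = new Path(context.getConfiguration().get("side.file.path"));
    FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(fs.open(sideFile)));
    try {
      String line;
      long count = 0;
      while ((line = reader.readLine()) != null) {
        sideData.add(line);
        if (++count % 100000 == 0) {
          context.progress(); // keep reporting while the long read runs
        }
      }
    } finally {
      reader.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Use the data loaded in setup() instead of re-reading the file here.
    if (sideData.contains(value.toString())) {
      context.write(value, new Text("matched"));
    }
  }
}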

Brock