Posted to mapreduce-user@hadoop.apache.org by Krish Donald <go...@gmail.com> on 2015/03/02 05:41:01 UTC

How to troubleshoot failed or stuck jobs

Hi,

I wanted to understand: how do you troubleshoot failed or stuck jobs?

Thanks
Krish

Re: How to troubleshoot failed or stuck jobs

Posted by Krish Donald <go...@gmail.com>.
Thanks Rohith ...

What other issues have you seen cause failed or stuck jobs?

On Sun, Mar 1, 2015 at 10:06 PM, Rohith Sharma K S <
rohithsharmaks@huawei.com> wrote:

>  Hi
>
>
>
> 1.       For failed jobs, you can check the MRAppMaster logs directly;
> there you will find the reason for the failure.
>
> 2.       For a stuck job, you need to do some groundwork to identify
> what is going wrong. It can be either a YARN issue or a MapReduce issue.
>
> 2.1   Recently, I have seen jobs get stuck many times when the headroom
> calculation goes wrong.  The headroom is sent by the RM to the
> ApplicationMaster, and the AM uses it as a deciding factor (
> https://issues.apache.org/jira/i#browse/YARN-1680 ).  The corresponding
> parent JIRA is  https://issues.apache.org/jira/i#browse/YARN-1198
>
> 2.2   When the job is stuck:
>
> YARN – get the cluster memory used, cluster memory reserved, total
> memory, the number of NodeManagers, and the headroom sent to the AM.
>
> MapReduce – check whether any NMs are blacklisted and whether the
> reducer tasks are using all of the cluster memory. By default, reducers
> start before the mappers complete. If a mapper fails because of an
> unstable node, the reducers can take over the cluster; in that case the
> reducers are expected to be preempted, so check whether they actually
> are.
>
> The MRAppMaster log helps to some extent in analyzing the issue.
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> *From:* Krish Donald [mailto:gotomypc27@gmail.com]
> *Sent:* 02 March 2015 11:09
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to troubleshoot failed or stuck jobs
>
>
>
> Thanks for the links, Ted.
>
>
>
> However, I wanted to understand the approach that should be taken when
> troubleshooting failed or stuck jobs.
>
>
>
>
>
> On Sun, Mar 1, 2015 at 8:52 PM, Ted Yu <yu...@gmail.com> wrote:
>
> Here are some related discussions and a JIRA:
>
>
>
> http://search-hadoop.com/m/LgpTk2gxrGx
>
> http://search-hadoop.com/m/LgpTk2YLArE
>
>
>
> https://issues.apache.org/jira/browse/MAPREDUCE-6190
>
>
>
> Cheers
>
>
>
> On Sun, Mar 1, 2015 at 8:41 PM, Krish Donald <go...@gmail.com> wrote:
>
> Hi,
>
>
>
> I wanted to understand: how do you troubleshoot failed or stuck jobs?
>
>
>
> Thanks
>
> Krish
>
>
>
>
>

RE: How to troubleshoot failed or stuck jobs

Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi


1.       For failed jobs, you can check the MRAppMaster logs directly; there you will find the reason for the failure.
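
For example, if log aggregation is enabled (yarn.log-aggregation-enable=true), the aggregated MRAppMaster log can be pulled with "yarn logs -applicationId <app id>" once the application has finished. Below is a minimal sketch of doing that from Python and scanning for common failure markers; the application id is hypothetical and the keywords are only a starting point, not an exhaustive list.

    import subprocess

    # Hypothetical application id of the failed MapReduce job.
    app_id = "application_1425270000000_0042"

    # "yarn logs -applicationId <id>" prints the aggregated container logs,
    # including the MRAppMaster log (usually the container whose id ends in
    # _000001), to stdout.
    logs = subprocess.run(
        ["yarn", "logs", "-applicationId", app_id],
        capture_output=True, text=True, check=True,
    ).stdout

    # Scan for the usual failure markers; adjust the keywords as needed.
    markers = ("ERROR", "FATAL", "Exception", "Diagnostics")
    for line in logs.splitlines():
        if any(marker in line for marker in markers):
            print(line)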

2.       For a stuck job, you need to do some groundwork to identify what is going wrong. It can be either a YARN issue or a MapReduce issue.

2.1   Recently, I have seen jobs get stuck many times when the headroom calculation goes wrong.  The headroom is sent by the RM to the ApplicationMaster, and the AM uses it as a deciding factor ( https://issues.apache.org/jira/i#browse/YARN-1680 ).  The corresponding parent JIRA is  https://issues.apache.org/jira/i#browse/YARN-1198

2.2   When the job is stuck:
YARN – get the cluster memory used, cluster memory reserved, total memory, the number of NodeManagers, and the headroom sent to the AM.
MapReduce – check whether any NMs are blacklisted and whether the reducer tasks are using all of the cluster memory. By default, reducers start before the mappers complete. If a mapper fails because of an unstable node, the reducers can take over the cluster; in that case the reducers are expected to be preempted, so check whether they actually are.
The MRAppMaster log helps to some extent in analyzing the issue; sketches of the YARN-side and MapReduce-side checks follow below.
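
A minimal sketch of pulling those YARN-side numbers from the ResourceManager's cluster metrics REST endpoint; the RM host below is an assumption, and 8088 is only the default RM web port.

    import json
    from urllib.request import urlopen

    # Assumed ResourceManager web address; replace with your RM host:port.
    RM = "http://rm-host.example.com:8088"

    # Cluster-wide memory and node counts from the RM REST API.
    metrics = json.load(urlopen(RM + "/ws/v1/cluster/metrics"))["clusterMetrics"]

    print("total MB     :", metrics["totalMB"])
    print("allocated MB :", metrics["allocatedMB"])
    print("reserved MB  :", metrics["reservedMB"])
    print("available MB :", metrics["availableMB"])
    print("active NMs   :", metrics["activeNodes"])

The headroom itself is not part of that endpoint; in the Hadoop versions I have looked at, the AM's allocator logs the headroom it received from the RM, so grepping the MRAppMaster log for "headroom" is a quick way to see it.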

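On the MapReduce side, while the job is still stuck, the MR ApplicationMaster's REST API (reachable through the RM web proxy) reports how many maps and reduces are running or pending, which shows whether reducers are holding containers while maps still wait. The host, application id, and exact field names below are assumptions; check them against the MR AM REST documentation for your Hadoop version.

    import json
    from urllib.request import urlopen

    # Assumed RM web address and a hypothetical running application id.
    RM = "http://rm-host.example.com:8088"
    app_id = "application_1425270000000_0042"

    # The MR ApplicationMaster REST API is proxied through the RM web UI.
    base = RM + "/proxy/" + app_id + "/ws/v1/mapreduce"
    job_id = json.load(urlopen(base + "/jobs"))["jobs"]["job"][0]["id"]
    job = json.load(urlopen(base + "/jobs/" + job_id))["job"]

    # Reducers running while maps are still pending is the pattern to watch;
    # normally the AM is expected to preempt reducers when maps cannot get
    # containers.
    print("maps    running/pending :", job["mapsRunning"], "/", job["mapsPending"])
    print("reduces running/pending :", job["reducesRunning"], "/", job["reducesPending"])

If reducers routinely take over the cluster before the maps finish, raising mapreduce.job.reduce.slowstart.completedmaps (default 0.05) toward 1.0 delays the reduce ramp-up; whether that is appropriate depends on the workload.
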
Thanks & Regards
Rohith Sharma K S

From: Krish Donald [mailto:gotomypc27@gmail.com]
Sent: 02 March 2015 11:09
To: user@hadoop.apache.org
Subject: Re: How to troubleshoot failed or stuck jobs

Thanks for the links, Ted.

However, I wanted to understand the approach that should be taken when troubleshooting failed or stuck jobs.


On Sun, Mar 1, 2015 at 8:52 PM, Ted Yu <yu...@gmail.com> wrote:
Here are some related discussions and a JIRA:

http://search-hadoop.com/m/LgpTk2gxrGx
http://search-hadoop.com/m/LgpTk2YLArE

https://issues.apache.org/jira/browse/MAPREDUCE-6190

Cheers

On Sun, Mar 1, 2015 at 8:41 PM, Krish Donald <go...@gmail.com> wrote:
Hi,

I wanted to understand: how do you troubleshoot failed or stuck jobs?

Thanks
Krish



Re: How to troubleshoot failed or stuck jobs

Posted by Krish Donald <go...@gmail.com>.
Thanks for the links, Ted.

However, I wanted to understand the approach that should be taken when
troubleshooting failed or stuck jobs.


On Sun, Mar 1, 2015 at 8:52 PM, Ted Yu <yu...@gmail.com> wrote:

> Here are some related discussions and a JIRA:
>
> http://search-hadoop.com/m/LgpTk2gxrGx
> http://search-hadoop.com/m/LgpTk2YLArE
>
> https://issues.apache.org/jira/browse/MAPREDUCE-6190
>
> Cheers
>
> On Sun, Mar 1, 2015 at 8:41 PM, Krish Donald <go...@gmail.com> wrote:
>
>> Hi,
>>
>> I wanted to understand: how do you troubleshoot failed or stuck jobs?
>>
>> Thanks
>> Krish
>>
>
>

Re: How to troubleshoot failed or stuck jobs

Posted by Ted Yu <yu...@gmail.com>.
Here are some related discussions and a JIRA:

http://search-hadoop.com/m/LgpTk2gxrGx
http://search-hadoop.com/m/LgpTk2YLArE

https://issues.apache.org/jira/browse/MAPREDUCE-6190

Cheers

On Sun, Mar 1, 2015 at 8:41 PM, Krish Donald <go...@gmail.com> wrote:

> Hi,
>
> I wanted to understand: how do you troubleshoot failed or stuck jobs?
>
> Thanks
> Krish
>
