Posted to hdfs-user@hadoop.apache.org by Ashish Kumar Singh <as...@gmail.com> on 2015/07/20 10:45:35 UTC
Application Master waits a long time after Mapper/Reducers finish
Hello Users,
I am facing a problem running MapReduce jobs on Hadoop 2.6.
I am observing that the Application Master waits for a long time after all
the Mappers and Reducers have completed before the job itself completes.
This wait sometimes exceeds 20-25 minutes, which is very strange as our
mappers and reducers complete in less than 10 minutes for the job.
Below are some observations:
a) Job completion status stands at 95% when the wait begins
b) JOB_COMMIT is initiated just before this wait ( logs: 2015-07-14
01:54:46,636 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1436854849540_0123Job Transitioned from RUNNING to COMMITTING )
c) Job success happens after 20-25 minutes ( logs: 2015-07-14 02:15:06,634
INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1436854849540_0123Job Transitioned from COMMITTING to SUCCEEDED )
I would appreciate any help on this.
A thread dump taken while the Application Master hangs is attached.
Regards,
Ashish
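The length of the wait can be read directly off the two JobImpl log lines quoted in observations b) and c). A minimal sketch of that arithmetic, assuming the AM log uses the timestamp layout shown in those lines; the class and method names here are illustrative only and are not part of Hadoop:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class CommitWait {
    // Timestamp layout as it appears at the start of the quoted AM log lines.
    static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Elapsed time between two log timestamps written in that layout.
    static Duration between(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, LOG_TS),
                                LocalDateTime.parse(end, LOG_TS));
    }

    public static void main(String[] args) {
        // RUNNING -> COMMITTING and COMMITTING -> SUCCEEDED, copied from the post.
        Duration wait = between("2015-07-14 01:54:46,636", "2015-07-14 02:15:06,634");
        // prints: 20 min 19 s
        System.out.println(wait.toMinutes() + " min " + (wait.getSeconds() % 60) + " s");
    }
}
```

So for this particular job the COMMITTING state alone lasted a little over 20 minutes, which matches the 20-25 minute range reported.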
Re: Application Master waits a long time after Mapper/Reducers finish
Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.
This might be due to a performance issue in FileOutputCommitter, which is resolved in 2.7:
https://issues.apache.org/jira/browse/MAPREDUCE-4815
Best regards,
Jeff Zhang
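The stack trace quoted in this thread shows commitJob calling mergePaths, which in turn calls rename on HDFS. A minimal sketch of why that step can dominate the job's runtime, assuming the pre-2.7 behaviour in which a single Application Master thread moves every task's output into the final directory at job commit. This is not Hadoop's code: it uses java.nio on a local temporary directory, and the class name, the simplified `_temporary/0/task_N` layout and the file names are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SerialCommitSketch {

    // Lists the direct children of a directory in a stable order.
    static List<Path> children(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.sorted().collect(Collectors.toList());
        }
    }

    // Simplified merge: if both sides are directories, descend and merge each
    // child; otherwise move 'from' to 'to' with a single rename.
    // Returns the number of renames performed.
    static int mergePaths(Path from, Path to) throws IOException {
        if (Files.isDirectory(from) && Files.isDirectory(to)) {
            int renames = 0;
            for (Path child : children(from)) {
                renames += mergePaths(child, to.resolve(child.getFileName().toString()));
            }
            return renames;
        }
        Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
        return 1;
    }

    // Job commit: one thread merges each committed task directory in turn.
    static int commitJob(Path pendingDir, Path outputDir) throws IOException {
        int renames = 0;
        for (Path taskDir : children(pendingDir)) {
            renames += mergePaths(taskDir, outputDir);
        }
        return renames;
    }

    // Builds a throwaway job layout with one output file per task and commits it.
    static int demo(int tasks) {
        try {
            Path out = Files.createTempDirectory("job-output");
            Path pending = Files.createDirectories(out.resolve("_temporary").resolve("0"));
            for (int i = 0; i < tasks; i++) {
                Path taskDir = Files.createDirectories(pending.resolve("task_" + i));
                Files.write(taskDir.resolve("part-r-" + i), new byte[0]);
            }
            return commitJob(pending, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("renames at job commit: " + demo(1000));
    }
}
```

Because the output directory already exists, the merge descends into every task directory and renames file by file, so the rename count grows with the number of output files. On a local disk that is cheap; against HDFS each rename is a NameNode round trip made from one thread. On 2.7 or later, setting mapreduce.fileoutputcommitter.algorithm.version to 2 makes each task move its files into the output directory at task commit, which leaves far less for the job commit to rename. It is worth checking the documentation for your release before relying on that setting.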
From: Ashish Kumar Singh <as...@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Monday, July 20, 2015 at 4:06 AM
To: user@hadoop.apache.org
Subject: Re: Application Master waits a long time after Mapper/Reducers finish
Hi Rohith,
Thanks for replying.
No, I do not see any connection retry attempts to HDFS in the logs.
Also, the NameNode and HDFS look healthy in our cluster.
Please find attached the latest AM logs for the job.
Regards,
Ashish
On Mon, Jul 20, 2015 at 3:29 PM, Rohith Sharma K S <ro...@huawei.com> wrote:
Hi
From the thread dump, it seems the committer thread is waiting on an HDFS operation. Can you attach the AM logs, and do you see any client retries when connecting to HDFS?
"CommitterEvent Processor #4" prio=10 tid=0x000000000199a800 nid=0x18df in Object.wait() [0x00007f4f12aa4000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
............................
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1864)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:575)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
Maybe you can check whether HDFS is healthy?
Thanks & Regards
Rohith Sharma K S
From: Ashish Kumar Singh [mailto:ashish23aks@gmail.com]
Sent: 20 July 2015 14:16
To: user@hadoop.apache.org
Subject: Application Master waits a long time after Mapper/Reducers finish
Hello Users,
I am facing a problem running MapReduce jobs on Hadoop 2.6.
I am observing that the Application Master waits for a long time after all the Mappers and Reducers have completed before the job itself completes.
This wait sometimes exceeds 20-25 minutes, which is very strange as our mappers and reducers complete in less than 10 minutes for the job.
Below are some observations:
a) Job completion status stands at 95% when the wait begins
b) JOB_COMMIT is initiated just before this wait ( logs: 2015-07-14 01:54:46,636 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from RUNNING to COMMITTING )
c) Job success happens after 20-25 minutes ( logs: 2015-07-14 02:15:06,634 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from COMMITTING to SUCCEEDED )
I would appreciate any help on this.
A thread dump taken while the Application Master hangs is attached.
Regards,
Ashish
Re: Application Master waits a long time after Mapper/Reducers finish
Posted by Ashish Kumar Singh <as...@gmail.com>.
Hi Rohith,
Thanks for replying.
No, I do not see any connection retry attempts to HDFS in the logs.
Also, the NameNode and HDFS look healthy in our cluster.
Please find attached the latest AM logs for the job.
Regards,
Ashish
On Mon, Jul 20, 2015 at 3:29 PM, Rohith Sharma K S <rohithsharmaks@huawei.com> wrote:
> Hi
>
> From the thread dump, it seems the committer thread is waiting on an HDFS
> operation. Can you attach the AM logs, and do you see any client retries
> when connecting to HDFS?
>
> "CommitterEvent Processor #4" prio=10 tid=0x000000000199a800 nid=0x18df in Object.wait() [0x00007f4f12aa4000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:503)
> ……………………….
> at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1864)
> at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:575)
> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)
> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
>
> Maybe you can check whether HDFS is healthy?
>
> Thanks & Regards
> Rohith Sharma K S
>
> From: Ashish Kumar Singh [mailto:ashish23aks@gmail.com]
> Sent: 20 July 2015 14:16
> To: user@hadoop.apache.org
> Subject: Application Master waits a long time after Mapper/Reducers finish
>
> Hello Users,
>
> I am facing a problem running MapReduce jobs on Hadoop 2.6.
> I am observing that the Application Master waits for a long time after
> all the Mappers and Reducers have completed before the job itself completes.
>
> This wait sometimes exceeds 20-25 minutes, which is very strange as our
> mappers and reducers complete in less than 10 minutes for the job.
>
> Below are some observations:
> a) Job completion status stands at 95% when the wait begins
> b) JOB_COMMIT is initiated just before this wait ( logs: 2015-07-14
> 01:54:46,636 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
> job_1436854849540_0123Job Transitioned from RUNNING to COMMITTING )
> c) Job success happens after 20-25 minutes ( logs: 2015-07-14 02:15:06,634
> INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
> job_1436854849540_0123Job Transitioned from COMMITTING to SUCCEEDED )
>
> I would appreciate any help on this.
>
> A thread dump taken while the Application Master hangs is attached.
>
> Regards,
> Ashish
RE: Application Master waits a long time after Mapper/Reducers finish
Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi
From the thread dump, it seems the committer thread is waiting on an HDFS operation. Can you attach the AM logs, and do you see any client retries when connecting to HDFS?
"CommitterEvent Processor #4" prio=10 tid=0x000000000199a800 nid=0x18df in Object.wait() [0x00007f4f12aa4000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
……………………….
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1864)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:575)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
Maybe you can check whether HDFS is healthy?
Thanks & Regards
Rohith Sharma K S
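When a full thread dump is long, the thread of interest can be picked out mechanically. A minimal sketch, assuming the usual HotSpot text layout in which each thread starts with a line holding its quoted name and each frame starts with "at"; the class name, method names and the second (idle) thread in the sample are hypothetical and only there to show the filtering:

```java
import java.util.ArrayList;
import java.util.List;

final class ThreadDumpScan {

    // Returns the names of threads whose stack contains the given frame text.
    static List<String> threadsWithFrame(String dump, String frame) {
        List<String> hits = new ArrayList<>();
        String current = null;   // name of the thread whose frames are being read
        boolean matched = false; // whether 'current' was already reported
        for (String line : dump.split("\\R")) {
            String t = line.trim();
            if (t.startsWith("\"")) {
                // Thread header line: the name is the text between the first two quotes.
                int close = t.indexOf('"', 1);
                current = close > 0 ? t.substring(1, close) : null;
                matched = false;
            } else if (current != null && !matched && t.startsWith("at ") && t.contains(frame)) {
                hits.add(current);
                matched = true;
            }
        }
        return hits;
    }

    // Sample input: the first thread is copied from the dump in this thread,
    // the second thread is invented for the example.
    static String sampleDump() {
        return String.join("\n",
            "\"CommitterEvent Processor #4\" prio=10 tid=0x000000000199a800 nid=0x18df in Object.wait() [0x00007f4f12aa4000]",
            "   java.lang.Thread.State: WAITING (on object monitor)",
            "        at java.lang.Object.wait(Native Method)",
            "        at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1864)",
            "        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)",
            "",
            "\"Hypothetical idle thread\" prio=10",
            "   java.lang.Thread.State: TIMED_WAITING (sleeping)",
            "        at java.lang.Thread.sleep(Native Method)");
    }

    public static void main(String[] args) {
        // prints: [CommitterEvent Processor #4]
        System.out.println(threadsWithFrame(sampleDump(), "FileOutputCommitter.mergePaths"));
    }
}
```

Taking several dumps a few seconds apart and running the same scan shows whether the committer thread stays inside mergePaths for the whole wait, which would point at the rename loop rather than at a single stuck call.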
From: Ashish Kumar Singh [mailto:ashish23aks@gmail.com]
Sent: 20 July 2015 14:16
To: user@hadoop.apache.org
Subject: Application Master waits a long time after Mapper/Reducers finish
Hello Users,
I am facing a problem running MapReduce jobs on Hadoop 2.6.
I am observing that the Application Master waits for a long time after all the Mappers and Reducers have completed before the job itself completes.
This wait sometimes exceeds 20-25 minutes, which is very strange as our mappers and reducers complete in less than 10 minutes for the job.
Below are some observations:
a) Job completion status stands at 95% when the wait begins
b) JOB_COMMIT is initiated just before this wait ( logs: 2015-07-14 01:54:46,636 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from RUNNING to COMMITTING )
c) Job success happens after 20-25 minutes ( logs: 2015-07-14 02:15:06,634 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from COMMITTING to SUCCEEDED )
I would appreciate any help on this.
A thread dump taken while the Application Master hangs is attached.
Regards,
Ashish
RE: Application Master waits a long time after Mapper/Reducers
finish
Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi
From thread dump, it seems waiting for HDFS operation. Can you attach AM logs, and do you see any client retry for connecting to HDFS?
"CommitterEvent Processor #4" prio=10 tid=0x000000000199a800 nid=0x18df in Object.wait() [0x00007f4f12aa4000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
……………………….
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1864)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:575)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
May be you can check from HDFS that is it Healthy?
Thanks & Regards
Rohith Sharma K S
From: Ashish Kumar Singh [mailto:ashish23aks@gmail.com]
Sent: 20 July 2015 14:16
To: user@hadoop.apache.org
Subject: Application Master waits a long time after Mapper/Reducers finish
Hello Users ,
I am facing a problem running Mapreduce jobs on Hadoop 2.6.
I am observing that the Applocation Master waits for a long time after all the Mappers and Reducers are completed before the job is completed .
This wait time sometimes exceeds 20-25 mins which is very strange as our mappers and reducers complete in less than 10 minutes for the job .
Below are some observations:
a) Job completion status stands at 95% when the wait begins
b)JOB_COMMIT is initiated just before this wait time ( logs: 2015-07-14 01:54:46,636 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from RUNNING to COMMITTING )
c) job success happens after 20-25 minutes ( logs: 2015-07-14 02:15:06,634 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from COMMITTING to SUCCEEDED )
Appreciate any help on this .
Thread dump while the Application master hangs is attached.
Regards,
Ashish
RE: Application Master waits a long time after Mapper/Reducers
finish
Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi
From thread dump, it seems waiting for HDFS operation. Can you attach AM logs, and do you see any client retry for connecting to HDFS?
"CommitterEvent Processor #4" prio=10 tid=0x000000000199a800 nid=0x18df in Object.wait() [0x00007f4f12aa4000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
……………………….
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1864)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:575)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
Maybe you can check whether HDFS is healthy?
Thanks & Regards
Rohith Sharma K S
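The two mergePaths frames in the dump above (line 362 calling line 345, which then calls rename) suggest a recursive walk that issues one rename per entry. The following is only a local-filesystem analogy of that pattern using java.nio, not the Hadoop source; the class MergeSketch, its methods, and the directory names are all hypothetical. It counts how many rename calls such a walk makes. On HDFS each of those calls would presumably be a separate request to the NameNode, which may be why a job with many output files spends a long time in COMMITTING.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; this is not FileOutputCommitter.
public class MergeSketch {
    /** Moves everything under from into to and returns how many rename calls were issued. */
    static int merge(Path from, Path to) {
        try {
            if (Files.isDirectory(from) && Files.isDirectory(to)) {
                // Destination directory already exists: descend and merge child by child.
                List<Path> children;
                try (Stream<Path> listing = Files.list(from)) {
                    children = listing.collect(Collectors.toList());
                }
                int renames = 0;
                for (Path child : children) {
                    renames += merge(child, to.resolve(child.getFileName().toString()));
                }
                return renames;
            }
            // Plain file, or a directory with no counterpart yet: one rename is enough.
            Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
            return 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a made-up output tree with three part files and merges it. */
    static int demo() {
        try {
            Path root = Files.createTempDirectory("merge-sketch");
            Path pending = Files.createDirectories(root.resolve("pending/out"));
            Files.createDirectories(root.resolve("final/out"));
            for (int i = 0; i < 3; i++) {
                Files.createFile(pending.resolve("part-" + i));
            }
            return merge(root.resolve("pending"), root.resolve("final"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rename calls: " + demo());
    }
}
```

Because the destination directories already exist in the demo, the walk descends to the files and renames each one individually, so the number of calls grows with the number of output files.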
From: Ashish Kumar Singh [mailto:ashish23aks@gmail.com]
Sent: 20 July 2015 14:16
To: user@hadoop.apache.org
Subject: Application Master waits a long time after Mapper/Reducers finish
Hello Users,
I am facing a problem running MapReduce jobs on Hadoop 2.6.
I am observing that the Application Master waits for a long time after all the Mappers and Reducers have completed before the job itself completes.
This wait sometimes exceeds 20-25 minutes, which is very strange as our mappers and reducers complete in less than 10 minutes for the job.
Below are some observations:
a) Job completion status stands at 95% when the wait begins
b) JOB_COMMIT is initiated just before this wait ( logs: 2015-07-14 01:54:46,636 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from RUNNING to COMMITTING )
c) Job success happens after 20-25 minutes ( logs: 2015-07-14 02:15:06,634 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from COMMITTING to SUCCEEDED )
Appreciate any help on this.
Thread dump while the Application Master hangs is attached.
Regards,
Ashish
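For reference, the length of the commit phase can be read straight off the two JobImpl log lines quoted above. This is a minimal sketch using only java.time; the class name CommitWait and its helper method are made up for illustration, and the timestamp pattern is inferred from the quoted log lines.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hadoop.
public class CommitWait {
    // Timestamp layout at the start of the AM log lines quoted above.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** Returns the time elapsed between two log timestamps. */
    static Duration between(String start, String end) {
        return Duration.between(
                LocalDateTime.parse(start, LOG_TIME),
                LocalDateTime.parse(end, LOG_TIME));
    }

    public static void main(String[] args) {
        // RUNNING -> COMMITTING and COMMITTING -> SUCCEEDED timestamps from the thread.
        Duration wait = between("2015-07-14 01:54:46,636", "2015-07-14 02:15:06,634");
        System.out.println(wait.toMinutes() + " min " + (wait.getSeconds() % 60) + " s");
    }
}
```

For the job in this thread the difference is 1219.998 seconds, a little over 20 minutes spent between COMMITTING and SUCCEEDED.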