You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Nigel Daley (JIRA)" <ji...@apache.org> on 2007/05/21 20:39:16 UTC
[jira] Created: (HADOOP-1400) JobClient rpc times out getting job
status
JobClient rpc times out getting job status
------------------------------------------
Key: HADOOP-1400
URL: https://issues.apache.org/jira/browse/HADOOP-1400
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.12.3
Reporter: Nigel Daley
Priority: Minor
JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:473)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
at $Proxy1.getJobStatus(Unknown Source)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502512 ]
Doug Cutting commented on HADOOP-1400:
--------------------------------------
So your contention is that only TimeoutException is retried, and that there's no way a server could deliver a TimeoutException near a bounce? Can you elaborate this?
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Attachment: 1400.patch
Ok, this patch is updated to the current head.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: 1400.patch, job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501983 ]
Doug Cutting commented on HADOOP-1400:
--------------------------------------
I'm concerned about retrying the submitJob() method. All the others seem okay to retry. The only other one with side-effects is killJob(), and it's okay to call that more than once. But we really don't want to submit a job twice.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Work started: (HADOOP-1400) JobClient rpc times out getting
job status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HADOOP-1400 started by Owen O'Malley.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Status: Open (was: Patch Available)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502496 ]
Hadoop QA commented on HADOOP-1400:
-----------------------------------
+1
http://issues.apache.org/jira/secure/attachment/12359198/job-client-retry-4.patch applied and successfully tested against trunk revision r544740.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/256/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/256/console
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502002 ]
Doug Cutting commented on HADOOP-1400:
--------------------------------------
I'm okay with making submit safe against retries, and then retrying it. That would indeed be better. Perhaps the client can ask the server for a job-id, then submit the job with the id, and the server can ignore duplicate submissions. We'd might want to make job ids a bit more unique too. Otherwise, if the JobTracker restarts between submit retries then the same job-id could end up being used by a different client, and a submission would be ignored.
In the meantime, we'd probably get 90+% of the benefit of this patch by simply not retrying submits. Then we can have a separate issue to make submits retryable.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502497 ]
Doug Cutting commented on HADOOP-1400:
--------------------------------------
I don't see how this handles the case I mentioned above, where:
1. Client A allocates job-id 1.
2. JobTracker restarts.
3. Client B allocates job-id 1.
4. Client B submits job 1.
5. Client A retries submtting job 1, and is returned B's job.
Did I miss something, or could this happen? An easy fix is to make job-ids more unique.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Status: Open (was: Patch Available)
I agree that submitJob needs to be more careful. But rather than not protect it against timeouts, the better solution is to have the server ignore the duplicate request.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502504 ]
Michael Bieniosek commented on HADOOP-1400:
-------------------------------------------
> An easy fix is to make job-ids more unique.
+1 to that: I don't really like the reuse of job_0001 across restarts. It makes job automation more difficult.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Priority: Blocker (was: Major)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Attachment: (was: job-client-retry.patch)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Attachment: (was: job-client-retry-4.patch)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley reassigned HADOOP-1400:
-------------------------------------
Assignee: Owen O'Malley
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Work started: (HADOOP-1400) JobClient rpc times out getting
job status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HADOOP-1400 started by Owen O'Malley.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502549 ]
Owen O'Malley commented on HADOOP-1400:
---------------------------------------
I filed the unique job id bug over in HADOOP-1473.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513098 ]
Hadoop QA commented on HADOOP-1400:
-----------------------------------
-1, could not apply patch.
The patch command could not apply the latest attachment http://issues.apache.org/jira/secure/attachment/12361916/1400.patch as a patch to trunk revision r556754.
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/421/console
Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Attachment: job-client-retry-4.patch
Ok, this patch has:
1. Retry on timeouts for JobSubmissionProtocol.
2. A new JobSubmissionProtocol getNewJobId method to assign a new job_id.
3. Submit job now passes in the jobid.
4. The submit directory is now $system/$jobid.
5. Change some of the map/reduce classes to expect JobConf instead of Configurations, so that I can get the System directory using the method.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Attachment: (was: job-client-retry.patch)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated HADOOP-1400:
---------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I just committed this. Thanks, Owen!
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sameer Paranjpye updated HADOOP-1400:
-------------------------------------
Fix Version/s: 0.14.0
Priority: Major (was: Minor)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Fix For: 0.14.0
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Status: Patch Available (was: In Progress)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated HADOOP-1400:
---------------------------------
Status: Open (was: Patch Available)
We'll re-submit this after HADOOP-1473.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Attachment: job-client-retry.patch
I forgot one of the control paths. Here is an updated patch.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Attachment: job-client-retry.patch
Add a RetryProxy around the JobSubmissionProtocol so that time outs are retried.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Status: Patch Available (was: Open)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502509 ]
Owen O'Malley commented on HADOOP-1400:
---------------------------------------
I think that my patch addresses the problem at hand. If the JobTracker bounces, they'll get a connection error, which will be propagated to the user. I can understand that there is a desire for persistent job ids, but that is a big enough feature with enough downsides that it should be addressed in a separate jira.
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Status: Patch Available (was: Open)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job
status
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400:
----------------------------------
Status: Patch Available (was: In Progress)
> JobClient rpc times out getting job status
> ------------------------------------------
>
> Key: HADOOP-1400
> URL: https://issues.apache.org/jira/browse/HADOOP-1400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Nigel Daley
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:473)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> at $Proxy1.getJobStatus(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.