You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Nigel Daley (JIRA)" <ji...@apache.org> on 2007/05/21 20:39:16 UTC

[jira] Created: (HADOOP-1400) JobClient rpc times out getting job status

JobClient rpc times out getting job status
------------------------------------------

                 Key: HADOOP-1400
                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.3
            Reporter: Nigel Daley
            Priority: Minor


JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:

07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
	at org.apache.hadoop.ipc.Client.call(Client.java:473)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
	at $Proxy1.getJobStatus(Unknown Source)
	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502512 ] 

Doug Cutting commented on HADOOP-1400:
--------------------------------------

So your contention is that only TimeoutException is retried, and that there's no way a server could deliver a TimeoutException near a bounce?  Can you elaborate this?

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Attachment: 1400.patch

Ok, this patch is updated to the current head.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: 1400.patch, job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501983 ] 

Doug Cutting commented on HADOOP-1400:
--------------------------------------

I'm concerned about retrying the submitJob() method.  All the others seem okay to retry.  The only other one with side-effects is killJob(), and it's okay to call that more than once.  But we really don't want to submit a job twice.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Work started: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-1400 started by Owen O'Malley.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Status: Open  (was: Patch Available)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502496 ] 

Hadoop QA commented on HADOOP-1400:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12359198/job-client-retry-4.patch applied and successfully tested against trunk revision r544740.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/256/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/256/console

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502002 ] 

Doug Cutting commented on HADOOP-1400:
--------------------------------------

I'm okay with making submit safe against retries, and then retrying it.  That would indeed be better.  Perhaps the client can ask the server for a job-id, then submit the job with the id, and the server can ignore duplicate submissions.  We'd might want to make job ids a bit more unique too.  Otherwise, if the JobTracker restarts between submit retries then the same job-id could end up being used by a different client, and a submission would be ignored.

In the meantime, we'd probably get 90+% of the benefit of this patch by simply not retrying submits.  Then we can have a separate issue to make submits retryable.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502497 ] 

Doug Cutting commented on HADOOP-1400:
--------------------------------------

I don't see how this handles the case I mentioned above, where:

1. Client A allocates job-id 1.
2. JobTracker restarts.
3. Client B allocates job-id 1.
4. Client B submits job 1.
5. Client A retries submtting job 1, and is returned B's job.

Did I miss something, or could this happen?  An easy fix is to make job-ids more unique.


> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Status: Open  (was: Patch Available)

I agree that submitJob needs to be more careful. But rather than not protect it against timeouts, the better solution is to have the server ignore the duplicate request.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502504 ] 

Michael Bieniosek commented on HADOOP-1400:
-------------------------------------------

> An easy fix is to make job-ids more unique.

+1 to that: I don't really like the reuse of job_0001 across restarts.  It makes job automation more difficult.


> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Priority: Blocker  (was: Major)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Attachment:     (was: job-client-retry.patch)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Attachment:     (was: job-client-retry-4.patch)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HADOOP-1400:
-------------------------------------

    Assignee: Owen O'Malley

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Work started: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-1400 started by Owen O'Malley.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502549 ] 

Owen O'Malley commented on HADOOP-1400:
---------------------------------------

I filed the unique job id bug over in HADOOP-1473.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513098 ] 

Hadoop QA commented on HADOOP-1400:
-----------------------------------

-1, could not apply patch.

The patch command could not apply the latest attachment http://issues.apache.org/jira/secure/attachment/12361916/1400.patch as a patch to trunk revision r556754.

Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/421/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Attachment: job-client-retry-4.patch

Ok, this patch has:
  1. Retry on timeouts for JobSubmissionProtocol.
  2. A new JobSubmissionProtocol getNewJobId method to assign a new job_id.
  3. Submit job now passes in the jobid.
  4. The submit directory is now $system/$jobid.
  5. Change some of the map/reduce classes to expect JobConf instead of Configurations, so that I can get the System directory using the method.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Attachment:     (was: job-client-retry.patch)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1400:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Owen!

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sameer Paranjpye updated HADOOP-1400:
-------------------------------------

    Fix Version/s: 0.14.0
         Priority: Major  (was: Minor)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>             Fix For: 0.14.0
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Status: Patch Available  (was: In Progress)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: 1400.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1400:
---------------------------------

    Status: Open  (was: Patch Available)

We'll re-submit this after HADOOP-1473.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Attachment: job-client-retry.patch

I forgot one of the control paths. Here is an updated patch.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Attachment: job-client-retry.patch

Add a RetryProxy around the JobSubmissionProtocol so that time outs are retried.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Status: Patch Available  (was: Open)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502509 ] 

Owen O'Malley commented on HADOOP-1400:
---------------------------------------

I think that my patch addresses the problem at hand. If the JobTracker bounces, they'll get a connection error, which will be propagated to the user. I can understand that there is a desire for persistent job ids, but that is a big enough feature with enough downsides that it should be addressed in a separate jira.

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry-4.patch, job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Status: Patch Available  (was: Open)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1400) JobClient rpc times out getting job status

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1400:
----------------------------------

    Status: Patch Available  (was: In Progress)

> JobClient rpc times out getting job status
> ------------------------------------------
>
>                 Key: HADOOP-1400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Nigel Daley
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: job-client-retry.patch
>
>
> JobClient rpc times out getting job status and logs an INFO exception message.  JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry.  Here's an example log:
> 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
> 07/05/20 20:45:39 INFO mapred.JobClient:  map 0% reduce 0%
> 07/05/20 20:45:43 INFO mapred.JobClient:  map 2% reduce 0%
> 07/05/20 20:45:44 INFO mapred.JobClient:  map 7% reduce 0%
> 07/05/20 20:45:45 INFO mapred.JobClient:  map 17% reduce 0%
> 07/05/20 20:45:46 INFO mapred.JobClient:  map 23% reduce 0%
> 07/05/20 20:45:47 INFO mapred.JobClient:  map 36% reduce 0%
> 07/05/20 20:45:48 INFO mapred.JobClient:  map 60% reduce 0%
> 07/05/20 20:45:49 INFO mapred.JobClient:  map 71% reduce 0%
> 07/05/20 20:45:50 INFO mapred.JobClient:  map 73% reduce 0%
> 07/05/20 20:45:51 INFO mapred.JobClient:  map 75% reduce 0%
> 07/05/20 20:45:52 INFO mapred.JobClient:  map 79% reduce 0%
> 07/05/20 20:45:53 INFO mapred.JobClient:  map 84% reduce 0%
> 07/05/20 20:45:54 INFO mapred.JobClient:  map 97% reduce 0%
> 07/05/20 20:45:57 INFO mapred.JobClient:  map 99% reduce 0%
> 07/05/20 20:45:58 INFO mapred.JobClient:  map 100% reduce 1%
> 07/05/20 20:46:00 INFO mapred.JobClient:  map 100% reduce 2%
> 07/05/20 20:46:08 INFO mapred.JobClient:  map 100% reduce 3%
> 07/05/20 20:46:11 INFO mapred.JobClient:  map 100% reduce 4%
> 07/05/20 20:46:18 INFO mapred.JobClient:  map 100% reduce 5%
> 07/05/20 20:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 07/05/20 20:46:28 INFO mapred.JobClient:  map 100% reduce 7%
> 07/05/20 20:46:31 INFO mapred.JobClient:  map 100% reduce 8%
> 07/05/20 20:46:37 INFO mapred.JobClient:  map 100% reduce 9%
> 07/05/20 20:46:39 INFO mapred.JobClient:  map 100% reduce 10%
> 07/05/20 20:46:47 INFO mapred.JobClient:  map 100% reduce 11%
> 07/05/20 20:46:50 INFO mapred.JobClient:  map 100% reduce 12%
> 07/05/20 20:46:57 INFO mapred.JobClient:  map 100% reduce 13%
> 07/05/20 20:46:59 INFO mapred.JobClient:  map 100% reduce 14%
> 07/05/20 20:47:07 INFO mapred.JobClient:  map 100% reduce 15%
> 07/05/20 20:47:09 INFO mapred.JobClient:  map 100% reduce 16%
> 07/05/20 20:47:17 INFO mapred.JobClient:  map 100% reduce 17%
> 07/05/20 20:47:19 INFO mapred.JobClient:  map 100% reduce 18%
> 07/05/20 20:47:28 INFO mapred.JobClient:  map 100% reduce 19%
> 07/05/20 20:47:29 INFO mapred.JobClient:  map 100% reduce 20%
> 07/05/20 20:47:35 INFO mapred.JobClient:  map 100% reduce 21%
> 07/05/20 20:47:39 INFO mapred.JobClient:  map 100% reduce 22%
> 07/05/20 20:47:44 INFO mapred.JobClient:  map 100% reduce 23%
> 07/05/20 20:47:49 INFO mapred.JobClient:  map 100% reduce 24%
> 07/05/20 20:47:58 INFO mapred.JobClient:  map 100% reduce 25%
> 07/05/20 20:47:59 INFO mapred.JobClient:  map 100% reduce 26%
> 07/05/20 20:48:05 INFO mapred.JobClient:  map 100% reduce 27%
> 07/05/20 20:48:09 INFO mapred.JobClient:  map 100% reduce 28%
> 07/05/20 20:48:14 INFO mapred.JobClient:  map 100% reduce 29%
> 07/05/20 20:48:19 INFO mapred.JobClient:  map 100% reduce 30%
> 07/05/20 20:48:24 INFO mapred.JobClient:  map 100% reduce 31%
> 07/05/20 20:48:29 INFO mapred.JobClient:  map 100% reduce 32%
> 07/05/20 20:48:31 INFO mapred.JobClient:  map 100% reduce 35%
> 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at $Proxy1.getJobStatus(Unknown Source)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
> 	at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
> 	at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
> 	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> 07/05/20 20:49:35 INFO mapred.JobClient:  map 100% reduce 97%
> 07/05/20 20:49:37 INFO mapred.JobClient:  map 100% reduce 98%
> 07/05/20 20:49:40 INFO mapred.JobClient:  map 100% reduce 99%
> 07/05/20 20:49:42 INFO mapred.JobClient:  map 100% reduce 100%
> 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.