You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "ccw (JIRA)" <ji...@apache.org> on 2012/04/28 13:11:50 UTC

[jira] [Created] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

ccw created MAPREDUCE-4208:
------------------------------

             Summary: The job is hanging up but never continuing until you kill the child process 
                 Key: MAPREDUCE-4208
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
             Project: Hadoop Map/Reduce
          Issue Type: Bug
         Environment: Hadoop 0.20.203.0
Hbase 0.90.3
Hive 0.80.1
            Reporter: ccw


I use the hive MR query on hbase,but the job is never end.
The job is hanging but never continuing util you kill the child process 


2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%

I checked the TT log,

2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%




[hadoop@mem1 logs]$ jps
3282 Child
31547 QuorumPeerMain
1840 TaskTracker
3469 Jps
31070 HRegionServer
30120 DataNode


[hadoop@mem1 logs]$  kill 3282

When I kill the child process , then the job continue and complete.

2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

Posted by "xieguiming (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268104#comment-13268104 ] 

xieguiming commented on MAPREDUCE-4208:
---------------------------------------

Hi ccw:
 whether the speculative is on? and the speculative is useful for your scenario?

I think that the timeout mechanism is not good. because that the service will spend 

long time indeed.

so, if kill the task by timeout mechanism, and the job is failed in fact.
and is the same with killing the job.


so, I suggest that it is suitable for user's application to kill the job when the job does not complete for long time.

                
> The job is hanging up but never continuing until you kill the child process 
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 0.20.203.0
> Hbase 0.90.3
> Hive 0.80.1
>            Reporter: ccw
>
> I use the hive MR query on hbase,but the job is never end.
> The job is hanging but never continuing util you kill the child process 
> 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
> 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
> 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
> 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
> 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
> 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
> 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
> 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
> 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
> 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
> I checked the TT log,
> 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> [hadoop@mem1 logs]$ jps
> 3282 Child
> 31547 QuorumPeerMain
> 1840 TaskTracker
> 3469 Jps
> 31070 HRegionServer
> 30120 DataNode
> [hadoop@mem1 logs]$  kill 3282
> When I kill the child process , then the job continue and complete.
> 2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
> 2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
> 2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
> 2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

Posted by "ccw (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266360#comment-13266360 ] 

ccw commented on MAPREDUCE-4208:
--------------------------------

I checkd all logs,but there is no error .I want to know why each job encounter this problem until I kill one or more child process. I set the property 'mapred.task.timeout' ,but it is unhelpful for this.  I know the child task is not dead , but also it never work. It is terrible that every time I need to kill the hanging task.I think  it need to be controlled by JT or TT.
                
> The job is hanging up but never continuing until you kill the child process 
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 0.20.203.0
> Hbase 0.90.3
> Hive 0.80.1
>            Reporter: ccw
>
> I use the hive MR query on hbase,but the job is never end.
> The job is hanging but never continuing util you kill the child process 
> 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
> 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
> 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
> 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
> 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
> 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
> 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
> 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
> 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
> 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
> I checked the TT log,
> 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> [hadoop@mem1 logs]$ jps
> 3282 Child
> 31547 QuorumPeerMain
> 1840 TaskTracker
> 3469 Jps
> 31070 HRegionServer
> 30120 DataNode
> [hadoop@mem1 logs]$  kill 3282
> When I kill the child process , then the job continue and complete.
> 2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
> 2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
> 2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
> 2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

Posted by "ccw (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268065#comment-13268065 ] 

ccw commented on MAPREDUCE-4208:
--------------------------------

Thank you! I found the cause of the problem . The fundamental reason is that data access of Hbase . Because the data(region server) of map task is not on local region server ,so each data access is very slow .
But I think there must be a timeout mechanism to prevent the occurrence of such events. If a task is running too slow ,Hadoop should automatically kill the task rather than manually kill.
Thank you!
                
> The job is hanging up but never continuing until you kill the child process 
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 0.20.203.0
> Hbase 0.90.3
> Hive 0.80.1
>            Reporter: ccw
>
> I use the hive MR query on hbase,but the job is never end.
> The job is hanging but never continuing util you kill the child process 
> 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
> 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
> 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
> 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
> 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
> 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
> 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
> 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
> 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
> 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
> I checked the TT log,
> 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> [hadoop@mem1 logs]$ jps
> 3282 Child
> 31547 QuorumPeerMain
> 1840 TaskTracker
> 3469 Jps
> 31070 HRegionServer
> 30120 DataNode
> [hadoop@mem1 logs]$  kill 3282
> When I kill the child process , then the job continue and complete.
> 2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
> 2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
> 2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
> 2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans resolved MAPREDUCE-4208.
--------------------------------------------

    Resolution: Not A Problem
    
> The job is hanging up but never continuing until you kill the child process 
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 0.20.203.0
> Hbase 0.90.3
> Hive 0.80.1
>            Reporter: ccw
>
> I use the hive MR query on hbase,but the job is never end.
> The job is hanging but never continuing util you kill the child process 
> 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
> 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
> 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
> 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
> 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
> 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
> 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
> 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
> 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
> 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
> I checked the TT log,
> 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> [hadoop@mem1 logs]$ jps
> 3282 Child
> 31547 QuorumPeerMain
> 1840 TaskTracker
> 3469 Jps
> 31070 HRegionServer
> 30120 DataNode
> [hadoop@mem1 logs]$  kill 3282
> When I kill the child process , then the job continue and complete.
> 2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
> 2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
> 2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
> 2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265002#comment-13265002 ] 

Robert Joseph Evans commented on MAPREDUCE-4208:
------------------------------------------------

ccw,

Without a heap dump, a stack trace, or even process logs for the hung process I am not really sure what more we can do but tell you that it looks like it was an intermittent issue, because it recovered after rerunning the task.  The task should have timed out after a default of 10 mins with no progress being made, but I don’t know if you disabled that, or if there is something the mr code is doing to prevent it.  If you can post some of that information to this JIRA we might be able to help, otherwise I think we will have to close this as Incomplete.
                
> The job is hanging up but never continuing until you kill the child process 
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 0.20.203.0
> Hbase 0.90.3
> Hive 0.80.1
>            Reporter: ccw
>
> I use the hive MR query on hbase,but the job is never end.
> The job is hanging but never continuing util you kill the child process 
> 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
> 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
> 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
> 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
> 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
> 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
> 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
> 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
> 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
> 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
> I checked the TT log,
> 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> [hadoop@mem1 logs]$ jps
> 3282 Child
> 31547 QuorumPeerMain
> 1840 TaskTracker
> 3469 Jps
> 31070 HRegionServer
> 30120 DataNode
> [hadoop@mem1 logs]$  kill 3282
> When I kill the child process , then the job continue and complete.
> 2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
> 2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
> 2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
> 2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267483#comment-13267483 ] 

Robert Joseph Evans commented on MAPREDUCE-4208:
------------------------------------------------

So have you been able to reproduce it?  I also think that it is a really bad bug that I want to fix, but I don't know where to start debugging it without more information about how it happened.
                
> The job is hanging up but never continuing until you kill the child process 
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 0.20.203.0
> Hbase 0.90.3
> Hive 0.80.1
>            Reporter: ccw
>
> I use the hive MR query on hbase,but the job is never end.
> The job is hanging but never continuing util you kill the child process 
> 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
> 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
> 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
> 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
> 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
> 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
> 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
> 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
> 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
> 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
> I checked the TT log,
> 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> [hadoop@mem1 logs]$ jps
> 3282 Child
> 31547 QuorumPeerMain
> 1840 TaskTracker
> 3469 Jps
> 31070 HRegionServer
> 30120 DataNode
> [hadoop@mem1 logs]$  kill 3282
> When I kill the child process , then the job continue and complete.
> 2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
> 2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
> 2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
> 2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

Posted by "ccw (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265859#comment-13265859 ] 

ccw commented on MAPREDUCE-4208:
--------------------------------

OK,I will try to reproduce this problem and tell you more message about it tomorrow.
                
> The job is hanging up but never continuing until you kill the child process 
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 0.20.203.0
> Hbase 0.90.3
> Hive 0.80.1
>            Reporter: ccw
>
> I use the hive MR query on hbase,but the job is never end.
> The job is hanging but never continuing util you kill the child process 
> 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
> 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
> 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
> 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
> 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
> 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
> 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
> 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
> 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
> 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
> I checked the TT log,
> 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> [hadoop@mem1 logs]$ jps
> 3282 Child
> 31547 QuorumPeerMain
> 1840 TaskTracker
> 3469 Jps
> 31070 HRegionServer
> 30120 DataNode
> [hadoop@mem1 logs]$  kill 3282
> When I kill the child process , then the job continue and complete.
> 2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
> 2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
> 2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
> 2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268447#comment-13268447 ] 

Robert Joseph Evans commented on MAPREDUCE-4208:
------------------------------------------------

There are two types of timeouts that happen for a task.  The first is a user configurable timeout with a default of 10 min.  If a task does not make any progress for 10 min then the task will timeout.  Progress is defined as reading a record of input, writing a record of output, or explicitly in your map/reduce code calling a method in the Context to indicate that you are still making progress. 

The other timeout is that if your task does not heartbeat into the AM for 5 min the task is assumed to be dead.

In your case you are still making progress, it is just very slow, so neither of the timeouts would take place.  You could possible set the progress interval to be something very small and that would probably detect the problem.  We do not want to change the default value for the progress interval because there are lots of jobs that may be doing significant amounts of computation for each record, and we don't want to kill them off.

mapreduce.task.timeout is the timeout you would want to configure.
                
> The job is hanging up but never continuing until you kill the child process 
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 0.20.203.0
> Hbase 0.90.3
> Hive 0.80.1
>            Reporter: ccw
>
> I use the hive MR query on hbase,but the job is never end.
> The job is hanging but never continuing util you kill the child process 
> 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
> 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
> 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
> 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
> 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
> 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
> 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
> 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
> 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
> 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
> I checked the TT log,
> 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> 2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0 0.0%
> [hadoop@mem1 logs]$ jps
> 3282 Child
> 31547 QuorumPeerMain
> 1840 TaskTracker
> 3469 Jps
> 31070 HRegionServer
> 30120 DataNode
> [hadoop@mem1 logs]$  kill 3282
> When I kill the child process , then the job continue and complete.
> 2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
> 2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
> 2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
> 2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira