Posted to mapreduce-issues@hadoop.apache.org by "Nishan Shetty (JIRA)" <ji...@apache.org> on 2012/05/30 09:15:23 UTC

[jira] [Created] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Nishan Shetty created MAPREDUCE-4292:
----------------------------------------

             Summary: Job is hanging forever when some maps are failing always
                 Key: MAPREDUCE-4292
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4292
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.0.0-alpha
            Reporter: Nishan Shetty
            Priority: Critical


Set the property "mapred.reduce.tasks" to a value greater than zero.

I have a job in which some maps always fail.
Observations:
1. The map phase completes at 100% (with both succeeded and failed maps).
2. The reduce phase does not progress beyond 32%.
3. After the map phase completes, the job hangs forever.

Expected: the job should fail after waiting for some time.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285476#comment-13285476 ] 

Harsh J commented on MAPREDUCE-4292:
------------------------------------

What exactly caused the maps to fail? Did the job not fail after a map failed four attempts? (Note: reducers may fail and retrigger maps if they can't fetch their outputs in good time.)

Logs of the MR AM would be good to have.
                

[jira] [Commented] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Devaraj K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286460#comment-13286460 ] 

Devaraj K commented on MAPREDUCE-4292:
--------------------------------------

The issue occurs if we set a non-zero value for the property 'mapred.max.map.failures.percent'.
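
As a rough illustration of why a non-zero value changes the job's behaviour, the sketch below models a map-failure tolerance check in plain Java. The class and method names are hypothetical, and this is a simplified model rather than Hadoop's actual code; it assumes the job is failed only once the percentage of permanently failed maps exceeds the configured value.

```java
// Simplified model of a map-failure tolerance check. This is NOT Hadoop's
// actual implementation; the class and method names are hypothetical.
final class MapFailureTolerance {

    private MapFailureTolerance() {}

    /**
     * Returns true when the share of permanently failed maps exceeds the
     * allowed percentage, i.e. when the job as a whole should be failed.
     */
    static boolean exceedsTolerance(int failedMaps, int totalMaps, int allowedPercent) {
        if (totalMaps <= 0 || failedMaps < 0 || failedMaps > totalMaps) {
            throw new IllegalArgumentException("invalid map counts");
        }
        if (allowedPercent < 0 || allowedPercent > 100) {
            throw new IllegalArgumentException("allowedPercent must be in [0, 100]");
        }
        // Integer comparison avoids floating-point rounding:
        // failed / total > allowed / 100  <=>  failed * 100 > allowed * total
        return (long) failedMaps * 100 > (long) allowedPercent * totalMaps;
    }

    public static void main(String[] args) {
        // Tolerance 0: a single permanently failed map is enough to fail the job.
        System.out.println(exceedsTolerance(1, 10, 0));
        // Tolerance 50: four failed maps out of ten stay within the limit.
        System.out.println(exceedsTolerance(4, 10, 50));
    }
}
```

Under this model, a tolerance of 0 fails the job as soon as one map has exhausted its attempts. With a non-zero tolerance the job is allowed to continue past permanently failed maps, which is the configuration under which the hang was reported.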

                

[jira] [Commented] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Nishan Shetty (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285535#comment-13285535 ] 

Nishan Shetty commented on MAPREDUCE-4292:
------------------------------------------

bq. What exactly caused the maps to fail? Did the job not fail after a map failed four attempts? (Note: reducers may fail and retrigger maps if they can't fetch their outputs in good time.)

I was testing a scenario where some maps always fail. The job did not fail after a map had failed 4 attempts. The reducers were waiting for the map outputs.

bq. Logs of the MR AM would be good to have.

I have attached the AM log.

                

[jira] [Commented] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290202#comment-13290202 ] 

Jason Lowe commented on MAPREDUCE-4292:
---------------------------------------

Nishan or Devaraj, have we been able to confirm that this issue only occurs when mapred.max.map.failures.percent (or mapreduce.map.failures.maxpercent) is set to a non-zero value?  I'm curious if you have been able to reproduce job hangs without that property being set.
                

[jira] [Updated] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Nishan Shetty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated MAPREDUCE-4292:
-------------------------------------

    Attachment: syslog
    

[jira] [Assigned] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Devaraj K (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K reassigned MAPREDUCE-4292:
------------------------------------

    Assignee:     (was: Devaraj K)
    

[jira] [Updated] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Nishan Shetty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated MAPREDUCE-4292:
-------------------------------------

    Attachment:     (was: syslog.dat)
    

[jira] [Resolved] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Devaraj K (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K resolved MAPREDUCE-4292.
----------------------------------

    Resolution: Duplicate

Dup of MAPREDUCE-3927.
                

[jira] [Assigned] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Devaraj K (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K reassigned MAPREDUCE-4292:
------------------------------------

    Assignee: Devaraj K
    

[jira] [Commented] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286568#comment-13286568 ] 

Jason Lowe commented on MAPREDUCE-4292:
---------------------------------------

If this is caused by mapred.max.map.failures.percent being non-zero then this is a duplicate of MAPREDUCE-3927.
                

[jira] [Updated] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Nishan Shetty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated MAPREDUCE-4292:
-------------------------------------

    Attachment:     (was: syslog)
    

[jira] [Updated] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Nishan Shetty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated MAPREDUCE-4292:
-------------------------------------

    Attachment: syslog
    

[jira] [Updated] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Nishan Shetty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated MAPREDUCE-4292:
-------------------------------------

    Attachment: syslog.dat
    

[jira] [Commented] (MAPREDUCE-4292) Job is hanging forever when some maps are failing always

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285819#comment-13285819 ] 

Jason Lowe commented on MAPREDUCE-4292:
---------------------------------------

I tried reproducing this with 2.0.0-alpha on a single-node cluster by modifying wordcount to fail map IDs < 4.  As expected, the job failed soon after a map failed four attempts.  Explicitly setting mapred.reduce.tasks=1 for the job had no effect.  Could you attach the job config?  I'm wondering if there are other property settings that are affecting the behavior of the job.  For example, is mapreduce.map.failures.maxpercent set to a non-zero value?
                