Posted to mapreduce-issues@hadoop.apache.org by "Chris Douglas (JIRA)" <ji...@apache.org> on 2011/06/04 05:25:47 UTC

[jira] [Updated] (MAPREDUCE-2529) Recognize Jetty bug 1342 and handle it

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-2529:
-------------------------------------

    Attachment: M2529-1.patch
                M2529-1-20s.patch

Minor nits:
* When a regex is undefined, always incrementing the metric probably makes more sense as the default
* {{null}} is probably a better default than the empty string
* There's a possible NPE if the exception message is {{null}}
* The unit test sets combinations of the stack/message regexes, but it calls {{checkStackException}} in a few places, which doesn't exercise that logic (I think it's covered, but that could be clearer)
* This will be useful while we work around bugs emerging from Jetty, but we should probably keep it as an undocumented config setting.
* The trunk patch updates {{MRJobConfig}}, which is for user jobs. I moved the setting to {{JTConfig}}.

This slight modification defines exceptions with {{null}} messages as matching no regexp. Let me know if it looks OK to you.
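
A minimal sketch of the matching rules above, assuming both regexes default to {{null}} (unset), an unset regex matches every exception, a {{null}} message matches no message regex, and the stack regex only needs to match one frame. {{ShuffleExceptionMatcher}} is a hypothetical name for illustration; this is not the code in the attached patches.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the exception-matching rules discussed above;
 * not the implementation in M2529-1.patch.
 */
class ShuffleExceptionMatcher {
    private final Pattern msgRegex;   // null when unset (the default)
    private final Pattern stackRegex; // null when unset (the default)

    ShuffleExceptionMatcher(String msgRegex, String stackRegex) {
        this.msgRegex = msgRegex == null ? null : Pattern.compile(msgRegex);
        this.stackRegex = stackRegex == null ? null : Pattern.compile(stackRegex);
    }

    /** Returns true if the exception should be counted toward the metric. */
    boolean matches(Throwable t) {
        if (msgRegex != null) {
            String msg = t.getMessage();
            // A null message matches no regex; checking it first avoids an NPE.
            if (msg == null || !msgRegex.matcher(msg).matches()) {
                return false;
            }
        }
        // An unset stack regex imposes no constraint.
        if (stackRegex != null && !stackMatches(t)) {
            return false;
        }
        return true;
    }

    /** True if any single frame of the stack trace matches the stack regex. */
    private boolean stackMatches(Throwable t) {
        for (StackTraceElement frame : t.getStackTrace()) {
            if (stackRegex.matcher(frame.toString()).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With both regexes unset every exception is counted, which follows the first nit; when both are set, the exception must satisfy both before the metric is incremented.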

> Recognize Jetty bug 1342 and handle it
> --------------------------------------
>
>                 Key: MAPREDUCE-2529
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2529
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.204.0, 0.23.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: M2529-1-20s.patch, M2529-1.patch, jetty1342-20security.patch, mapred2529-trunk.patch
>
>
> We are seeing many instances of Jetty-1342 (http://jira.codehaus.org/browse/JETTY-1342). The bug doesn't cause Jetty to stop responding altogether: some fetches go through, but many of them throw exceptions and eventually fail. The only way we have found to get the TT out of this state is to restart it. This jira is to catch this particular exception (or perhaps one matching a configurable regex) and handle it in an automated way, either blacklisting or shutting down the TT after it has been seen a configurable number of times.
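
The "configurable number of them" handling described in the issue can be sketched as a thread-safe counter with a limit. {{ShuffleExceptionTracker}}, its {{Action}} values and the constructor parameters are hypothetical names for illustration; the real config keys and the TT's blacklist/shutdown behavior are not shown here.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: counts matching shuffle exceptions and reports
 * the configured action once a configurable limit is reached.
 */
class ShuffleExceptionTracker {
    /** Assumed set of responses; NONE means keep serving. */
    enum Action { NONE, BLACKLIST, SHUTDOWN }

    private final AtomicInteger count = new AtomicInteger();
    private final int limit;
    private final Action onLimit;

    ShuffleExceptionTracker(int limit, Action onLimit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.onLimit = onLimit;
    }

    /**
     * Records one matching exception. Returns the configured action for
     * every exception at or beyond the limit, otherwise NONE.
     */
    Action record() {
        int n = count.incrementAndGet();
        return n >= limit ? onLimit : Action.NONE;
    }

    int count() {
        return count.get();
    }
}
```

An {{AtomicInteger}} is used because shuffle fetches are served concurrently, so several threads may record exceptions at once.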

--
This message is automatically generated by JIRA.