You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2008/08/21 15:12:44 UTC

[jira] Created: (HADOOP-3987) TaskTracker.offerService could handle IO and Remote Exceptions better

TaskTracker.offerService could handle IO and Remote Exceptions better
---------------------------------------------------------------------

                 Key: HADOOP-3987
                 URL: https://issues.apache.org/jira/browse/HADOOP-3987
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.19.0
            Reporter: Steve Loughran


The core offerService() loop has a try/catch wrapper that catches and processes exceptions. Most cause offerService() to return, which then triggers a sleep and restart in the main loop. But some exceptions are just logged and ignored, which may be inappropriate


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3987) TaskTracker.offerService could handle IO and Remote Exceptions better

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624365#action_12624365 ] 

Steve Loughran commented on HADOOP-3987:
----------------------------------------

Here is the specific problems that appear to exist, though I could of course be mistaken.

1. All RemoteExceptions that are not an instance of DisallowedTaskTrackerException are ignored, nothing is even printed

        return State.STALE;
      } catch (RemoteException re) {
        String reClass = re.getClassName();
        if (DisallowedTaskTrackerException.class.getName().equals(reClass)) {
          LOG.info("Tasktracker disallowed by JobTracker.");
          return State.DENIED;
        }

2. All IOExceptions are logged, but not reported up; the service remains in the inner (sleepless) while loop.
      } catch (IOException except) {
        String msg = "Caught exception: " + except.getMessage();
        LOG.error(msg, except);
      }
    }

This may not be the best way to handle network and IO problems. 

3. the code that checks for the system directory off the job service will throw an IOException if none is provided, and exception that will be caught and logged in the code in (2).  If a JobTracker is returning null to getSystemDir(), then every TaskTracker that is bonded to it is going to spin, calling getSystemDir() on the server, logging the error and repeating, without any delay at all.

I'm not sure what the ideal exception handling policy here should be, but what is there today has weaknesses. If the network is playing up, logging RemoteExceptions and maybe inserting delays would be good; if JobTracker.getSystemDir() is null then the clients should sleep longer in the hope that someone will fix the job tracker, rather than spinning.



> TaskTracker.offerService could handle IO and Remote Exceptions better
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-3987
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3987
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>
> The core offerService() loop has a try/catch wrapper that catches and processes exceptions. Most cause offerService() to return, which then triggers a sleep and restart in the main loop. But some exceptions are just logged and ignored, which may be inappropriate

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.