Posted to yarn-issues@hadoop.apache.org by "Maysam Yabandeh (JIRA)" <ji...@apache.org> on 2014/08/11 20:32:14 UTC

[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock (scheduler page)

    [ https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093111#comment-14093111 ] 

Maysam Yabandeh commented on YARN-2405:
---------------------------------------

The problem seems to be that the two separate maps that keep track of the apps are not in sync. The apps shown on the page are taken from
{code}
Map<ApplicationId, RMApp> rmContext.getRMApps() 
{code}
and each one is then looked up in the second map, held by AbstractYarnScheduler,
{code}
Map<ApplicationId, SchedulerApplication> applications
{code}
via the following code:
{code}
  // FairScheduler: casts whatever the base class returns, including null
  public FSSchedulerApp getSchedulerApp(ApplicationAttemptId appAttemptId) {
    return (FSSchedulerApp) super.getApplicationAttempt(appAttemptId);
  }

  // AbstractYarnScheduler: returns null when the app is not in 'applications'
  public T getApplicationAttempt(ApplicationAttemptId applicationAttemptId) {
    SchedulerApplication<T> app =
        applications.get(applicationAttemptId.getApplicationId());
    return app == null ? null : app.getCurrentAppAttempt();
  }
{code}
which returns null when the application is not found in that map. FairSchedulerAppsBlock does not check for this null return value, hence the NPE.
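To make the failure mode concrete, here is a minimal Java sketch that models the two maps with plain LinkedHashMaps. The names (AppsBlockModel, Attempt, rmApps, fairShare) are hypothetical and only mimic the structure described above; they are not the YARN API.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the two maps: String ids stand in for
// ApplicationId. These are NOT the real YARN classes.
class AppsBlockModel {
    static class Attempt {
        final int fairShare;
        Attempt(int fairShare) { this.fairShare = fairShare; }
    }

    // Stands in for rmContext.getRMApps(): every app the RM knows about.
    final Map<String, String> rmApps = new LinkedHashMap<>();
    // Stands in for AbstractYarnScheduler#applications.
    final Map<String, Attempt> applications = new LinkedHashMap<>();

    // Like getApplicationAttempt: Map.get returns null for a missing key.
    Attempt getApplicationAttempt(String appId) {
        return applications.get(appId);
    }

    // Like the render loop: iterates the RM's map and dereferences the
    // scheduler lookup without a null check, so a missing entry throws an NPE.
    int renderUnchecked() {
        int totalFairShare = 0;
        for (String appId : rmApps.keySet()) {
            totalFairShare += getApplicationAttempt(appId).fairShare;
        }
        return totalFairShare;
    }
}
{code}

As long as every key of rmApps is also in applications the loop succeeds; a single app known only to the RM is enough to make it throw.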

By code inspection we found one case in which this could happen, though we are not sure it is the case we actually hit. In any event, checking for a null return value from getSchedulerApp seems to be a broader fix, since it also covers cases that code inspection has not uncovered yet.
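A minimal sketch of that broader fix, again on a hypothetical model rather than the real FairSchedulerAppsBlock: the render loop checks the scheduler lookup for null and emits a placeholder value instead of dereferencing it. The class name GuardedAppsBlock and the -1 placeholder are assumptions made for illustration.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model (not the real YARN classes) of a render loop that
// null-checks the scheduler lookup instead of assuming it succeeds.
class GuardedAppsBlock {
    static class Attempt {
        final int fairShare;
        Attempt(int fairShare) { this.fairShare = fairShare; }
    }

    // Stands in for rmContext.getRMApps().
    final Map<String, String> rmApps = new LinkedHashMap<>();
    // Stands in for AbstractYarnScheduler#applications.
    final Map<String, Attempt> applications = new LinkedHashMap<>();

    // Renders one row per RM app. An app the scheduler does not know about
    // gets a placeholder fair share (-1, an arbitrary choice) instead of an NPE.
    Map<String, Integer> render() {
        Map<String, Integer> rows = new LinkedHashMap<>();
        for (String appId : rmApps.keySet()) {
            Attempt attempt = applications.get(appId);
            rows.put(appId, attempt == null ? -1 : attempt.fairShare);
        }
        return rows;
    }
}
{code}

Skipping the row entirely would also avoid the NPE; keeping it with a placeholder leaves the app visible on the page, which seems preferable if the mismatch is transient, as the issue description suggests.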

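One scenario that could result in a null return value is in FairScheduler#addApplication, which returns early, before the app is put into applications, if no queue can be assigned or the ACL check rejects the user. The RMApp, however, is already in rmContext.getRMApps() at that point, so the two maps diverge: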
{code}
    RMApp rmApp = rmContext.getRMApps().get(applicationId);
    FSLeafQueue queue = assignToQueue(rmApp, queueName, user);
    if (queue == null) {
      return;  // app is in rmContext.getRMApps() but never added to 'applications'
    }
    // Enforce ACLs
    UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(user);
    if (...) {
      return;  // same divergence: the app stays unknown to the scheduler
    }

    SchedulerApplication application =
        new SchedulerApplication(queue, user);
    applications.put(applicationId, application);
{code}
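The divergence caused by these early returns can be modelled as follows. This is a hypothetical sketch: the queue assignment and the ACL check are reduced to two boolean parameters, and, following the snippet above, the app is assumed to be registered with the RM before addApplication runs. It does not model what the RM later does with a rejected app.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the early-return path in addApplication.
// All names are illustrative; these are NOT the real YARN classes.
class AddApplicationModel {
    final Map<String, String> rmApps = new HashMap<>();       // like rmContext.getRMApps()
    final Map<String, String> applications = new HashMap<>(); // like AbstractYarnScheduler#applications

    // Assumption: the RM registers the app before the scheduler adds it.
    void submit(String appId, boolean queueAssigned, boolean aclAllows) {
        rmApps.put(appId, "SUBMITTED");
        addApplication(appId, queueAssigned, aclAllows);
    }

    void addApplication(String appId, boolean queueAssigned, boolean aclAllows) {
        if (!queueAssigned) {
            return; // app stays in rmApps but never reaches applications
        }
        if (!aclAllows) {
            return; // same divergence on an ACL rejection
        }
        applications.put(appId, "queue");
    }

    // True when some app is visible to the RM but not to the scheduler,
    // i.e. exactly the apps for which the scheduler lookup returns null.
    boolean outOfSync() {
        return !applications.keySet().containsAll(rmApps.keySet());
    }
}
{code}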

> NPE in FairSchedulerAppsBlock (scheduler page)
> ----------------------------------------------
>
>                 Key: YARN-2405
>                 URL: https://issues.apache.org/jira/browse/YARN-2405
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Maysam Yabandeh
>
> FairSchedulerAppsBlock#render throws NPE at this line
> {code}
>       int fairShare = fsinfo.getAppFairShare(attemptId);
> {code}
> This causes the scheduler page not to show the apps, since the page then lacks the definition of appsTableData:
> {code}
>  Uncaught ReferenceError: appsTableData is not defined 
> {code}
> The problem is temporary: it usually resolves itself, either after a retry or after a few hours.


