Posted to commits@samza.apache.org by "Nicolas Bär (JIRA)" <ji...@apache.org> on 2014/08/09 01:38:13 UTC

[jira] [Created] (SAMZA-376) ApplicationMaster Timeout after LeaderNotAvailableException

Nicolas Bär created SAMZA-376:
---------------------------------

             Summary: ApplicationMaster Timeout after LeaderNotAvailableException
                 Key: SAMZA-376
                 URL: https://issues.apache.org/jira/browse/SAMZA-376
             Project: Samza
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Nicolas Bär
            Priority: Minor


The application master does not send a heartbeat to the resource manager while the leader of the topic is unavailable. It retries until a leader is available and only then sends the heartbeat. If the Kafka cluster is busy during this time, leader election can take long enough that the timeout is reached before the first heartbeat is sent, which results in a shutdown of the application master.

I hit this issue on our testbed and received a few follow-up error messages after the application master was restarted: 
{quote}
ERROR security.UserGroupInformation: PriviledgedActionException as:baer (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Password not found for ApplicationAttempt appattempt_1407522131931_0001_000001
{quote}
I will investigate this further, but I assume it is better raised on the YARN mailing list.

Here is the relevant part from our discussion on IRC (criccomini):
{quote}
SamzaAppMaster
you'll see:       amClient.start
and later,       amClient.stop
the start is starting the YARN AMClient's heartbeat
now
SamzaAppMasterTaskManager
calls assignContainerToSSPTaskNames
in Util
which calls Util.getInputStreamPartitions(config)
and THAT is where Kafka is called
so basically
before amClient.start is called
that getInputStreamPartitions method is invoked
which will block on metadata timeouts
until it can get the data it needs
so SamzaAppMaster is constructing SamzaAppMasterTaskManager before it calls amClient.start
{quote}
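
The ordering problem described in the discussion can be reproduced with a small simulation. This is only a sketch under stated assumptions, not the Samza code: FakeAMClient, get_input_stream_partitions and heartbeat_arrives_in_time are hypothetical stand-ins written for illustration, no real YARN or Kafka API is called, and rm_timeout_s is an arbitrary value standing in for the resource manager's timeout. The leader is deliberately made available only after that timeout has elapsed.

```python
import threading


class FakeAMClient:
    """Hypothetical stand-in for the YARN AM client; start() begins heartbeating."""

    def __init__(self):
        self._stop = threading.Event()
        self._beat = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self._beat.set()        # a heartbeat reached the resource manager
            self._stop.wait(0.01)   # heartbeat interval (arbitrary)

    def wait_for_heartbeat(self, timeout_s):
        """Resource manager's view: True if a heartbeat arrived within timeout_s."""
        return self._beat.wait(timeout_s)


def heartbeat_arrives_in_time(start_heartbeat_first, rm_timeout_s=0.5):
    client = FakeAMClient()
    leader_available = threading.Event()

    def get_input_stream_partitions():
        # Stand-in for the Kafka metadata lookup: blocks until a leader exists.
        leader_available.wait()
        return ["partition-0"]

    def app_master_startup():
        if start_heartbeat_first:
            client.start()
            get_input_stream_partitions()
        else:
            # Order described in the discussion: metadata first, heartbeat second.
            get_input_stream_partitions()
            client.start()

    startup = threading.Thread(target=app_master_startup, daemon=True)
    startup.start()
    in_time = client.wait_for_heartbeat(rm_timeout_s)
    leader_available.set()  # leader election finishes only after the deadline
    startup.join()
    client.stop()
    return in_time


if __name__ == "__main__":
    print("metadata lookup first:", heartbeat_arrives_in_time(False))
    print("heartbeat first:", heartbeat_arrives_in_time(True))
```

In this simulation no heartbeat arrives in time when the metadata lookup runs first, while starting the heartbeat before the lookup keeps the simulated application master alive during a slow leader election. Whether reordering is the right fix in SamzaAppMaster itself is a separate question that the sketch does not answer.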



--
This message was sent by Atlassian JIRA
(v6.2#6252)