Posted to user@spark.apache.org by Prabhu Joseph <pr...@gmail.com> on 2016/01/14 06:01:03 UTC

Spark on YARN job continuously reports "Application does not exist in cache"

Hi All,

  When we submit Spark jobs on YARN, during RM failover, we see a lot of jobs
reporting the error messages below.


*2016-01-11 09:41:06,682 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Unregistering app attempt : appattempt_1450676950893_0280_000001*
2016-01-11 09:41:06,683 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1450676950893_0280_000001 State change from FINAL_SAVING to
FAILED
2016-01-11 09:41:06,683 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1450676950893_0280 State change from RUNNING to ACCEPTED
2016-01-11 09:41:06,683 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Application appattempt_1450676950893_0280_000001 is done. finalState=FAILED
2016-01-11 09:41:06,683 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering app attempt : appattempt_1450676950893_0280_000002
2016-01-11 09:41:06,683 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo:
Application application_1450676950893_0280 requests cleared
2016-01-11 09:41:06,683 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1450676950893_0280_000002 State change from NEW to SUBMITTED
2016-01-11 09:41:06,683 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
Cleaning master appattempt_1450676950893_0280_000001
2016-01-11 09:41:06,683 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Added Application Attempt appattempt_1450676950893_0280_000002 to scheduler
from user: glenm
2016-01-11 09:41:06,683 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1450676950893_0280_000002 State change from SUBMITTED to
SCHEDULED




*2016-01-11 09:41:06,747 ERROR
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
AppAttemptId doesnt exist in cache appattempt_1450676950893_0280_000001*

The ResourceManager has a ConcurrentMap into which it puts the application
attempt ID when an application attempt registers. When a
finishApplicationMaster request arrives, it looks up the entry in that map,
and if no entry is present, it reports the ERROR message above. When an
application attempt is unregistered, its entry is removed from the map.

So, after the application attempt has been unregistered, many
finishApplicationMaster requests still arrive for it, and each one causes
the ERROR.
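
To make the sequence I am describing concrete, here is a rough sketch of it
in Java. It is only a simplified model of my understanding, not the actual
ApplicationMasterService code: the class name AttemptCacheSketch, the String
keys, the Boolean placeholder values and the boolean return value are all
assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified model only; NOT the real ResourceManager implementation.
final class AttemptCacheSketch {
    // Hypothetical cache keyed by attempt id; the value is just a placeholder.
    private final ConcurrentMap<String, Boolean> cache = new ConcurrentHashMap<>();

    // Registering an attempt adds its entry to the cache.
    void registerAppAttempt(String attemptId) {
        cache.put(attemptId, Boolean.TRUE);
    }

    // Unregistering an attempt removes its entry from the cache.
    void unregisterAttempt(String attemptId) {
        cache.remove(attemptId);
    }

    // Returns true if the attempt is still cached; otherwise reports the
    // error and returns false (the real service may react differently).
    boolean finishApplicationMaster(String attemptId) {
        if (!cache.containsKey(attemptId)) {
            System.err.println("AppAttemptId doesnt exist in cache " + attemptId);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        AttemptCacheSketch rm = new AttemptCacheSketch();
        String attempt = "appattempt_1450676950893_0280_000001";

        rm.registerAppAttempt(attempt);
        rm.unregisterAttempt(attempt);

        // Any finish request arriving after the unregister misses the cache.
        for (int i = 0; i < 3; i++) {
            System.out.println("finish accepted: " + rm.finishApplicationMaster(attempt));
        }
    }
}
```

In this model, every finishApplicationMaster call made after the unregister
fails the lookup, which matches the repeated ERROR lines we see for attempt
000001 once attempt 000002 has been registered.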

I need your help to understand in which scenario this happens.


Related JIRAs:

https://issues.apache.org/jira/browse/SPARK-1032
https://issues.apache.org/jira/browse/SPARK-3072



Thanks,
Prabhu Joseph