Posted to dev@tez.apache.org by Jonathan Eagles <je...@gmail.com> on 2015/09/18 17:41:02 UTC

occasional submitDag error

I have a user running Tez 0.6.1 (git hash
6e588d15184dc691df1c0227f40db91d9bc6d7d6), and a few times a month a
submitted job fails with this error.

2015-09-16 01:04:05,069 INFO [IPC Server handler 0 on 50500]
app.DAGAppMaster: Running DAG: PigLatin:user_1.pig-0_scope-0
2015-09-16 01:04:05,617 INFO [IPC Server handler 0 on 50500]
history.HistoryEventHandler:
[HISTORY][DAG:dag_1440165794704_806241_1][Event:DAG_SUBMITTED]:
dagID=dag_1440165794704_806241_1, submitTime=1442365444946
2015-09-16 01:04:05,676 WARN [IPC Server handler 0 on 50500]
ipc.Server: IPC Server handler 0 on 50500, call
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG
from hostname:55741 Call#585 Retry#0
2015-09-16 01:04:05,679 FATAL [IPC Server handler 0 on 50500]
yarn.YarnUncaughtExceptionHandler: Thread Thread[IPC Server handler 0
on 50500,5,main] threw an Error.  Shutting down now...
2015-09-16 01:04:05,764 INFO [IPC Server handler 0 on 50500]
util.ExitUtil: Exiting with status -1
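
My read of the FATAL line is that the IPC handler thread died with a
java.lang.Error (not an Exception), which escaped the RPC layer and hit
the default uncaught-exception handler the AM installs. On an Error,
YarnUncaughtExceptionHandler terminates the JVM, which matches the
ExitUtil "Exiting with status -1" line above; a plain Exception would
only be logged. Here is a minimal sketch of that mechanism (the demo
class and the thread body are made up; only YarnUncaughtExceptionHandler
is the real Hadoop class from the log):

import org.apache.hadoop.yarn.YarnUncaughtExceptionHandler;

public class UncaughtErrorDemo {
  public static void main(String[] args) throws InterruptedException {
    // The AM installs this handler for all of its threads,
    // IPC handlers included.
    Thread.setDefaultUncaughtExceptionHandler(
        new YarnUncaughtExceptionHandler());

    // Stand-in for "IPC Server handler 0 on 50500": any Error that
    // escapes run() (OutOfMemoryError, NoClassDefFoundError, ...)
    // reaches the handler, which logs "threw an Error.  Shutting down
    // now..." and exits with status -1, as in the log above.
    Thread t = new Thread(() -> {
      throw new NoClassDefFoundError("simulated Error during submitDAG");
    }, "IPC Server handler 0 on 50500");
    t.start();
    t.join();  // the handler terminates the process from the dying thread
  }
}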


Normally it looks like this:

2015-09-17 10:03:40,234 INFO [IPC Server handler 0 on 50503]
app.DAGAppMaster: Running DAG: PigLatin:user_1.pig-0_scope-0
2015-09-17 10:03:40,659 INFO [IPC Server handler 0 on 50503]
history.HistoryEventHandler:
[HISTORY][DAG:dag_1440165794704_873289_1][Event:DAG_SUBMITTED]:
dagID=dag_1440165794704_873289_1, submitTime=1442484220062
2015-09-17 10:03:40,694 INFO [IPC Server handler 0 on 50503]
impl.VertexImpl: setting additional outputs for vertex scope-2573
2015-09-17 10:03:40,696 INFO [IPC Server handler 0 on 50503]
impl.DAGImpl: Using DAG Scheduler:
org.apache.tez.dag.app.dag.impl.DAGSchedulerNaturalOrder
2015-09-17 10:03:40,698 INFO [IPC Server handler 0 on 50503]
history.HistoryEventHandler:
[HISTORY][DAG:dag_1440165794704_873289_1][Event:DAG_INITIALIZED]:
dagID=dag_1440165794704_873289_1, initTime=1442484220660
2015-09-17 10:03:40,698 INFO [IPC Server handler 0 on 50503]
impl.DAGImpl: dag_1440165794704_873289_1 transitioned from NEW to
INITED
...

and then it succeeds.

Are there any known cases of exceptions escaping over RPC that have
since been fixed and would solve this issue?

Jon

Re: occasional submitDag error

Posted by Hitesh Shah <hi...@apache.org>.
The only layer I can think of in that path might be the recovery layer, which makes a blocking call that writes the dag submission information to the recovery log. If that write fails, the dag submission should also fail. It could potentially be optimized to let the dag submission go through and run the dag with recovery disabled ( a double-edged sword if the dag is a large one ).
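
For reference, a rough sketch of that submission path ( all names below
are illustrative, not the actual Tez classes ):

import java.io.IOException;

class SubmissionSketch {
  interface RecoveryLog {
    // Blocking write of the DAG_SUBMITTED event to the recovery log.
    void writeSubmitted(String dagId) throws IOException;
  }

  void submitDag(String dagId, RecoveryLog recovery) throws IOException {
    // Today the write blocks the RPC handler, and a failure fails the
    // submission, which the client sees as a submitDAG error.
    recovery.writeSubmitted(dagId);

    // The possible optimization: catch the IOException above, mark
    // recovery disabled for this dag, and start it anyway. Double-edged
    // for a large dag, since an AM restart would then have to rerun it
    // from scratch.
    startDag(dagId);
  }

  void startDag(String dagId) { /* NEW -> INITED -> RUNNING */ }
}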

However, that does not really explain the “YarnUncaughtExceptionHandler: Thread Thread[IPC Server handler 0 on 50500,5,main] threw an Error.  Shutting down now…” 

TEZ-2663 is something that could be applicable to branch 0.6, but I am not sure it will help track down this particular issue.

thanks
— Hitesh
