Posted to user@tez.apache.org by Jeff Zhang <zj...@gmail.com> on 2014/06/26 09:37:46 UTC

Should we display the real exception in client side if job failed ?

Hi all,

I have a Tez job that failed because I didn't add my jar to the local
resources. On the client side, however, the exception is not clear enough
for me to figure out what went wrong. The real reason is that the Processor
class couldn't be loaded, and I had to run the "yarn logs" command to find
the real exception in the container logs. I also have another case where
the exception is thrown inside my Processor, and the message on the client
side is still not clear. I think we should pass the real exception to the
diagnostics and display it on the client side; this would help users find
out what's wrong with their program. Let me know your comments, thanks.
(The logs from the client side and the container follow.)


*Exception on client side*

14/06/26 14:57:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName: summer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 1
14/06/26 14:57:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName: tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 Killed: 0
14/06/26 14:57:15 INFO rpc.DAGClientRPCImpl: DAG completed. FinalState=FAILED
DAG diagnostics:[Vertex failed, vertexName=tokenizer,
vertexId=vertex_1403765612557_0004_1_00, diagnostics=[Task failed,
taskId=task_1403765612557_0004_1_00_000000, diagnostics=[TaskAttempt 0
failed, info=[Container container_1403765612557_0004_01_000002 COMPLETED
with diagnostics set to [Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

*The real exception (from the container log):*

2014-06-26 14:57:02,146 ERROR [main] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[main,5,main] threw an Exception.
org.apache.tez.dag.api.TezUncheckedException: Unable to load class: com.zjffdu.tutorial.tez.WordCount$TokenProcessor
    at org.apache.tez.common.RuntimeUtils.getClazz(RuntimeUtils.java:44)
    at org.apache.tez.common.RuntimeUtils.createClazzInstance(RuntimeUtils.java:66)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.createProcessor(LogicalIOProcessorRuntimeTask.java:533)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.<init>(LogicalIOProcessorRuntimeTask.java:146)
    at org.apache.tez.runtime.task.TezTaskRunner.<init>(TezTaskRunner.java:78)
    at org.apache.tez.runtime.task.TezChild.run(TezChild.java:208)
    at org.apache.tez.runtime.task.TezChild.main(TezChild.java:363)
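
For readers who want to reach the container log shown above: the client-side diagnostics only carry a container ID, while "yarn logs" takes an application ID. The sketch below is a hypothetical helper (ContainerLogHint and its methods are not part of Tez or YARN) that derives the application ID from a container ID of the form seen in this thread and prints the matching command. It assumes the container_<clusterTimestamp>_<appSequence>_<attempt>_<container> layout; other ID layouts are not handled, and "yarn logs" typically needs log aggregation to be enabled on the cluster.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Tez or YARN. */
final class ContainerLogHint {
    // Assumed layout: container_<clusterTimestamp>_<appSequence>_<attempt>_<container>
    private static final Pattern CONTAINER_ID =
        Pattern.compile("container_(\\d+)_(\\d+)_\\d+_\\d+");

    /** Derives the application ID that owns the given container. */
    static String applicationIdOf(String containerId) {
        Matcher m = CONTAINER_ID.matcher(containerId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized container ID: " + containerId);
        }
        return "application_" + m.group(1) + "_" + m.group(2);
    }

    /** Builds the command line that fetches the aggregated logs of that application. */
    static String yarnLogsCommand(String containerId) {
        return "yarn logs -applicationId " + applicationIdOf(containerId);
    }

    public static void main(String[] args) {
        // Container ID taken from the client-side diagnostics in this thread.
        System.out.println(yarnLogsCommand("container_1403765612557_0004_01_000002"));
    }
}
```

Run against the container ID from the diagnostics above, this prints "yarn logs -applicationId application_1403765612557_0004".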


-- 
Best Regards

Jeff Zhang

Re: Should we display the real exception in client side if job failed ?

Posted by Jeff Zhang <zj...@gmail.com>.
Besides, IMHO, if a user forgets to specify the LocalResource for their
Processor, it should cause a TaskAttempt failure rather than a Container
failure.
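
A minimal sketch of that idea, for illustration only: it is not taken from the Tez code base, and ProcessorLoadSketch, AttemptResult, and runAttempt are hypothetical names. The point is that a class-loading failure is caught where the processor is instantiated and turned into a failed attempt with a readable message, instead of escaping to the uncaught-exception handler and ending the container process.

```java
/** Illustrative only; not Tez code. All names here are hypothetical. */
final class ProcessorLoadSketch {

    /** Outcome of one attempt: success flag plus a human-readable message. */
    static final class AttemptResult {
        final boolean succeeded;
        final String diagnostics;

        AttemptResult(boolean succeeded, String diagnostics) {
            this.succeeded = succeeded;
            this.diagnostics = diagnostics;
        }
    }

    /**
     * Tries to load and instantiate the processor class. Any reflective or
     * linkage failure becomes a failed attempt rather than an uncaught exception.
     */
    static AttemptResult runAttempt(String processorClassName) {
        try {
            Class<?> clazz = Class.forName(processorClassName);
            Object processor = clazz.getDeclaredConstructor().newInstance();
            return new AttemptResult(true, "Loaded " + processor.getClass().getName());
        } catch (ReflectiveOperationException | LinkageError e) {
            return new AttemptResult(false,
                "Unable to load class: " + processorClassName + " (" + e + ")");
        }
    }

    public static void main(String[] args) {
        // The class name from this thread is not on the classpath here, so this fails cleanly.
        AttemptResult result = runAttempt("com.zjffdu.tutorial.tez.WordCount$TokenProcessor");
        System.out.println(result.succeeded + ": " + result.diagnostics);
    }
}
```

With this shape the process that hosts the attempt keeps running and can report the failure itself; whether that fits the actual Tez task lifecycle is an open question for the JIRA discussion.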


On Thu, Jun 26, 2014 at 4:43 PM, Jeff Zhang <zj...@gmail.com> wrote:

> I did some experiment by making some code change to send the real
> exception to client side. Let me know your comments whether this is
> valuable to fix it.



-- 
Best Regards

Jeff Zhang

RE: Should we display the real exception in client side if job failed ?

Posted by Bikas Saha <bi...@hortonworks.com>.
Please open a JIRA and post the code changes. Thanks! This may be related to
the fact that we explicitly redirect stdout/stderr to files. There is a
YARN bug open for this.

Bikas

-----Original Message-----
From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Thursday, June 26, 2014 1:43 AM
To: dev@tez.incubator.apache.org; user@tez.incubator.apache.org
Subject: Re: Should we display the real exception in client side if job
failed ?

I did some experiment by making some code change to send the real exception
to client side. Let me know your comments whether this is valuable to fix
it.




Re: Should we display the real exception in client side if job failed ?

Posted by Mohammad Islam <mi...@yahoo.com.INVALID>.
Hi Jeff,
From your explanation, it sounds like a good idea. You can create a JIRA and, if possible, attach your patch for review.
Regards,
Mohammad


On Thursday, June 26, 2014 1:50 AM, Jeff Zhang <zj...@gmail.com> wrote:
 


I did some experiment by making some code change to send the real exception
to client side. Let me know your comments whether this is valuable to fix
it.





Re: Should we display the real exception in client side if job failed ?

Posted by Jeff Zhang <zj...@gmail.com>.
I experimented with a code change that sends the real exception to the
client side. Let me know whether you think this is worth fixing.
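
The patch itself is not included in this thread. As a rough sketch of the general technique only (an assumption about the approach, not the actual change), an exception can be rendered to text and embedded in a diagnostics string that travels back to the client. DiagnosticsSketch, describe, and taskAttemptDiagnostics are hypothetical names, and the 4096-character cap is an arbitrary choice to keep messages bounded.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Illustrative only; not the patch discussed in this thread. */
final class DiagnosticsSketch {

    /** Renders the exception and its stack trace as text, truncated to maxChars. */
    static String describe(Throwable t, int maxChars) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        String text = buffer.toString();
        return text.length() <= maxChars ? text : text.substring(0, maxChars) + "...";
    }

    /** Wraps the rendered exception in a message shaped like the client-side diagnostics. */
    static String taskAttemptDiagnostics(String attemptId, Throwable t) {
        return "TaskAttempt " + attemptId + " failed, info=[" + describe(t, 4096) + "]";
    }

    public static void main(String[] args) {
        Throwable cause = new IllegalStateException(
            "Unable to load class: com.zjffdu.tutorial.tez.WordCount$TokenProcessor");
        System.out.println(taskAttemptDiagnostics("0", cause));
    }
}
```

A client that receives such a string sees the "Unable to load class" message directly, without a separate trip to the container logs.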


On Thu, Jun 26, 2014 at 3:37 PM, Jeff Zhang <zj...@gmail.com> wrote:

> Hi all,
>
> I have a tez job which is failed due to that I didn't put my jar to the
> local resources. But on the client side, the exception is not clear for me
> to figure what's wrong with it. The real reason is that It couldn't load
> the Processor class. I have to run command "yarn logs" to find the real
> exception in the container logs.  I also have another case that has
> exception in the my Processor, the message on the client side is still not
> clear to me. I think that should we pass the real exception to the
> diagnostics and display it in client side, this should help user to find
> out what's wrong with their program. Let me know your comments, thanks (
> following is the logs in client side and container )
>
>
> *Exception on client side*
>
> 14/06/26 14:57:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName: summer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 1
> 14/06/26 14:57:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName: tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 Killed: 0
> 14/06/26 14:57:15 INFO rpc.DAGClientRPCImpl: DAG completed. FinalState=FAILED
> DAG diagnostics:[Vertex failed, vertexName=tokenizer, vertexId=vertex_1403765612557_0004_1_00, diagnostics=[Task failed, taskId=task_1403765612557_0004_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1403765612557_0004_01_000002 COMPLETED with diagnostics set to [Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
> org.apache.hadoop.util.Shell$ExitCodeException:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>     at org.apache.hadoop.util.Shell.run(Shell.java:418)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> Container exited with a non-zero exit code 1
>
> *The real exception (from the container logs):*
>
> 2014-06-26 14:57:02,146 ERROR [main] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[main,5,main] threw an Exception.
> org.apache.tez.dag.api.TezUncheckedException: Unable to load class: com.zjffdu.tutorial.tez.WordCount$TokenProcessor
>     at org.apache.tez.common.RuntimeUtils.getClazz(RuntimeUtils.java:44)
>     at org.apache.tez.common.RuntimeUtils.createClazzInstance(RuntimeUtils.java:66)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.createProcessor(LogicalIOProcessorRuntimeTask.java:533)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.<init>(LogicalIOProcessorRuntimeTask.java:146)
>     at org.apache.tez.runtime.task.TezTaskRunner.<init>(TezTaskRunner.java:78)
>     at org.apache.tez.runtime.task.TezChild.run(TezChild.java:208)
>     at org.apache.tez.runtime.task.TezChild.main(TezChild.java:363)
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Best Regards

Jeff Zhang
