Posted to user@tez.apache.org by "r7raul1984@163.com" <r7...@163.com> on 2015/05/29 10:15:33 UTC

Tez launch container error when using UseG1GC

I changed the value of mapreduce.map.java.opts from -Djava.net.preferIPv4Stack=true -Xmx825955249 to -Djava.net.preferIPv4Stack=true -XX:+UseG1GC -Xmx825955249

Then I ran a query with Hive 1.1.0 + Tez 0.5.3 on Hadoop 2.5.0:

set mapreduce.framework.name=yarn-tez; 
set hive.execution.engine=tez; 
select userid,count(*) from u_data group by userid order by userid;
The query returned an error.
I found this error:
2015-05-29 16:02:39,064 WARN [AsyncDispatcher event handler] container.AMContainerImpl: Container container_1432885077153_0004_01_000005 finished with diagnostics set to [Container failed. Exception from container-launch. 
Container id: container_1432885077153_0004_01_000005 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
at org.apache.hadoop.util.Shell.run(Shell.java:455) 
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) 
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81) 
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

But when I tried the MR engine instead:
hive> set hive.execution.engine=mr; 
hive> set mapreduce.framework.name=yarn; 
hive> select userid,count(*) from u_data group by userid order by userid limit 1; 
Query ID = hdfs_20150529160606_d550bca4-0341-4eb0-aace-a9018bfbb7a9 
Total jobs = 2 
Launching Job 1 out of 2 
Number of reduce tasks not specified. Estimated from input data size: 1 
In order to change the average load for a reducer (in bytes): 
set hive.exec.reducers.bytes.per.reducer=<number> 
In order to limit the maximum number of reducers: 
set hive.exec.reducers.max=<number> 
In order to set a constant number of reducers: 
set mapreduce.job.reduces=<number> 
Starting Job = job_1432885077153_0005, Tracking URL = http://localhost:8088/proxy/application_1432885077153_0005/ 
Kill Command = /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop job -kill job_1432885077153_0005 
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 
2015-05-29 16:06:34,863 Stage-1 map = 0%, reduce = 0% 
2015-05-29 16:06:40,066 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.72 sec 
2015-05-29 16:06:48,366 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.96 sec 
MapReduce Total cumulative CPU time: 2 seconds 960 msec 
Ended Job = job_1432885077153_0005 
Launching Job 2 out of 2 
Number of reduce tasks determined at compile time: 1 
In order to change the average load for a reducer (in bytes): 
set hive.exec.reducers.bytes.per.reducer=<number> 
In order to limit the maximum number of reducers: 
set hive.exec.reducers.max=<number> 
In order to set a constant number of reducers: 
set mapreduce.job.reduces=<number> 
Starting Job = job_1432885077153_0006, Tracking URL = http://localhost:8088/proxy/application_1432885077153_0006/ 
Kill Command = /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop job -kill job_1432885077153_0006 
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1 
2015-05-29 16:07:03,333 Stage-2 map = 0%, reduce = 0% 
2015-05-29 16:07:07,485 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec 
2015-05-29 16:07:15,739 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.35 sec 
MapReduce Total cumulative CPU time: 2 seconds 350 msec 
Ended Job = job_1432885077153_0006 
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.96 sec HDFS Read: 1985399 HDFS Write: 20068 SUCCESS 
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.35 sec HDFS Read: 24481 HDFS Write: 6 SUCCESS 
Total MapReduce CPU Time Spent: 5 seconds 310 msec 

That works fine.




r7raul1984@163.com

Re: Re: Tez launch container error when using UseG1GC

Posted by "r7raul1984@163.com" <r7...@163.com>.

The log is:
Status: Running (Executing on YARN cluster with App id application_1432885077153_0011) 

-------------------------------------------------------------------------------- 
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED 
-------------------------------------------------------------------------------- 
Map 1 FAILED 1 0 0 1 4 0 
Reducer 2 KILLED 1 0 0 1 0 1 
Reducer 3 KILLED 1 0 0 1 0 1 
-------------------------------------------------------------------------------- 
VERTICES: 00/03 [>>--------------------------] 0% ELAPSED TIME: 16.13 s 
-------------------------------------------------------------------------------- 
Status: Failed 
Vertex failed, vertexName=Map 1, vertexId=vertex_1432885077153_0011_1_00, diagnostics=[Task failed, taskId=task_1432885077153_0011_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1432885077153_0011_01_000002 finished with diagnostics set to [Container failed. Exception from container-launch. 
Container id: container_1432885077153_0011_01_000002 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
at org.apache.hadoop.util.Shell.run(Shell.java:455) 
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) 
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81) 
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

Container exited with a non-zero exit code 1 
]], TaskAttempt 1 failed, info=[Container container_1432885077153_0011_01_000003 finished with diagnostics set to [Container failed. Exception from container-launch. 
Container id: container_1432885077153_0011_01_000003 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
at org.apache.hadoop.util.Shell.run(Shell.java:455) 
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) 
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81) 
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

Container exited with a non-zero exit code 1 
]], TaskAttempt 2 failed, info=[Container container_1432885077153_0011_01_000004 finished with diagnostics set to [Container failed. Exception from container-launch. 
Container id: container_1432885077153_0011_01_000004 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
at org.apache.hadoop.util.Shell.run(Shell.java:455) 
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) 
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81) 
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

Container exited with a non-zero exit code 1 
]], TaskAttempt 3 failed, info=[Container container_1432885077153_0011_01_000005 finished with diagnostics set to [Container failed. Exception from container-launch. 
Container id: container_1432885077153_0011_01_000005 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
at org.apache.hadoop.util.Shell.run(Shell.java:455) 
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) 
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81) 
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

Container exited with a non-zero exit code 1 
]]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1432885077153_0011_1_00 [Map 1] killed/failed due to:null] 
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1432885077153_0011_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1432885077153_0011_1_01 [Reducer 2] killed/failed due to:null] 
Vertex killed, vertexName=Reducer 3, vertexId=vertex_1432885077153_0011_1_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1432885077153_0011_1_02 [Reducer 3] killed/failed due to:null] 
DAG failed due to vertex failure. failedVertices:1 killedVertices:2 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask 
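
To look up each failed attempt individually, the container IDs can be pulled out of the diagnostics text above. This is a minimal Python sketch, assuming the diagnostics have been saved into a string; the function name failed_container_ids is hypothetical and not part of Hive or Tez.

```python
import re

# Matches the five-part container IDs that appear in the log above,
# e.g. container_1432885077153_0011_01_000002.
CONTAINER_ID = re.compile(r"container_\d+_\d+_\d+_\d+")


def failed_container_ids(diagnostics):
    """Return the container IDs in order of first appearance, without duplicates."""
    seen = []
    for cid in CONTAINER_ID.findall(diagnostics):
        if cid not in seen:
            seen.append(cid)
    return seen


# Shortened excerpt of the diagnostics quoted in this message.
sample = (
    "TaskAttempt 0 failed, info=[Container container_1432885077153_0011_01_000002 "
    "finished with diagnostics set to [Container failed. "
    "Container id: container_1432885077153_0011_01_000002 "
    "TaskAttempt 1 failed, info=[Container container_1432885077153_0011_01_000003 "
    "finished with diagnostics set to [Container failed."
)

for cid in failed_container_ids(sample):
    print(cid)
```

Each ID printed this way can then be searched for in the NodeManager log on the host that ran it.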

yarn logs -applicationId application_1432885077153_0011 
15/06/01 08:07:01 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032 
Logs not available at /tmp/logs/root/logs/application_1432885077153_0011 
Log aggregation has not completed or is not enabled. 
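
The last line of that output suggests checking whether log aggregation is switched on. The sketch below reads the standard YARN property yarn.log-aggregation-enable from Hadoop-style configuration XML. The sample XML is invented for illustration and the helper name get_property is hypothetical; on a real cluster the content would come from the yarn-site.xml the NodeManagers actually use.

```python
import xml.etree.ElementTree as ET


def get_property(config_xml, name):
    """Return the <value> text of the named <property>, or None if it is absent."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Invented sample; replace with the text of the cluster's yarn-site.xml.
sample = """<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>false</value>
  </property>
</configuration>"""

enabled = get_property(sample, "yarn.log-aggregation-enable")
print("log aggregation enabled:", enabled == "true")
```

When aggregation is not enabled, the container logs are not copied to HDFS and have to be read from the NodeManager's local log directory on the host that ran the container.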

r7raul1984@163.com

From: Hitesh Shah
Date: 2015-05-29 23:31
To: user
Subject: Re: Tez lauche container error when use UseG1GC
To clarify, given that the error is showing up with container_1432885077153_0004_01_000005, that means that the AM launched properly. 

Use "bin/yarn logs -applicationId application_1432885077153_0004" to get the logs. Check whether the logs for container_1432885077153_0004_01_000005 contain any errors. If there are none, search the AM's logs for "Assigning container to task" for that container. That log line shows which host the container ran on; then look at that host's NodeManager logs and search for the container id.

The above would be a lot simpler if you have the UI set up to work against 0.5.3, but it may still require you to dig through the NodeManager logs. 

thanks
— Hitesh 


Re: Tez launch container error when using UseG1GC

Posted by Hitesh Shah <hi...@apache.org>.

To clarify, given that the error is showing up with container_1432885077153_0004_01_000005, that means that the AM launched properly. 
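
The application ID needed for the "yarn logs" command can be derived from the container ID, and the trailing number shows that this is a later container than the first one of the attempt, which typically hosts the AM. A minimal Python sketch, assuming the five-part ID format seen in this thread (IDs with an extra epoch field are rejected); the names ContainerId and parse_container_id are hypothetical.

```python
from typing import NamedTuple


class ContainerId(NamedTuple):
    cluster_timestamp: str
    app_seq: str
    attempt: str
    container_seq: str

    @property
    def application_id(self):
        # The application ID reuses the first two numeric fields.
        return f"application_{self.cluster_timestamp}_{self.app_seq}"


def parse_container_id(container_id):
    """Split container_<timestamp>_<app>_<attempt>_<n> into its fields."""
    parts = container_id.split("_")
    if len(parts) != 5 or parts[0] != "container":
        raise ValueError(f"unexpected container id format: {container_id}")
    return ContainerId(*parts[1:])


cid = parse_container_id("container_1432885077153_0004_01_000005")
print(cid.application_id)      # application_1432885077153_0004
print(int(cid.container_seq))  # 5
```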

Use "bin/yarn logs -applicationId application_1432885077153_0004" to get the logs. Check whether the logs for container_1432885077153_0004_01_000005 contain any errors. If there are none, search the AM's logs for "Assigning container to task" for that container. That log line shows which host the container ran on; then look at that host's NodeManager logs and search for the container id.
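
The search step can be scripted once the AM log has been saved to a file. The sketch below only filters lines that contain both the phrase and the container id; it does not assume anything about the rest of the line. The function name find_assignment_lines is hypothetical, and the sample lines are invented placeholders that do not reproduce Tez's real log format.

```python
def find_assignment_lines(log_lines, container_id,
                          marker="Assigning container to task"):
    """Return the log lines that mention both the marker phrase and the container id."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if marker in line and container_id in line
    ]


# Invented placeholder lines, only to exercise the filter.
sample_lines = [
    "INFO example: Assigning container to task, container=container_1432885077153_0004_01_000005 (made-up line)\n",
    "INFO example: Assigning container to task, container=container_1432885077153_0004_01_000004 (made-up line)\n",
    "WARN example: Container container_1432885077153_0004_01_000005 finished (made-up line)\n",
]

for line in find_assignment_lines(sample_lines, "container_1432885077153_0004_01_000005"):
    print(line)
```

For a real run, pass an open file object holding the saved "yarn logs" output instead of sample_lines; the function iterates over it line by line.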

The above would be a lot simpler if you have the UI set up to work against 0.5.3, but it may still require you to dig through the NodeManager logs. 

thanks
— Hitesh 

On May 29, 2015, at 3:48 AM, Jianfeng (Jeff) Zhang <jz...@hortonworks.com> wrote:

> 
> Could you check the YARN application logs to see what the error is? If there is still no useful info, you may refer to the YARN RM/NM logs.
> 
> Best regards,
> Jeff Zhang
> 
> 

Re: Tez launch container error when using UseG1GC

Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.

Could you check the YARN application logs to see what the error is? If there is still no useful info, you may refer to the YARN RM/NM logs.

Best regards,
Jeff Zhang


From: "r7raul1984@163.com" <r7...@163.com>
Reply-To: user <us...@tez.apache.org>
Date: Friday, May 29, 2015 at 4:16 PM
To: user <us...@tez.apache.org>
Subject: Re: Tez launch container error when using UseG1GC

BTW, my tez-site.xml content is:
<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez-0.5.3/tez-0.5.3.tar.gz</value>
</property>
<property>
<name>tez.task.generate.counters.per.io</name>
<value>true</value>
</property>
<property>
<description>Log history using the Timeline Server</description>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<description>Publish configuration information to Timeline server </description>
<name>tez.runtime.convert.user-payload.to.history-text</name>
<value>true</value>
</property>
<property>
<name>tez.am.launch.cmd-opts</name>
<value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/</value>
</property>

</configuration>


Re: Tez launch container error when using UseG1GC

Posted by "r7raul1984@163.com" <r7...@163.com>.

BTW, my tez-site.xml content is:
<configuration> 
<property> 
<name>tez.lib.uris</name> 
<value>hdfs:///apps/tez-0.5.3/tez-0.5.3.tar.gz</value> 
</property> 
<property> 
<name>tez.task.generate.counters.per.io</name> 
<value>true</value> 
</property> 
<property> 
<description>Log history using the Timeline Server</description> 
<name>tez.history.logging.service.class</name> 
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value> 
</property> 
<property> 
<description>Publish configuration information to Timeline server </description> 
<name>tez.runtime.convert.user-payload.to.history-text</name> 
<value>true</value> 
</property> 
<property> 
<name>tez.am.launch.cmd-opts</name> 
<value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/</value> 
</property> 

</configuration>



r7raul1984@163.com
 