Posted to user@kylin.apache.org by ju...@nomura.com on 2017/08/11 15:38:16 UTC

Building Cube Error, Container Killed

Hi,
We ran into a problem where a container got killed; below is the log we get from Kylin.
Can you please help determine the root cause?
The cluster has more than 300 GB of memory, which should be more than enough to process the data set, which is only 9 GB in ORC format.

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = monjuu-g_20170811163025_0f450a94-a309-4020-bb10-e7fab796f0dd
Total jobs = 10
Stage-1 is selected by condition resolver.
Launching Job 1 out of 10
Number of reduce tasks not specified. Estimated from input data size: 9
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1501889990114_2877, Tracking URL = https://xxxxxxxx:8090/proxy/application_1501889990114_2877/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job  -kill job_1501889990114_2877
Hadoop job information for Stage-1: number of mappers: 11; number of reducers: 9
2017-08-11 16:30:39,130 Stage-1 map = 0%,  reduce = 0%
2017-08-11 16:30:58,469 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 266.1 sec
2017-08-11 16:31:01,651 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 250.03 sec
2017-08-11 16:31:03,799 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 201.53 sec
2017-08-11 16:31:08,041 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 242.99 sec
2017-08-11 16:31:10,178 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 280.18 sec
2017-08-11 16:31:16,562 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 379.78 sec
2017-08-11 16:31:17,629 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 344.4 sec
2017-08-11 16:31:18,690 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 346.76 sec
2017-08-11 16:31:23,994 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 342.87 sec
2017-08-11 16:31:29,295 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 351.87 sec
2017-08-11 16:31:31,411 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 365.22 sec
2017-08-11 16:31:34,585 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 398.59 sec
2017-08-11 16:31:39,875 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 334.08 sec
2017-08-11 16:31:45,174 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 378.24 sec
2017-08-11 16:31:47,294 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 412.01 sec
2017-08-11 16:31:48,353 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 427.95 sec
2017-08-11 16:31:49,406 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 421.51 sec
2017-08-11 16:31:50,461 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 371.18 sec
2017-08-11 16:31:55,761 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 420.58 sec
2017-08-11 16:31:56,814 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 426.93 sec
2017-08-11 16:31:57,870 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 82.33 sec
MapReduce Total cumulative CPU time: 1 minutes 22 seconds 330 msec
Ended Job = job_1501889990114_2877 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1501889990114_2877_m_000007 (and more) from job job_1501889990114_2877
Examining task ID: task_1501889990114_2877_m_000001 (and more) from job job_1501889990114_2877

Task with the most failures(4):
-----
Task ID:
  task_1501889990114_2877_m_000007

-----
Diagnostic Messages for this Task:
Container [pid=4049,containerID=container_e66_1501889990114_2877_01_000031] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_e66_1501889990114_2877_01_000031 :
                |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
                |- 4054 4049 4049 4049 (java) 3031 114 3038318592 270714 /opt/sunjdk/jdk1.8.0_92/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx900m -Djava.io.tmpdir=/local/0/opt/hadoop-mapr/usercache/monjuu-g/appcache/application_1501889990114_2877/container_e66_1501889990114_2877_01_000031/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1501889990114_2877/container_e66_1501889990114_2877_01_000031 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.117.142.5 25857 attempt_1501889990114_2877_m_000007_3 72567767433247
                |- 4049 4047 4049 4049 (bash) 0 0 108654592 306 /bin/bash -c /opt/sunjdk/jdk1.8.0_92/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Xmx900m -Djava.io.tmpdir=/local/0/opt/hadoop-mapr/usercache/monjuu-g/appcache/application_1501889990114_2877/container_e66_1501889990114_2877_01_000031/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1501889990114_2877/container_e66_1501889990114_2877_01_000031 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.117.142.5 25857 attempt_1501889990114_2877_m_000007_3 72567767433247 1>/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1501889990114_2877/container_e66_1501889990114_2877_01_000031/stdout 2>/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1501889990114_2877/container_e66_1501889990114_2877_01_000031/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 11  Reduce: 9   Cumulative CPU: 82.33 sec   MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 1 minutes 22 seconds 330 msec



Re: Building Cube Error, Container Killed

Posted by ShaoFeng Shi <sh...@apache.org>.

"Current usage: 1.0 GB of 1 GB physical memory used; 2.9 GB of 2.1 GB
virtual memory used. Killing container."
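
The figures in this message can be cross-checked against the process-tree dump in the original log. Below is a minimal Python sketch of that arithmetic. It assumes a 4 KiB page size (the log does not state it) and that the reported usage is the sum over both processes listed in the dump; the name container_usage is made up for illustration.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages, the usual Linux x86-64 default
GIB = 1024 ** 3

# (name, VMEM_USAGE in bytes, RSSMEM_USAGE in pages), copied from the dump
processes = [
    ("java", 3038318592, 270714),
    ("bash", 108654592, 306),
]

def container_usage(procs, page_size=PAGE_SIZE):
    """Return (resident bytes, virtual bytes) summed over all processes."""
    rss_bytes = sum(pages for _, _, pages in procs) * page_size
    vmem_bytes = sum(vmem for _, vmem, _ in procs)
    return rss_bytes, vmem_bytes

rss_bytes, vmem_bytes = container_usage(processes)
physical_limit = 1 * GIB     # "1 GB physical memory" from the log
virtual_limit = 2.1 * GIB    # "2.1 GB virtual memory" from the log

print(f"physical: {rss_bytes / GIB:.2f} GiB used, over limit: {rss_bytes > physical_limit}")
print(f"virtual:  {vmem_bytes / GIB:.2f} GiB used, over limit: {vmem_bytes > virtual_limit}")
```

Under those assumptions the container's resident memory comes out slightly above the 1 GB limit, which matches the "running beyond physical memory limits" diagnostic. The 2.1 GB virtual limit would be consistent with a virtual-to-physical ratio of 2.1 applied to a 1 GB container. Note also that the JVM was started with -Xmx900m, which leaves only about 124 MB of a 1024 MB container for everything outside the Java heap.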

YARN killed the container because its memory usage exceeded the maximum
quota. Try updating the YARN/Hive configuration to allocate more memory.
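
As a rough illustration of what "allocate more memory" could look like for the failing map task, here is a small Python sketch that prints Hive "set" statements for a chosen container size. The helper name hive_memory_settings, the 4096 MB value and the 0.8 heap fraction are illustrative assumptions, not values from this thread; only the two property names, mapreduce.map.memory.mb and mapreduce.map.java.opts, are standard Hadoop settings.

```python
def hive_memory_settings(container_mb, heap_fraction=0.8):
    """Build Hive 'set' statements for map-task memory.

    heap_fraction is an assumed rule of thumb: keep the JVM heap below the
    container size so non-heap memory has room. Tune it for your workload.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(container_mb * heap_fraction)
    return [
        f"set mapreduce.map.memory.mb={container_mb};",
        f"set mapreduce.map.java.opts=-Xmx{heap_mb}m;",
    ]

# Example: a 4 GB map container (illustrative size only)
for statement in hive_memory_settings(4096):
    print(statement)
```

The same two properties could instead be set in whichever Hive or MapReduce configuration file your Kylin deployment uses for its Hive steps. The requested container size also has to fit within the cluster's YARN allocation limits, so check those before raising it.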


-- 
Best regards,

Shaofeng Shi 史少锋