Posted to user@hive.apache.org by Stephen Sprague <sp...@gmail.com> on 2017/01/19 04:32:45 UTC

random KILL's in YARN

Hey guys,
I have a question about why HiveServer2 would issue a "killJob" call.

We run YARN on Hadoop 5.6 with the HiveServer2 process. YARN uses the
fair scheduler, and preemption is turned off. At least twice a day we have
jobs that are randomly killed. They can be big jobs or small ones, Tez
jobs or MR jobs. I can't find any pattern and I can't find *WHY* this is
occurring, so I'm reaching out.

I have the RM process running at DEBUG level logging, as well as all the NM
processes, so I have pretty detailed logs. Still, I can't find a reason.
All I see is something like this:

2017-01-18 11:18:15,732 DEBUG [IPC Server handler 0 on 42807]
org.apache.hadoop.ipc.Server: IPC Server handler 0 on 42807:
org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.killJob from
172.19.75.137:39623 Call#5981979 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
2017-01-18 11:18:15,732 DEBUG [IPC Server handler 0 on 42807]
org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:dwr
(auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
*2017-01-18 11:18:15,736 INFO [IPC Server handler 0 on 42807]
org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Kill job
job_1484692657978_3312 received from dwr (auth:SIMPLE) at 172.19.75.137*
2017-01-18 11:18:15,736 DEBUG [AsyncDispatcher event handler]
org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event
org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent.EventType:
JOB_DIAGNOSTIC_UPDATE
2017-01-18 11:18:15,736 DEBUG [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Processing
job_1484692657978_3312 of type JOB_DIAGNOSTIC_UPDATE
2017-01-18 11:18:15,736 DEBUG [AsyncDispatcher event handler]
org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event
org.apache.hadoop.mapreduce.v2.app.job.event.JobEvent.EventType: JOB_KILL
2017-01-18 11:18:15,736 DEBUG [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Processing
job_1484692657978_3312 of type JOB_KILL
2017-01-18 11:18:15,737 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1484692657978_3312Job Transitioned from RUNNING to KILL_WAIT
2017-01-18 11:18:15,737 DEBUG [AsyncDispatcher event handler]
org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event
org.apache.hadoop.mapreduce.v2.app.job.event.TaskEvent.EventType: T_KILL
2017-01-18 11:18:15,737 DEBUG [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Processing
task_1484692657978_3312_m_000000 of type T_KILL
2017-01-18 11:18:15,737 DEBUG [AsyncDispatcher event handler]
org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event
org.apache.hadoop.mapreduce.v2.app.job.event.TaskEvent.EventType: T_KILL
2017-01-18 11:18:15,737 DEBUG [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Processing
task_1484692657978_3312_m_000001 of type T_KILL




* 172.19.75.137 is the node where the HiveServer2 process is running, so
I'm pretty sure this is the process issuing the kill. But why? Why kill
the job? It makes no sense to me. What would cause this to happen?
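
To look for a pattern across the kills, the "Kill job ... received from"
lines could be pulled out of the AM logs and grouped by source. Here is a
rough sketch, assuming each log record sits on a single line in the actual
log file (the excerpt above is wrapped by the mail client) and that the
message format matches the INFO line shown. The function names are made up
for illustration.

```python
import re
from collections import Counter

# Matches the MRClientService INFO record from the excerpt, e.g.
# "... MRClientService: Kill job job_X_Y received from dwr (auth:SIMPLE) at 1.2.3.4"
KILL_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO .*?"
    r"MRClientService: Kill job (?P<job>job_\d+_\d+) received from "
    r"(?P<user>\S+) \(auth:(?P<auth>\w+)\) at (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def find_kill_events(lines):
    """Return one dict (ts, job, user, auth, ip) per kill request found."""
    events = []
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


def kills_by_source(events):
    """Count kill requests per (user, ip) pair to see who is issuing them."""
    return Counter((e["user"], e["ip"]) for e in events)
```

Feeding it an open log file, e.g. find_kill_events(open(path)), and then
comparing the timestamps against the HiveServer2 log for the same moments
might show whether the kills line up with session closes, timeouts, or
something else.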



The HiveServer2 log doesn't have anything useful in it at the INFO level,
and I can't seem to get it to log at the DEBUG level. The option marked
with asterisks below doesn't seem to work, so any suggestions here are
gladly accepted too.

$ ps -ef | grep -i hiveserver2

dwr      21927  1785  8 Jan14 ?        10:10:11
/usr/lib/jvm/java-8-oracle/jre//bin/java -Xmx12288m
-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/lib/hadoop/logs
-Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop
-Dhadoop.id.str= -Dhadoop.root.logger=INFO,console
-Djava.library.path=/usr/lib/hadoop/lib/native
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
-Xmx268435456 -Xmx10G -Dlog4j.configurationFile=hive-log4j2.properties
-Dorg.apache.logging.log4j.simplelog.StatusLogger.level=DEBUG
-Dlog4j.configurationFile=hive-log4j2.properties
-Djava.util.logging.config.file=/usr/lib/apache-hive-2.1.0-bin/bin/../conf/parquet-logging.properties
-Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
/usr/lib/apache-hive-2.1.0-bin/lib/hive-service-2.1.0.jar
org.apache.hive.service.server.HiveServer2


*--hiveconf hive.root.logger=DEBUG,console*
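
One thing that stands out in the command line above is that several JVM
options appear more than once (-Xmx three times, for instance). Whether
that has anything to do with the logging problem is only a guess, but a
small sketch like the following lists the repeated -D and -Xmx options so
it is easy to check which value actually takes effect. The function name
is hypothetical.

```python
import shlex
from collections import defaultdict


def repeated_jvm_options(cmdline):
    """Return {option: [values, ...]} for -D and -Xmx options given more than once."""
    seen = defaultdict(list)
    for token in shlex.split(cmdline):
        if token.startswith("-Xmx"):
            # Group all heap-size settings under one key.
            seen["-Xmx"].append(token[len("-Xmx"):])
        elif token.startswith("-D"):
            # Split "-Dkey=value" into key and value; value may be empty.
            key, _, value = token.partition("=")
            seen[key].append(value)
    return {opt: vals for opt, vals in seen.items() if len(vals) > 1}
```

Run against the ps output above, it would report the three -Xmx values
along with the doubled -Djava.net.preferIPv4Stack and
-Dlog4j.configurationFile entries.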
thanks,
Stephen