You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Karen Coppage (Jira)" <ji...@apache.org> on 2019/10/28 15:24:00 UTC
[jira] [Created] (HIVE-22416) MR-related operation logs missing when parallel execution is enabled

Karen Coppage created HIVE-22416:
------------------------------------

             Summary:  MR-related operation logs missing when parallel execution is enabled
                 Key: HIVE-22416
                 URL: https://issues.apache.org/jira/browse/HIVE-22416
             Project: Hive
          Issue Type: Bug
            Reporter: Karen Coppage
            Assignee: Karen Coppage


Repro steps:
 1. Happy path, parallel execution disabled
{code:java}
0: jdbc:hive2://localhost:10000> set hive.exec.parallel=false;
No rows affected (0.023 seconds)
0: jdbc:hive2://localhost:10000> select count  (*) from t1;
INFO  : Compiling command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d): select count  (*) from t1
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d); Time taken: 0.309 seconds
INFO  : Executing command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d): select count  (*) from t1
WARN  : 
INFO  : Query ID = karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
DEBUG : Configuring job job_local495362389_0008 with file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008 as the submit dir
DEBUG : adding the following namenodes' delegation tokens:[file:///]
DEBUG : Creating splits at file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008
INFO  : number of splits:0
INFO  : Submitting tokens for job: job_local495362389_0008
INFO  : Executing with tokens: []
INFO  : The url to track the job: http://localhost:8080/
INFO  : Job running in-process (local Hadoop)
INFO  : 2019-10-28 15:26:22,537 Stage-1 map = 0%,  reduce = 100%
INFO  : Ended Job = job_local495362389_0008
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d); Time taken: 6.497 seconds
INFO  : OK
DEBUG : Shutting down query select count  (*) from t1
+-----+
| c0  |
+-----+
| 0   |
+-----+
1 row selected (11.874 seconds)
{code}
2. Faulty path, parallel execution enabled
{code:java}
0: jdbc:hive2://localhost:10000> set hive.server2.logging.operation.level=EXECUTION;
No rows affected (0.236 seconds)
0: jdbc:hive2://localhost:10000> set hive.exec.parallel=true;
No rows affected (0.01 seconds)
0: jdbc:hive2://localhost:10000> select count  (*) from t1;
INFO  : Compiling command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77): select count  (*) from t1
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77); Time taken: 4.707 seconds
INFO  : Executing command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77): select count  (*) from t1
WARN  : 
INFO  : Query ID = karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in parallel
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77); Time taken: 44.577 seconds
INFO  : OK
DEBUG : Shutting down query select count  (*) from t1
+-----+
| c0  |
+-----+
| 0   |
+-----+
1 row selected (54.665 seconds)
{code}
The issue is that Log4J stores the session ID and query ID in some atomic thread metadata (org.apache.logging.log4j.ThreadContext.getImmutableContext()). If the queryId is missing from this metadata, then the RoutingAppender (which is defined programmatically in LogDivertAppender) will route the log to a NullAppender, which logs nothing. If the queryId is present, then the RoutingAppender routes the event to the "query-appender" logger, which will log the line in the operation log/Beeline. This is not happening in a multi-threaded context since new threads created for parallel query execution do not have the queryId/sessionId metadata.

The solution is to add the queryId/sessionId metadata to any new threads created for MR work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)