Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2024/03/09 16:12:00 UTC

[jira] [Updated] (HIVE-28112) Clear dagId from MDC/NDC when re-executing the query with new dagId

     [ https://issues.apache.org/jira/browse/HIVE-28112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-28112:
--------------------------------
    Description: 
1. The DAG fails:
{code}
<14>1 2024-03-07T16:29:54.292Z hiveserver2-0 hiveserver2 1 87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="client.DAGClientImpl" dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced" sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab" thread="HiveServer2-Background-Pool: Thread-1432633"] DAG completed. FinalState=FAILED
{code}

2. The lost-AM re-execution plugin (ReExecuteLostAMQueryPlugin) decides to re-execute:
{code}
<14>1 2024-03-07T16:29:54.301Z hiveserver2-0 hiveserver2 1 87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="reexec.ReExecuteLostAMQueryPlugin" dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced" sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab" thread="HiveServer2-Background-Pool: Thread-1432633"] Got exception message: AM record not found (likely died) in zookeeper for application id: application_1709708735265_0007 retryPossible: true
{code}

3. Messages that belong to the new execution (when there is no DAG at all yet) still show the previous dagId, which is confusing, e.g.:
{code}
<14>1 2024-03-07T16:29:54.348Z hiveserver2-0 hiveserver2 1 87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="ql.Driver" dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced" sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab" thread="HiveServer2-Background-Pool: Thread-1432633"] Compiling command(...
{code}
While a query is being compiled, a dagId is not supposed to be present at all.
The new dagId will correspond to the same Hive query ID, so the query ID can be used to keep the connection between query attempts.

The last message for which the old dagId still makes sense is:
{code}
<14>1 2024-03-07T16:29:54.309Z hiveserver2-0 hiveserver2 1 87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="reexec.ReExecDriver" dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced" sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab" thread="HiveServer2-Background-Pool: Thread-1432633"] Preparing to re-execute query
{code} 
So we might want to remove the dagId from MDC/NDC around this point, right after "Preparing to re-execute query" is logged.
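
The idea can be sketched in plain Java. This is only an illustration under stated assumptions, not Hive's actual code: the class `DagIdContextSketch` and all of its methods are hypothetical, and a `ThreadLocal` map stands in for the real per-thread logging context. With a real logging backend the removal would presumably be a call such as SLF4J's `MDC.remove("dagId")` or Log4j 2's `ThreadContext.remove("dagId")`; where exactly it belongs in the re-execution path is left open here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a ThreadLocal map stands in for MDC/NDC.
class DagIdContextSketch {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static void remove(String key) {
        CONTEXT.get().remove(key);
    }

    // Renders the current context followed by the message, keys in sorted order.
    static String render(String message) {
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<String, String> e : new TreeMap<>(CONTEXT.get()).entrySet()) {
            if (sb.length() > 1) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append("=\"").append(e.getValue()).append('"');
        }
        return sb.append("] ").append(message).toString();
    }

    // Logs the last message that still belongs to the failed DAG, then drops
    // the stale dagId. The queryId is kept so that attempts stay correlated.
    static void prepareToReExecute() {
        System.out.println(render("Preparing to re-execute query"));
        remove("dagId");
    }

    public static void main(String[] args) {
        put("queryId", "hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced");
        put("dagId", "dag_1709708735265_0007_94");

        System.out.println(render("DAG completed. FinalState=FAILED"));
        prepareToReExecute();
        // From here on, messages carry only the queryId until a new dagId is set.
        System.out.println(render("Compiling command(..."));
    }
}
```

Only the `dagId` key is removed rather than clearing the whole context, because the query ID and session ID remain valid across attempts.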


> Clear dagId from MDC/NDC when re-executing the query with new dagId
> -------------------------------------------------------------------
>
>                 Key: HIVE-28112
>                 URL: https://issues.apache.org/jira/browse/HIVE-28112
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Priority: Major


