Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/05/11 06:02:03 UTC

[GitHub] [hive] dengzhhu653 opened a new pull request #1998: HIVE-24802: Show operation log at webui

dengzhhu653 opened a new pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   Currently we provide `getQueryLog` in `HiveStatement` to fetch the operation log, and the log is deleted when the operation is closed (with a delay for cancelled operations). This can make it hard for JDBC users or administrators to dig into the details of a finished or failed operation. This PR therefore shows the operation log on the web UI and keeps it for some time for later analysis.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, but the feature is disabled by default. Set `hive.server2.historic.operation.log.enabled=true` to enable it; it also requires operation logging and the web UI query info cache to be enabled.
   
   
   ### How was this patch tested?
   Unit tests, plus manual testing on a local machine.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r629791045



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,372 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Moves the operation log into a location different from the directory created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * so that the log is not removed when the session or operation is closed (see
+ * {@link HiveSessionImpl#close()}). Users and administrators can then retrieve the
+ * operation log later to optimize or troubleshoot the operation.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the original tree looks like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner}:
+ * it gets all query info stored in {@link QueryInfoCache}, finds the queries that can no longer be reached on the webui,
+ * and removes their logs. If an operation log session directory contains no operation logs and the session is dead,
+ * the OperationLogDirCleaner will try to clean up the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = sessionManager.getHiveConf();
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      MAX_BYTES_TO_FETCH = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (HISTORIC_OPERATION_LOG_ROOT_DIR != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String originalLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    String historicLogLoc = originalLogLoc + HISTORIC_DIR_SUFFIX;
+    File operationLogRootDir = new File(historicLogLoc);
+
+    if (operationLogRootDir.exists() && !operationLogRootDir.isDirectory()) {
+      LOG.warn("The historic operation log root directory exists, but it is not a directory: " +
+          operationLogRootDir.getAbsolutePath());
+      return;
+    }
+
+    if (!operationLogRootDir.exists()) {
+      if (!operationLogRootDir.mkdirs()) {
+        LOG.warn("Unable to create historic operation log root directory: " +
+            operationLogRootDir.getAbsolutePath());
+        return;
+      }
+    }
+    HISTORIC_OPERATION_LOG_ROOT_DIR = historicLogLoc;
+  }
+
+  public static OperationLog createOperationLog(Operation operation, QueryState queryState) {
+    HiveSession session = operation.getParentSession();
+    File parentFile = session.getOperationLogSessionDir();
+    boolean isHistoricLogEnabled = HISTORIC_OPERATION_LOG_ROOT_DIR != null;
+    if (isHistoricLogEnabled && operation instanceof SQLOperation) {
+      String sessionId = session.getSessionHandle().getHandleIdentifier().toString();
+      parentFile = new File(HISTORIC_OPERATION_LOG_ROOT_DIR + "/" + sessionId);
+      if (!parentFile.exists()) {
+        if (!parentFile.mkdirs()) {
+          LOG.warn("Unable to create the historic operation log session dir: {}, " +
+              "fall back to the original operation log session dir.", parentFile);
+          parentFile = session.getOperationLogSessionDir();
+          isHistoricLogEnabled = false;
+        }
+      } else if (!parentFile.isDirectory()) {
+        LOG.warn("The historic operation log session dir: {} exists, but it's not a directory, " +
+            "fall back to the original operation log session dir.", parentFile);
+        parentFile = session.getOperationLogSessionDir();
+        isHistoricLogEnabled = false;
+      }
+    }
+
+    OperationHandle opHandle = operation.getHandle();
+    File operationLogFile = new File(parentFile, queryState.getQueryId());
+    OperationLog operationLog;
+    if (isHistoricLogEnabled) {
+      // dynamically setting the log location to route the operation log
+      HiveConf.setVar(queryState.getConf(),
+          HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION, HISTORIC_OPERATION_LOG_ROOT_DIR);
+      HiveConf.setBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false);
+      LOG.info("The operation log location changes from {} to {}.", new File(session.getOperationLogSessionDir(),
+          queryState.getQueryId()), operationLogFile);
+    }
+    operationLog = new OperationLog(opHandle.toString(), operationLogFile, queryState.getConf());
+    return operationLog;
+  }
+
+  private Set<String> getLiveSessions() {
+    Collection<HiveSession> hiveSessions = sessionManager.getSessions();
+    Set<String> liveSessions = new HashSet<>();
+    for (HiveSession session : hiveSessions) {
+      liveSessions.add(session.getSessionHandle().getHandleIdentifier().toString());
+    }
+    return liveSessions;
+  }
+
+  private Set<String> getHistoricSessions(String logRootDir) {
+    File logDir = new File(logRootDir);

Review comment:
       Yes, I've added a null check here to handle that case. Thanks for the review!
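The null check matters because `File#listFiles()` returns `null`, not an empty array, when the path is not a directory or an I/O error occurs. A minimal sketch of a null-safe listing of session directories in the spirit of `getHistoricSessions`; the class `HistoricSessionLister` and the method `listSessionDirs` are hypothetical names, not the code merged in the PR:

```java
import java.io.File;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not the PR's OperationLogManager code. */
final class HistoricSessionLister {
  private HistoricSessionLister() {}

  /** Returns the names of the direct subdirectories (session ids) of logRootDir. */
  public static Set<String> listSessionDirs(String logRootDir) {
    File logDir = new File(logRootDir);
    if (!logDir.isDirectory()) {
      return Collections.emptySet();
    }
    // listFiles returns null on an I/O error, e.g. if the directory
    // was removed or became unreadable after the isDirectory() check
    File[] subFiles = logDir.listFiles(File::isDirectory);
    if (subFiles == null) {
      return Collections.emptySet();
    }
    Set<String> results = new HashSet<>();
    for (File subFile : subFiles) {
      results.add(subFile.getName());
    }
    return results;
  }
}
```

Plain files under the root (for example a stray log) are skipped by the `File::isDirectory` filter, so only session directories are returned.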
   






[GitHub] [hive] dengzhhu653 closed pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 closed pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   




[GitHub] [hive] dengzhhu653 commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-884691869


   Hi @pvary, could these changes go in?




[GitHub] [hive] dengzhhu653 closed pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 closed pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   




[GitHub] [hive] pvary merged pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary merged pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   




[GitHub] [hive] dengzhhu653 commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-840575070


   Triggering a new test run.




[GitHub] [hive] dengzhhu653 commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r634311952



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,416 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Moves the operation log into a location different from the directory created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * so that the log is not removed when the session or operation is closed (see
+ * {@link HiveSessionImpl#close()}). Users and administrators can then retrieve the
+ * operation log later to optimize or troubleshoot the operation.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - thriftPort__startTime
+ *       - sessionId
+ *          - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *       - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner}:
+ * it gets all query info stored in {@link QueryInfoCache}, finds the queries that can no longer be reached on the webui,
+ * and removes their logs. If an operation log session directory contains no operation logs and the session is dead,
+ * the OperationLogDirCleaner will try to clean up the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String historicLogRootDir;
+  private static long maxBytesToFetch;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+  private String historicParentLogDir;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = hiveConf;
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      maxBytesToFetch = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (historicLogRootDir != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String origLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    File logLocation = new File(origLogLoc);
+    historicParentLogDir = logLocation.getAbsolutePath() + HISTORIC_DIR_SUFFIX;
+    int serverPort = hiveConf.getIntVar(HiveConf.ConfVars.HIVE_SERVER2_THRIFT_PORT);
+    String logRootDir = new StringBuilder(historicParentLogDir)
+        .append("/").append(serverPort)
+        .append("__").append(System.currentTimeMillis()).toString();

Review comment:
        Makes sense, I will add the node info to the directory. Thank you for the input.
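A sketch of what a per-instance layout could look like, extending the `thriftPort__startTime` directory from the diff with a host component. The host prefix, its `_` separator, and the names `HistoricLogLayout`, `instanceRootDir` and `operationLogFile` are assumptions for illustration; the format the PR ends up using may differ:

```java
import java.io.File;

/**
 * Hypothetical per-instance layout:
 *   <operation log location>_historic/<host>_<port>__<startTime>/<sessionId>/<queryId>
 * The host component is assumed from the "node info" mentioned in the review reply.
 */
final class HistoricLogLayout {
  private static final String HISTORIC_DIR_SUFFIX = "_historic";

  private HistoricLogLayout() {}

  /** Root directory owned by a single HS2 instance. */
  public static File instanceRootDir(String operationLogLocation, String host,
      int thriftPort, long startTimeMillis) {
    File historicParent =
        new File(new File(operationLogLocation).getAbsolutePath() + HISTORIC_DIR_SUFFIX);
    String instanceDirName = host + "_" + thriftPort + "__" + startTimeMillis;
    return new File(historicParent, instanceDirName);
  }

  /** Location of one query's operation log under an instance root. */
  public static File operationLogFile(File instanceRoot, String sessionId, String queryId) {
    return new File(new File(instanceRoot, sessionId), queryId);
  }
}
```

With the host in the name, two servers that share the same `HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION` and Thrift port, and happen to start in the same millisecond, still get distinct directories, so each instance's cleaner can restrict itself to its own subtree.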






[GitHub] [hive] dengzhhu653 closed pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 closed pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   




[GitHub] [hive] pvary commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633427139



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Moves the operation log into a location different from the directory created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * so that the log is not removed when the session or operation is closed (see
+ * {@link HiveSessionImpl#close()}). Users and administrators can then retrieve the
+ * operation log later to optimize or troubleshoot the operation.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner}:
+ * it gets all query info stored in {@link QueryInfoCache}, finds the queries that can no longer be reached on the webui,
+ * and removes their logs. If an operation log session directory contains no operation logs and the session is dead,
+ * the OperationLogDirCleaner will try to clean up the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = hiveConf;
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      MAX_BYTES_TO_FETCH = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (HISTORIC_OPERATION_LOG_ROOT_DIR != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String originalLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    String historicLogLoc = originalLogLoc + HISTORIC_DIR_SUFFIX;
+    File operationLogRootDir = new File(historicLogLoc);
+
+    if (operationLogRootDir.exists() && !operationLogRootDir.isDirectory()) {
+      LOG.warn("The historic operation log root directory exists, but it is not a directory: " +
+          operationLogRootDir.getAbsolutePath());
+      return;
+    }
+
+    if (!operationLogRootDir.exists()) {
+      if (!operationLogRootDir.mkdirs()) {
+        LOG.warn("Unable to create historic operation log root directory: " +
+            operationLogRootDir.getAbsolutePath());
+        return;
+      }
+    }
+    HISTORIC_OPERATION_LOG_ROOT_DIR = historicLogLoc;
+  }
+
+  public static OperationLog createOperationLog(Operation operation, QueryState queryState) {
+    HiveSession session = operation.getParentSession();
+    File parentFile = session.getOperationLogSessionDir();
+    boolean isHistoricLogEnabled = HISTORIC_OPERATION_LOG_ROOT_DIR != null;
+    if (isHistoricLogEnabled && operation instanceof SQLOperation) {
+      String sessionId = session.getSessionHandle().getHandleIdentifier().toString();
+      parentFile = new File(HISTORIC_OPERATION_LOG_ROOT_DIR + "/" + sessionId);
+      if (!parentFile.exists()) {
+        if (!parentFile.mkdirs()) {
+          LOG.warn("Unable to create the historic operation log session dir: " + parentFile +
+              ", fall back to the original operation log session dir.");
+          parentFile = session.getOperationLogSessionDir();
+          isHistoricLogEnabled = false;
+        }
+      } else if (!parentFile.isDirectory()) {
+        LOG.warn("The historic operation log session dir: " + parentFile + " exists, but it's not a directory, " +
+            "fall back to the original operation log session dir.");
+        parentFile = session.getOperationLogSessionDir();
+        isHistoricLogEnabled = false;
+      }
+    }
+
+    OperationHandle opHandle = operation.getHandle();
+    File operationLogFile = new File(parentFile, queryState.getQueryId());
+    OperationLog operationLog;
+    HiveConf.setBoolVar(queryState.getConf(),
+        HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED, isHistoricLogEnabled);
+    if (isHistoricLogEnabled) {
+      // dynamically setting the log location to route the operation log
+      HiveConf.setVar(queryState.getConf(),
+          HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION, HISTORIC_OPERATION_LOG_ROOT_DIR);
+      if (HiveConf.getBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_IN_TEST)) {
+        HiveConf.setBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false);
+      }
+      LOG.info("The operation log location changes from {} to {}.", new File(session.getOperationLogSessionDir(),
+          queryState.getQueryId()), operationLogFile);
+    }
+    operationLog = new OperationLog(opHandle.toString(), operationLogFile, queryState.getConf());
+    return operationLog;
+  }
+
+  private Set<String> getLiveSessions() {
+    Collection<HiveSession> hiveSessions = sessionManager.getSessions();
+    Set<String> liveSessions = new HashSet<>();
+    for (HiveSession session : hiveSessions) {
+      liveSessions.add(session.getSessionHandle().getHandleIdentifier().toString());
+    }
+    return liveSessions;
+  }
+
+  private Set<String> getHistoricSessions(String logRootDir) {
+    File logDir = new File(logRootDir);
+    Set<String> results = new HashSet<>();
+    if (logDir.exists() && logDir.isDirectory()) {
+      File[] subFiles = logDir.listFiles();

Review comment:
       What happens when we have multiple HS2 instances with the same root dir configuration?
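A minimal model of the cleanup rule described in the `OperationLogManager` Javadoc shows why a shared root directory is risky. This is a hypothetical sketch, not the Hive implementation. The directory tree is modelled as a map from session id to query ids (log file names), and `CleanerRuleSketch` and `toDelete` are illustrative names. Each instance only knows its own live sessions and cached queries, so another instance's logs look stale to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical model of the historic-log cleanup decision (not Hive's code). */
class CleanerRuleSketch {

  /** Returns "sessionId/queryId" files and "sessionId/" dirs this instance would delete. */
  static List<String> toDelete(Map<String, List<String>> historicTree,
                               Set<String> liveSessions,
                               Set<String> queryIdsInCache) {
    List<String> victims = new ArrayList<>();
    // TreeMap gives a deterministic iteration order for the example.
    for (Map.Entry<String, List<String>> session : new TreeMap<>(historicTree).entrySet()) {
      int remaining = 0;
      for (String queryId : session.getValue()) {
        if (queryIdsInCache.contains(queryId)) {
          remaining++; // still reachable from this instance's WebUI: keep the log
        } else {
          victims.add(session.getKey() + "/" + queryId); // unknown to this instance
        }
      }
      // An emptied directory of a session this instance does not consider live goes too.
      if (remaining == 0 && !liveSessions.contains(session.getKey())) {
        victims.add(session.getKey() + "/");
      }
    }
    return victims;
  }

  public static void main(String[] args) {
    // Instance A owns session "sA"/query "qA"; "sB"/"qB" belong to instance B,
    // which writes into the same shared root directory.
    Map<String, List<String>> tree = Map.of("sA", List.of("qA"), "sB", List.of("qB"));
    System.out.println(toDelete(tree, Set.of("sA"), Set.of("qA"))); // [sB/qB, sB/]
  }
}
```

In this model, instance A deletes instance B's live log and its session directory, which is the failure mode the question points at.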




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633107448



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java
##########
@@ -73,10 +74,10 @@ public OperationLog(String name, File file, HiveConf hiveConf) {
       opLoggingLevel = LoggingLevel.UNKNOWN;
     }
 
+    isRemoveLogs = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS);

Review comment:
       Done: `HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED` is now checked when Hive is not running in test mode. Thanks for the review!
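One reading of the reply above, sketched with plain booleans standing in for the `HiveConf` lookups (`RemoveLogsSketch` and `isRemoveLogs` are hypothetical names, and this is an interpretation rather than the merged code): in test mode `HIVE_TESTING_REMOVE_LOGS` decides, otherwise logs are kept whenever historic operation logging is enabled.

```java
/** Hypothetical sketch of the log-removal decision; booleans replace HiveConf lookups. */
class RemoveLogsSketch {

  static boolean isRemoveLogs(boolean inTest, boolean testingRemoveLogs, boolean historicLogEnabled) {
    if (inTest) {
      // HIVE_IN_TEST: honour the dedicated testing switch only.
      return testingRemoveLogs;
    }
    // Production: keep the log file when historic operation logging is on.
    return !historicLogEnabled;
  }

  public static void main(String[] args) {
    System.out.println(isRemoveLogs(false, true, true));  // false: historic log kept
    System.out.println(isRemoveLogs(false, true, false)); // true: original behaviour
    System.out.println(isRemoveLogs(true, false, false)); // false: testing switch decides
  }
}
```

Keeping the two settings on separate branches means neither configuration silently overrides the other outside its own mode.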

##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = hiveConf;
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      MAX_BYTES_TO_FETCH = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (HISTORIC_OPERATION_LOG_ROOT_DIR != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String originalLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    String historicLogLoc = originalLogLoc + HISTORIC_DIR_SUFFIX;
+    File operationLogRootDir = new File(historicLogLoc);
+
+    if (operationLogRootDir.exists() && !operationLogRootDir.isDirectory()) {
+      LOG.warn("The historic operation log root directory exists, but it is not a directory: " +
+          operationLogRootDir.getAbsolutePath());
+      return;
+    }
+
+    if (!operationLogRootDir.exists()) {
+      if (!operationLogRootDir.mkdirs()) {
+        LOG.warn("Unable to create historic operation log root directory: " +
+            operationLogRootDir.getAbsolutePath());
+        return;
+      }
+    }
+    HISTORIC_OPERATION_LOG_ROOT_DIR = historicLogLoc;
+  }
+
+  public static OperationLog createOperationLog(Operation operation, QueryState queryState) {
+    HiveSession session = operation.getParentSession();
+    File parentFile = session.getOperationLogSessionDir();
+    boolean isHistoricLogEnabled = HISTORIC_OPERATION_LOG_ROOT_DIR != null;
+    if (isHistoricLogEnabled && operation instanceof SQLOperation) {
+      String sessionId = session.getSessionHandle().getHandleIdentifier().toString();
+      parentFile = new File(HISTORIC_OPERATION_LOG_ROOT_DIR + "/" + sessionId);
+      if (!parentFile.exists()) {
+        if (!parentFile.mkdirs()) {
+          LOG.warn("Unable to create the historic operation log session dir: " + parentFile +
+              ", fall back to the original operation log session dir.");
+          parentFile = session.getOperationLogSessionDir();
+          isHistoricLogEnabled = false;
+        }
+      } else if (!parentFile.isDirectory()) {
+        LOG.warn("The historic operation log session dir: " + parentFile + " is exist, but it's not a directory, " +
+            "fall back to the original operation log session dir.");
+        parentFile = session.getOperationLogSessionDir();
+        isHistoricLogEnabled = false;
+      }
+    }
+
+    OperationHandle opHandle = operation.getHandle();
+    File operationLogFile = new File(parentFile, queryState.getQueryId());
+    OperationLog operationLog;
+    HiveConf.setBoolVar(queryState.getConf(),
+        HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED, isHistoricLogEnabled);
+    if (isHistoricLogEnabled) {
+      // dynamically setting the log location to route the operation log
+      HiveConf.setVar(queryState.getConf(),
+          HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION, HISTORIC_OPERATION_LOG_ROOT_DIR);
+      if (HiveConf.getBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_IN_TEST)) {
+        HiveConf.setBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false);
+      }
+      LOG.info("The operation log location changes from {} to {}.", new File(session.getOperationLogSessionDir(),
+          queryState.getQueryId()), operationLogFile);
+    }
+    operationLog = new OperationLog(opHandle.toString(), operationLogFile, queryState.getConf());
+    return operationLog;
+  }
+
+  private Set<String> getLiveSessions() {
+    Collection<HiveSession> hiveSessions = sessionManager.getSessions();
+    Set<String> liveSessions = new HashSet<>();
+    for (HiveSession session : hiveSessions) {
+      liveSessions.add(session.getSessionHandle().getHandleIdentifier().toString());
+    }
+    return liveSessions;
+  }
+
+  private Set<String> getHistoricSessions(String logRootDir) {
+    File logDir = new File(logRootDir);
+    Set<String> results = new HashSet<>();
+    if (logDir.exists() && logDir.isDirectory()) {
+      File[] subFiles = logDir.listFiles();

Review comment:
       Oh yes, if multiple instances share the same root dir, the cleaner will remove the live operation logs of the other instances. I've made the root dir private to each instance by adding an instance-specific subdirectory named after the Thrift port and the start time; this also avoids concurrent file listings and removals across instances. Thanks very much for the comments!
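A minimal sketch of the per-instance layout described in the reply, assuming the `thriftPort__startTime` naming that appears in the later Javadoc of this PR. `InstanceLogDirSketch` and `instanceHistoricDir` are hypothetical names, and the sketch only builds the path; it does not create the directory.

```java
import java.io.File;

/** Hypothetical sketch: an instance-private subdirectory under the historic log root. */
class InstanceLogDirSketch {
  static final String HISTORIC_DIR_SUFFIX = "_historic";

  static File instanceHistoricDir(String operationLogLocation, int thriftPort, long startTimeMillis) {
    // Shared root, e.g. <operation log location>_historic
    File historicRoot = new File(operationLogLocation + HISTORIC_DIR_SUFFIX);
    // Private per-instance subdirectory: <thriftPort>__<startTime>
    return new File(historicRoot, thriftPort + "__" + startTimeMillis);
  }

  public static void main(String[] args) {
    long start = 1620700000000L;
    // Two HS2 instances on the same host and root dir get disjoint subtrees.
    System.out.println(instanceHistoricDir("/tmp/hive/operation_logs", 10000, start));
    System.out.println(instanceHistoricDir("/tmp/hive/operation_logs", 10001, start));
  }
}
```

Because each cleaner only lists and deletes inside its own subtree, one instance can no longer mistake another instance's logs for stale ones.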

##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationManager.java
##########
@@ -85,10 +79,7 @@ public synchronized void init(HiveConf hiveConf) {
     LogDivertAppender.registerRoutingAppender(hiveConf);
     LogDivertAppenderForTest.registerRoutingAppenderIfInTest(hiveConf);
 
-    if (hiveConf.isWebUiQueryInfoCacheEnabled()) {
-      historicalQueryInfos = new QueryInfoCache(
-        hiveConf.getIntVar(ConfVars.HIVE_SERVER2_WEBUI_MAX_HISTORIC_QUERIES));
-    }
+    this.queryInfoCache = new QueryInfoCache(hiveConf);

Review comment:
       Yes, the cache is only used when the WebUI is enabled, so I've made it an `Optional` here.
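A minimal sketch of the "make it Optional" approach, assuming a simplified stand-in for `QueryInfoCache` (the nested class and `createCache` below are hypothetical, not Hive's types). The cache only exists when the WebUI query-info cache is enabled, and callers handle absence explicitly instead of null-checking a field.

```java
import java.util.Optional;

/** Hypothetical sketch of an optional query-info cache. */
class OptionalCacheSketch {

  /** Simplified stand-in for Hive's QueryInfoCache. */
  static final class QueryInfoCache {
    final int maxHistoricQueries;

    QueryInfoCache(int maxHistoricQueries) {
      this.maxHistoricQueries = maxHistoricQueries;
    }
  }

  static Optional<QueryInfoCache> createCache(boolean webUiQueryInfoCacheEnabled, int maxHistoricQueries) {
    return webUiQueryInfoCacheEnabled
        ? Optional.of(new QueryInfoCache(maxHistoricQueries))
        : Optional.empty();
  }

  public static void main(String[] args) {
    Optional<QueryInfoCache> cache = createCache(false, 25);
    // Nothing is printed here because the cache is absent.
    cache.ifPresent(c -> System.out.println("caching up to " + c.maxHistoricQueries));
    System.out.println("cache present: " + cache.isPresent()); // cache present: false
  }
}
```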

##########
File path: service/src/java/org/apache/hive/service/cli/session/SessionManager.java
##########
@@ -281,6 +284,7 @@ private void initOperationLogRootDir() {
         LOG.warn("Failed to schedule cleanup HS2 operation logging root dir: " +
             operationLogRootDir.getAbsolutePath(), e);
       }
+      logManager = new OperationLogManager(this, hiveConf);

Review comment:
       done

##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;

Review comment:
       done






[GitHub] [hive] dengzhhu653 closed pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 closed pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   




[GitHub] [hive] pvary commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-884750860


   Could you please retrigger a test run? If it is green, I will push it in.
   (2 days ago I had a problem with concurrent commits/CI runs and I would like to prevent it.)
   
   Thanks,
   Peter




[GitHub] [hive] pvary commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633086123



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java
##########
@@ -73,10 +74,10 @@ public OperationLog(String name, File file, HiveConf hiveConf) {
       opLoggingLevel = LoggingLevel.UNKNOWN;
     }
 
+    isRemoveLogs = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS);

Review comment:
       I would rather keep these configurations independent:
   - `HIVE_TESTING_REMOVE_LOGS`
   - `HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED`
   
   And if either of them is set to true, then do not remove the logs.






[GitHub] [hive] dengzhhu653 commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633643084



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationManager.java
##########
@@ -85,10 +79,7 @@ public synchronized void init(HiveConf hiveConf) {
     LogDivertAppender.registerRoutingAppender(hiveConf);
     LogDivertAppenderForTest.registerRoutingAppenderIfInTest(hiveConf);
 
-    if (hiveConf.isWebUiQueryInfoCacheEnabled()) {
-      historicalQueryInfos = new QueryInfoCache(
-        hiveConf.getIntVar(ConfVars.HIVE_SERVER2_WEBUI_MAX_HISTORIC_QUERIES));
-    }
+    this.queryInfoCache = new QueryInfoCache(hiveConf);

Review comment:
       Yes, the cache is only used when the WebUI is enabled, so I've made it an `Optional` here.

##########
File path: service/src/java/org/apache/hive/service/cli/session/SessionManager.java
##########
@@ -281,6 +284,7 @@ private void initOperationLogRootDir() {
         LOG.warn("Failed to schedule cleanup HS2 operation logging root dir: " +
             operationLogRootDir.getAbsolutePath(), e);
       }
+      logManager = new OperationLogManager(this, hiveConf);

Review comment:
       done






[GitHub] [hive] zchovan commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
zchovan commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r629403542



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,372 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = sessionManager.getHiveConf();
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      MAX_BYTES_TO_FETCH = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (HISTORIC_OPERATION_LOG_ROOT_DIR != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String originalLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    String historicLogLoc = originalLogLoc + HISTORIC_DIR_SUFFIX;
+    File operationLogRootDir = new File(historicLogLoc);
+
+    if (operationLogRootDir.exists() && !operationLogRootDir.isDirectory()) {
+      LOG.warn("The historic operation log root directory exists, but it is not a directory: " +
+          operationLogRootDir.getAbsolutePath());
+      return;
+    }
+
+    if (!operationLogRootDir.exists()) {
+      if (!operationLogRootDir.mkdirs()) {
+        LOG.warn("Unable to create historic operation log root directory: " +
+            operationLogRootDir.getAbsolutePath());
+        return;
+      }
+    }
+    HISTORIC_OPERATION_LOG_ROOT_DIR = historicLogLoc;
+  }
+
+  public static OperationLog createOperationLog(Operation operation, QueryState queryState) {
+    HiveSession session = operation.getParentSession();
+    File parentFile = session.getOperationLogSessionDir();
+    boolean isHistoricLogEnabled = HISTORIC_OPERATION_LOG_ROOT_DIR != null;
+    if (isHistoricLogEnabled && operation instanceof SQLOperation) {
+      String sessionId = session.getSessionHandle().getHandleIdentifier().toString();
+      parentFile = new File(HISTORIC_OPERATION_LOG_ROOT_DIR + "/" + sessionId);
+      if (!parentFile.exists()) {
+        if (!parentFile.mkdirs()) {
+          LOG.warn("Unable to create the historic operation log session dir: {}, " +
+              "fall back to the original operation log session dir.", parentFile);
+          parentFile = session.getOperationLogSessionDir();
+          isHistoricLogEnabled = false;
+        }
+      } else if (!parentFile.isDirectory()) {
+        LOG.warn("The historic operation log session dir: {} is exist, but it's not a directory, " +
+            "fall back to the original operation log session dir.", parentFile);
+        parentFile = session.getOperationLogSessionDir();
+        isHistoricLogEnabled = false;
+      }
+    }
+
+    OperationHandle opHandle = operation.getHandle();
+    File operationLogFile = new File(parentFile, queryState.getQueryId());
+    OperationLog operationLog;
+    if (isHistoricLogEnabled) {
+      // dynamically setting the log location to route the operation log
+      HiveConf.setVar(queryState.getConf(),
+          HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION, HISTORIC_OPERATION_LOG_ROOT_DIR);
+      HiveConf.setBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false);
+      LOG.info("The operation log location changes from {} to {}.", new File(session.getOperationLogSessionDir(),
+          queryState.getQueryId()), operationLogFile);
+    }
+    operationLog = new OperationLog(opHandle.toString(), operationLogFile, queryState.getConf());
+    return operationLog;
+  }
+
+  private Set<String> getLiveSessions() {
+    Collection<HiveSession> hiveSessions = sessionManager.getSessions();
+    Set<String> liveSessions = new HashSet<>();
+    for (HiveSession session : hiveSessions) {
+      liveSessions.add(session.getSessionHandle().getHandleIdentifier().toString());
+    }
+    return liveSessions;
+  }
+
+  private Set<String> getHistoricSessions(String logRootDir) {
+    File logDir = new File(logRootDir);

Review comment:
       What happens if `logRootDir` is deleted before this function is called? Wouldn't this throw an NPE?
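`File.listFiles()` returns `null` when the path does not denote a directory or when an I/O error occurs, so a deleted root would indeed lead to an NPE if the array were iterated unchecked. A prior `exists()`/`isDirectory()` check can still race with a concurrent deletion. A minimal null-safe sketch follows; `HistoricSessionsSketch` is a hypothetical name, and it assumes only subdirectories (session dirs) are of interest.

```java
import java.io.File;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical null-safe listing of session directories under a log root. */
class HistoricSessionsSketch {

  static Set<String> getHistoricSessions(String logRootDir) {
    File[] subFiles = new File(logRootDir).listFiles();
    // null means "not a directory" (e.g. deleted concurrently) or an I/O error.
    if (subFiles == null) {
      return Collections.emptySet();
    }
    Set<String> results = new HashSet<>();
    for (File f : subFiles) {
      if (f.isDirectory()) {
        results.add(f.getName()); // directory name is the session id
      }
    }
    return results;
  }

  public static void main(String[] args) {
    System.out.println(getHistoricSessions("/path/that/does/not/exist")); // []
  }
}
```

Checking the returned array, rather than the directory beforehand, closes the race window between the check and the listing.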






[GitHub] [hive] dengzhhu653 closed pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 closed pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   




[GitHub] [hive] dengzhhu653 commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-884782378


   > Could you please retrigger a test run, and if it is green then I will push in.
   > (2 days ago I had a problem with concurrent commits/CI runs and I would like prevent it)
   > 
   > Thanks,
   > Peter

   OK, thank you!




[GitHub] [hive] pvary commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r634304943



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,416 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - thriftPort__startTime
+ *       - sessionId
+ *          - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *       - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String historicLogRootDir;
+  private static long maxBytesToFetch;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+  private String historicParentLogDir;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = hiveConf;
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      maxBytesToFetch = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (historicLogRootDir != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String origLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    File logLocation = new File(origLogLoc);
+    historicParentLogDir = logLocation.getAbsolutePath() + HISTORIC_DIR_SUFFIX;
+    int serverPort = hiveConf.getIntVar(HiveConf.ConfVars.HIVE_SERVER2_THRIFT_PORT);
+    String logRootDir = new StringBuilder(historicParentLogDir)
+        .append("/").append(serverPort)
+        .append("__").append(System.currentTimeMillis()).toString();

Review comment:
       When investigating an issue in our clusters, we usually aggregate the log directories from multiple instances, so it is handy for users if the node name is already part of the log file name or directory name. We can always amend this later if needed, though.
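
A minimal sketch of the suggested naming. It assumes the host name is simply prepended to the `thriftPort__startTime` segment that the PR already builds under `historicParentLogDir`. `HistoricLogDirs` and `instanceDir` are hypothetical names and are not part of the PR. A real server would probably get the host from `java.net.InetAddress.getLocalHost().getHostName()`; here it is passed in so the result is deterministic.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not from the PR: builds a per-instance historic log directory path. */
public class HistoricLogDirs {
  // Same suffix the PR appends to the operation log location.
  static final String HISTORIC_DIR_SUFFIX = "_historic";

  /**
   * Returns <logLocation>_historic/<host>__<thriftPort>__<startTime>.
   * Putting the host in the name keeps directories from different nodes
   * distinguishable once they are aggregated into one place.
   */
  static Path instanceDir(String logLocation, String host, int thriftPort, long startTimeMillis) {
    Path historicParent = Paths.get(logLocation + HISTORIC_DIR_SUFFIX).toAbsolutePath();
    String instanceName = host + "__" + thriftPort + "__" + startTimeMillis;
    return historicParent.resolve(instanceName);
  }

  public static void main(String[] args) {
    // Hypothetical values for illustration only.
    Path dir = instanceDir("/tmp/hive/operation_logs", "hs2-node-1", 10000, 1620712923000L);
    System.out.println(dir);
  }
}
```

With the host in the directory name, aggregated logs from several HiveServer2 nodes stay distinguishable without any extra metadata.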






[GitHub] [hive] dengzhhu653 commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-833483396


   Hi @pvary, @sahilTakiar, could you please take a look if you have a sec?
   Thanks!




[GitHub] [hive] pvary commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633430432



##########
File path: service/src/java/org/apache/hive/service/cli/session/SessionManager.java
##########
@@ -281,6 +284,7 @@ private void initOperationLogRootDir() {
         LOG.warn("Failed to schedule cleanup HS2 operation logging root dir: " +
             operationLogRootDir.getAbsolutePath(), e);
       }
+      logManager = new OperationLogManager(this, hiveConf);

Review comment:
       Do we need the `OperationLogManager` if we do not use async log removal? Could we use `Optional` for this?
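
A minimal sketch of the `Optional` approach. It assumes the manager is only constructed when the historic operation log feature is enabled. `LogManager` is a hypothetical stand-in for `OperationLogManager`, and the boolean flag stands in for the configuration checks (`HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED` and related settings) that the PR performs in its constructor.

```java
import java.util.Optional;

/** Hypothetical sketch, not from the PR: hold a component only when its feature is enabled. */
public class OptionalManagerSketch {

  /** Stand-in for OperationLogManager; the real class takes a SessionManager and a HiveConf. */
  static final class LogManager {
    String describe() {
      return "historic operation log enabled";
    }
  }

  // Empty when the feature is off, so callers cannot accidentally use an unneeded manager.
  private final Optional<LogManager> logManager;

  OptionalManagerSketch(boolean historicLogEnabled) {
    this.logManager = historicLogEnabled ? Optional.of(new LogManager()) : Optional.empty();
  }

  String status() {
    // map/orElse replaces an explicit null check at every call site.
    return logManager.map(LogManager::describe).orElse("historic operation log disabled");
  }

  public static void main(String[] args) {
    System.out.println(new OptionalManagerSketch(true).status());
    System.out.println(new OptionalManagerSketch(false).status());
  }
}
```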






[GitHub] [hive] pvary commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633425398



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;

Review comment:
       These are not final. I think we usually reserve this naming format for `static final` variables.
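
For illustration, here is a hypothetical class that follows that convention. `static final` constants keep UPPER_SNAKE_CASE, while static fields assigned at runtime use lowerCamelCase. The later revision of the file quoted earlier in this thread (`historicLogRootDir`, `maxBytesToFetch`) uses the same split.

```java
/** Hypothetical illustration of the naming convention raised in the review. */
public class NamingConventionSketch {
  // Constant: static final, so UPPER_SNAKE_CASE is appropriate.
  private static final String HISTORIC_DIR_SUFFIX = "_historic";

  // Mutable static state assigned at runtime: not final, so lowerCamelCase.
  private static String historicLogRootDir;
  private static long maxBytesToFetch;

  static void init(String logLocation, long maxBytes) {
    historicLogRootDir = logLocation + HISTORIC_DIR_SUFFIX;
    maxBytesToFetch = maxBytes;
  }

  static String historicLogRootDir() {
    return historicLogRootDir;
  }

  static long maxBytesToFetch() {
    return maxBytesToFetch;
  }
}
```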






[GitHub] [hive] dengzhhu653 commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-854315065


   Hey @belugabehr, could you please take a look at the changes? Thanks :)




[GitHub] [hive] dengzhhu653 commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-810736869


   @yongzhi @saihemanth-cloudera, any thoughts or comments? Thanks in advance.




[GitHub] [hive] dengzhhu653 commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-861532574


   @pvary, sorry for the ping. Could the PR be moved a little further? Thank you! :)




[GitHub] [hive] dengzhhu653 commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-885560608


   Hi @pvary, the build is finally green. Would you please take another look?




[GitHub] [hive] dengzhhu653 commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633642125



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = hiveConf;
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      MAX_BYTES_TO_FETCH = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (HISTORIC_OPERATION_LOG_ROOT_DIR != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String originalLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    String historicLogLoc = originalLogLoc + HISTORIC_DIR_SUFFIX;
+    File operationLogRootDir = new File(historicLogLoc);
+
+    if (operationLogRootDir.exists() && !operationLogRootDir.isDirectory()) {
+      LOG.warn("The historic operation log root directory exists, but it is not a directory: " +
+          operationLogRootDir.getAbsolutePath());
+      return;
+    }
+
+    if (!operationLogRootDir.exists()) {
+      if (!operationLogRootDir.mkdirs()) {
+        LOG.warn("Unable to create historic operation log root directory: " +
+            operationLogRootDir.getAbsolutePath());
+        return;
+      }
+    }
+    HISTORIC_OPERATION_LOG_ROOT_DIR = historicLogLoc;
+  }
+
+  public static OperationLog createOperationLog(Operation operation, QueryState queryState) {
+    HiveSession session = operation.getParentSession();
+    File parentFile = session.getOperationLogSessionDir();
+    boolean isHistoricLogEnabled = HISTORIC_OPERATION_LOG_ROOT_DIR != null;
+    if (isHistoricLogEnabled && operation instanceof SQLOperation) {
+      String sessionId = session.getSessionHandle().getHandleIdentifier().toString();
+      parentFile = new File(HISTORIC_OPERATION_LOG_ROOT_DIR + "/" + sessionId);
+      if (!parentFile.exists()) {
+        if (!parentFile.mkdirs()) {
+          LOG.warn("Unable to create the historic operation log session dir: " + parentFile +
+              ", fall back to the original operation log session dir.");
+          parentFile = session.getOperationLogSessionDir();
+          isHistoricLogEnabled = false;
+        }
+      } else if (!parentFile.isDirectory()) {
+        LOG.warn("The historic operation log session dir: " + parentFile + " is exist, but it's not a directory, " +
+            "fall back to the original operation log session dir.");
+        parentFile = session.getOperationLogSessionDir();
+        isHistoricLogEnabled = false;
+      }
+    }
+
+    OperationHandle opHandle = operation.getHandle();
+    File operationLogFile = new File(parentFile, queryState.getQueryId());
+    OperationLog operationLog;
+    HiveConf.setBoolVar(queryState.getConf(),
+        HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED, isHistoricLogEnabled);
+    if (isHistoricLogEnabled) {
+      // dynamically setting the log location to route the operation log
+      HiveConf.setVar(queryState.getConf(),
+          HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION, HISTORIC_OPERATION_LOG_ROOT_DIR);
+      if (HiveConf.getBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_IN_TEST)) {
+        HiveConf.setBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false);
+      }
+      LOG.info("The operation log location changes from {} to {}.", new File(session.getOperationLogSessionDir(),
+          queryState.getQueryId()), operationLogFile);
+    }
+    operationLog = new OperationLog(opHandle.toString(), operationLogFile, queryState.getConf());
+    return operationLog;
+  }
+
+  private Set<String> getLiveSessions() {
+    Collection<HiveSession> hiveSessions = sessionManager.getSessions();
+    Set<String> liveSessions = new HashSet<>();
+    for (HiveSession session : hiveSessions) {
+      liveSessions.add(session.getSessionHandle().getHandleIdentifier().toString());
+    }
+    return liveSessions;
+  }
+
+  private Set<String> getHistoricSessions(String logRootDir) {
+    File logDir = new File(logRootDir);
+    Set<String> results = new HashSet<>();
+    if (logDir.exists() && logDir.isDirectory()) {
+      File[] subFiles = logDir.listFiles();

Review comment:
       Oh yes, if multiple instances share the same root dir, the cleaner will remove the live operation logs of the other instances. I made the root dir private to each instance by adding an instance-specific subdir named after the thrift port and the start time. This avoids concurrent file listings and removals across instances. Thanks very much for the comments!
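
A minimal sketch of an instance-scoped cleanup pass. It assumes the `<root>_historic/<thriftPort>__<startTime>/<sessionId>/<queryId>` layout described in the later revision of the class Javadoc. `InstanceScopedCleaner` and `cleanOnce` are hypothetical names. The PR's `OperationLogDirCleaner` works from the query info in `QueryInfoCache` and the sessions known to `SessionManager`; in this sketch, both are passed in as plain sets instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch, not the PR's OperationLogDirCleaner: one cleanup pass that only
 * touches paths under a single instance directory (<root>_historic/<thriftPort>__<startTime>).
 */
public class InstanceScopedCleaner {
  private final Path instanceDir;

  InstanceScopedCleaner(Path instanceDir) {
    this.instanceDir = instanceDir;
  }

  /**
   * Deletes query log files whose ids are not in reachableQueryIds, then removes session
   * directories that are empty and whose session is no longer live. Sibling instance
   * directories under the same shared root are never listed or modified.
   */
  void cleanOnce(Set<String> reachableQueryIds, Set<String> liveSessionIds) throws IOException {
    if (!Files.isDirectory(instanceDir)) {
      return;
    }
    List<Path> sessionDirs;
    try (Stream<Path> s = Files.list(instanceDir)) {
      sessionDirs = s.filter(Files::isDirectory).collect(Collectors.toList());
    }
    for (Path sessionDir : sessionDirs) {
      List<Path> queryLogs;
      try (Stream<Path> s = Files.list(sessionDir)) {
        queryLogs = s.filter(Files::isRegularFile).collect(Collectors.toList());
      }
      // Remove logs of queries that can no longer be reached (e.g. on the web UI).
      for (Path log : queryLogs) {
        if (!reachableQueryIds.contains(log.getFileName().toString())) {
          Files.deleteIfExists(log);
        }
      }
      // Only an empty directory of a dead session is removed.
      boolean empty;
      try (Stream<Path> s = Files.list(sessionDir)) {
        empty = !s.findAny().isPresent();
      }
      if (empty && !liveSessionIds.contains(sessionDir.getFileName().toString())) {
        Files.deleteIfExists(sessionDir);
      }
    }
  }
}
```

Scoping every listing and deletion to `instanceDir` is what lets several HiveServer2 instances share one historic root safely. Each instance cleans only the subtree it created.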





[GitHub] [hive] pvary commented on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-843083815


   +1 pending tests.
   You should rebase; that will fix the Iceberg errors.




[GitHub] [hive] dengzhhu653 edited a comment on pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 edited a comment on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-884782378


   > Could you please retrigger a test run, and if it is green then I will push in.
   > (2 days ago I had a problem with concurrent commits/CI runs and I would like to prevent it)
   > 
   > Thanks,
   > Peter
   
   ok, thank you!




[GitHub] [hive] dengzhhu653 closed pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 closed pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   




[GitHub] [hive] dengzhhu653 commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633643380



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;

Review comment:
       done






[GitHub] [hive] dengzhhu653 commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r634294859



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,416 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - thriftPort__startTime
+ *       - sessionId
+ *          - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *       - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String historicLogRootDir;
+  private static long maxBytesToFetch;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+  private String historicParentLogDir;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = hiveConf;
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      maxBytesToFetch = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (historicLogRootDir != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String origLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    File logLocation = new File(origLogLoc);
+    historicParentLogDir = logLocation.getAbsolutePath() + HISTORIC_DIR_SUFFIX;
+    int serverPort = hiveConf.getIntVar(HiveConf.ConfVars.HIVE_SERVER2_THRIFT_PORT);
+    String logRootDir = new StringBuilder(historicParentLogDir)
+        .append("/").append(serverPort)
+        .append("__").append(System.currentTimeMillis()).toString();

Review comment:
       Since the operation log sits on the local machine, we already know the IP address or host name when we scan the directories. The start time `System.currentTimeMillis()` therefore guarantees that each instance gets a unique directory just as well as a UUID would, and it is more meaningful. So here I use the unique server port and the start time to construct the directory name.
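A minimal sketch of the naming scheme described above, assuming the layout `<operation log location>_historic/<thriftPort>__<startMillis>` from the revised `initHistoricOperationLogRootDir()`. The class and method names (`HistoricLogDirName`, `instanceDir`, `startMillisOf`) are hypothetical and exist only for illustration. The sketch also shows the "more meaningful" part: the start time can be read back from the directory name, which a random UUID would not allow.

```java
import java.io.File;

// Hypothetical helper class; only the "_historic" suffix and the
// "<port>__<startMillis>" layout come from the patch under review.
class HistoricLogDirName {
  static final String HISTORIC_DIR_SUFFIX = "_historic";

  // Builds <absolute opLogLocation>_historic/<thriftPort>__<startMillis>,
  // mirroring the StringBuilder logic in the revised patch.
  static String instanceDir(String opLogLocation, int thriftPort, long startMillis) {
    String parent = new File(opLogLocation).getAbsolutePath() + HISTORIC_DIR_SUFFIX;
    return parent + "/" + thriftPort + "__" + startMillis;
  }

  // Recovers the instance start time from a directory name such as "10000__1620712923000".
  static long startMillisOf(String instanceDirName) {
    int sep = instanceDirName.lastIndexOf("__");
    return Long.parseLong(instanceDirName.substring(sep + 2));
  }

  public static void main(String[] args) {
    String dir = instanceDir("/tmp/hive/operation_logs", 10000, System.currentTimeMillis());
    System.out.println(dir);
    System.out.println(startMillisOf(new File(dir).getName()));
  }
}
```

Uniqueness here depends on the port being unique per host while the instance runs. The host identity is implicit because the directory is local.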






[GitHub] [hive] pvary commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633428483



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = hiveConf;
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      MAX_BYTES_TO_FETCH = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (HISTORIC_OPERATION_LOG_ROOT_DIR != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String originalLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    String historicLogLoc = originalLogLoc + HISTORIC_DIR_SUFFIX;
+    File operationLogRootDir = new File(historicLogLoc);
+
+    if (operationLogRootDir.exists() && !operationLogRootDir.isDirectory()) {
+      LOG.warn("The historic operation log root directory exists, but it is not a directory: " +
+          operationLogRootDir.getAbsolutePath());
+      return;
+    }
+
+    if (!operationLogRootDir.exists()) {
+      if (!operationLogRootDir.mkdirs()) {
+        LOG.warn("Unable to create historic operation log root directory: " +
+            operationLogRootDir.getAbsolutePath());
+        return;
+      }
+    }
+    HISTORIC_OPERATION_LOG_ROOT_DIR = historicLogLoc;
+  }
+
+  public static OperationLog createOperationLog(Operation operation, QueryState queryState) {
+    HiveSession session = operation.getParentSession();
+    File parentFile = session.getOperationLogSessionDir();
+    boolean isHistoricLogEnabled = HISTORIC_OPERATION_LOG_ROOT_DIR != null;
+    if (isHistoricLogEnabled && operation instanceof SQLOperation) {
+      String sessionId = session.getSessionHandle().getHandleIdentifier().toString();
+      parentFile = new File(HISTORIC_OPERATION_LOG_ROOT_DIR + "/" + sessionId);
+      if (!parentFile.exists()) {
+        if (!parentFile.mkdirs()) {
+          LOG.warn("Unable to create the historic operation log session dir: " + parentFile +
+              ", fall back to the original operation log session dir.");
+          parentFile = session.getOperationLogSessionDir();
+          isHistoricLogEnabled = false;
+        }
+      } else if (!parentFile.isDirectory()) {
+        LOG.warn("The historic operation log session dir: " + parentFile + " is exist, but it's not a directory, " +
+            "fall back to the original operation log session dir.");
+        parentFile = session.getOperationLogSessionDir();
+        isHistoricLogEnabled = false;
+      }
+    }
+
+    OperationHandle opHandle = operation.getHandle();
+    File operationLogFile = new File(parentFile, queryState.getQueryId());
+    OperationLog operationLog;
+    HiveConf.setBoolVar(queryState.getConf(),
+        HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED, isHistoricLogEnabled);
+    if (isHistoricLogEnabled) {
+      // dynamically setting the log location to route the operation log
+      HiveConf.setVar(queryState.getConf(),
+          HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION, HISTORIC_OPERATION_LOG_ROOT_DIR);
+      if (HiveConf.getBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_IN_TEST)) {
+        HiveConf.setBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false);
+      }
+      LOG.info("The operation log location changes from {} to {}.", new File(session.getOperationLogSessionDir(),
+          queryState.getQueryId()), operationLogFile);
+    }
+    operationLog = new OperationLog(opHandle.toString(), operationLogFile, queryState.getConf());
+    return operationLog;
+  }
+
+  private Set<String> getLiveSessions() {
+    Collection<HiveSession> hiveSessions = sessionManager.getSessions();
+    Set<String> liveSessions = new HashSet<>();
+    for (HiveSession session : hiveSessions) {
+      liveSessions.add(session.getSessionHandle().getHandleIdentifier().toString());
+    }
+    return liveSessions;
+  }
+
+  private Set<String> getHistoricSessions(String logRootDir) {
+    File logDir = new File(logRootDir);
+    Set<String> results = new HashSet<>();
+    if (logDir.exists() && logDir.isDirectory()) {
+      File[] subFiles = logDir.listFiles();

Review comment:
       Also plan for concurrent file listings and removals
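One way to address this, sketched under the assumption that the cleaner thread and session/operation code may list or delete the same directories at the same time. `File.listFiles()` returns `null` when the directory has vanished or cannot be read, so a prior `exists()` check such as the one in `getHistoricSessions` does not protect the call. The names `ConcurrentSafeLogDirs`, `historicSessions` and `deleteRecursively` are hypothetical.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helpers; the names are illustrative, not part of the patch.
class ConcurrentSafeLogDirs {

  // Lists session directory names under the historic log root. The listFiles() result
  // is checked for null instead of trusting an earlier exists()/isDirectory() check.
  static Set<String> historicSessions(File logRootDir) {
    Set<String> results = new HashSet<>();
    File[] subFiles = logRootDir.listFiles();
    if (subFiles == null) {
      return results;
    }
    for (File f : subFiles) {
      // An entry deleted concurrently after listing simply reports isDirectory() == false.
      if (f.isDirectory()) {
        results.add(f.getName());
      }
    }
    return results;
  }

  // Recursively deletes a file or directory. Entries that another thread removed first
  // count as deleted: listFiles() yields null and the final exists() check is false.
  // Symbolic links are not special-cased in this sketch.
  static boolean deleteRecursively(File f) {
    File[] children = f.listFiles();
    if (children != null) {
      for (File child : children) {
        deleteRecursively(child);
      }
    }
    return f.delete() || !f.exists();
  }

  public static void main(String[] args) {
    File root = new File(System.getProperty("java.io.tmpdir"), "oplog-demo-" + System.nanoTime());
    new File(root, "session-1/query-1").mkdirs();
    System.out.println(historicSessions(root));   // [session-1]
    System.out.println(deleteRecursively(root));  // true
    System.out.println(historicSessions(root));   // []
  }
}
```

A `false` return from `deleteRecursively` means a file reappeared or could not be removed, for example because a new operation log was written into a session directory while it was being cleaned. The caller can then retry on the next cleaner run.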






[GitHub] [hive] pvary commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633086123



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java
##########
@@ -73,10 +74,10 @@ public OperationLog(String name, File file, HiveConf hiveConf) {
       opLoggingLevel = LoggingLevel.UNKNOWN;
     }
 
+    isRemoveLogs = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS);

Review comment:
       I would rather keep these configurations independent:
   - `HIVE_TESTING_REMOVE_LOGS`
   - `HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED`
   
   If either of them is set to true, do not remove the logs.
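A rough sketch of keeping the two settings independent. The idea is that `OperationLog` reads both flags and combines them where it decides what to do, so `createOperationLog()` no longer needs to overwrite `HIVE_TESTING_REMOVE_LOGS`. The comment is brief, so the exact combination is an assumption here: a log is removed only when the testing flag asks for removal and the historic log is disabled. In real code the two booleans would come from `hiveConf.getBoolVar(...)`. `RemoveLogsDecision` and `shouldRemoveLogs` are hypothetical names.

```java
// Hypothetical sketch: both flags are read as-is and combined only at the decision point.
class RemoveLogsDecision {
  // Assumption: a historic operation log must survive operation close,
  // regardless of the testing flag.
  static boolean shouldRemoveLogs(boolean testingRemoveLogs, boolean historicLogEnabled) {
    return testingRemoveLogs && !historicLogEnabled;
  }

  public static void main(String[] args) {
    for (boolean testing : new boolean[] {false, true}) {
      for (boolean historic : new boolean[] {false, true}) {
        System.out.println("testingRemoveLogs=" + testing + ", historicEnabled=" + historic
            + " -> remove=" + shouldRemoveLogs(testing, historic));
      }
    }
  }
}
```

Neither configuration value is mutated, so a `HiveConf` shared with other code paths keeps the values the user set.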

##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;

Review comment:
       These are not final. I think we usually try to reserve this naming format for `static final` variables

##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = hiveConf;
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      MAX_BYTES_TO_FETCH = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (HISTORIC_OPERATION_LOG_ROOT_DIR != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String originalLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    String historicLogLoc = originalLogLoc + HISTORIC_DIR_SUFFIX;
+    File operationLogRootDir = new File(historicLogLoc);
+
+    if (operationLogRootDir.exists() && !operationLogRootDir.isDirectory()) {
+      LOG.warn("The historic operation log root directory exists, but it is not a directory: " +
+          operationLogRootDir.getAbsolutePath());
+      return;
+    }
+
+    if (!operationLogRootDir.exists()) {
+      if (!operationLogRootDir.mkdirs()) {
+        LOG.warn("Unable to create historic operation log root directory: " +
+            operationLogRootDir.getAbsolutePath());
+        return;
+      }
+    }
+    HISTORIC_OPERATION_LOG_ROOT_DIR = historicLogLoc;
+  }
+
+  public static OperationLog createOperationLog(Operation operation, QueryState queryState) {
+    HiveSession session = operation.getParentSession();
+    File parentFile = session.getOperationLogSessionDir();
+    boolean isHistoricLogEnabled = HISTORIC_OPERATION_LOG_ROOT_DIR != null;
+    if (isHistoricLogEnabled && operation instanceof SQLOperation) {
+      String sessionId = session.getSessionHandle().getHandleIdentifier().toString();
+      parentFile = new File(HISTORIC_OPERATION_LOG_ROOT_DIR + "/" + sessionId);
+      if (!parentFile.exists()) {
+        if (!parentFile.mkdirs()) {
+          LOG.warn("Unable to create the historic operation log session dir: " + parentFile +
+              ", fall back to the original operation log session dir.");
+          parentFile = session.getOperationLogSessionDir();
+          isHistoricLogEnabled = false;
+        }
+      } else if (!parentFile.isDirectory()) {
+        LOG.warn("The historic operation log session dir: " + parentFile + " is exist, but it's not a directory, " +
+            "fall back to the original operation log session dir.");
+        parentFile = session.getOperationLogSessionDir();
+        isHistoricLogEnabled = false;
+      }
+    }
+
+    OperationHandle opHandle = operation.getHandle();
+    File operationLogFile = new File(parentFile, queryState.getQueryId());
+    OperationLog operationLog;
+    HiveConf.setBoolVar(queryState.getConf(),
+        HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED, isHistoricLogEnabled);
+    if (isHistoricLogEnabled) {
+      // dynamically setting the log location to route the operation log
+      HiveConf.setVar(queryState.getConf(),
+          HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION, HISTORIC_OPERATION_LOG_ROOT_DIR);
+      if (HiveConf.getBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_IN_TEST)) {
+        HiveConf.setBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false);
+      }
+      LOG.info("The operation log location changes from {} to {}.", new File(session.getOperationLogSessionDir(),
+          queryState.getQueryId()), operationLogFile);
+    }
+    operationLog = new OperationLog(opHandle.toString(), operationLogFile, queryState.getConf());
+    return operationLog;
+  }
+
+  private Set<String> getLiveSessions() {
+    Collection<HiveSession> hiveSessions = sessionManager.getSessions();
+    Set<String> liveSessions = new HashSet<>();
+    for (HiveSession session : hiveSessions) {
+      liveSessions.add(session.getSessionHandle().getHandleIdentifier().toString());
+    }
+    return liveSessions;
+  }
+
+  private Set<String> getHistoricSessions(String logRootDir) {
+    File logDir = new File(logRootDir);
+    Set<String> results = new HashSet<>();
+    if (logDir.exists() && logDir.isDirectory()) {
+      File[] subFiles = logDir.listFiles();

Review comment:
       What happens when we have multiple HS2 instances with the same root dir configuration?
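A sketch of how per-instance subdirectories could keep instances apart. It assumes the `<thriftPort>__<startMillis>` layout adopted in the later revision of this patch, and that each instance's cleaner only enumerates its own subdirectory of the shared `_historic` parent. `PerInstanceCleanerScope`, `instanceDir` and `sessionsOwnedBy` are hypothetical names. This does not cover several hosts sharing a network filesystem root, where two instances could in principle pick the same port and start time.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: instances share one historic parent directory, but each one
// writes under, and cleans only, its own "<thriftPort>__<startMillis>" subdirectory.
class PerInstanceCleanerScope {
  static File instanceDir(File historicParent, int thriftPort, long startMillis) {
    return new File(historicParent, thriftPort + "__" + startMillis);
  }

  // A cleaner bound to one instance directory never sees another instance's sessions.
  static Set<String> sessionsOwnedBy(File instanceDir) {
    Set<String> sessions = new HashSet<>();
    File[] children = instanceDir.listFiles();
    if (children != null) {  // null if the directory is missing or unreadable
      for (File child : children) {
        if (child.isDirectory()) {
          sessions.add(child.getName());
        }
      }
    }
    return sessions;
  }

  public static void main(String[] args) {
    File parent = new File(System.getProperty("java.io.tmpdir"),
        "operation_logs_historic-" + System.nanoTime());
    File hs2a = instanceDir(parent, 10000, 1620712923000L);
    File hs2b = instanceDir(parent, 10001, 1620712923000L);
    new File(hs2a, "session-a").mkdirs();
    new File(hs2b, "session-b").mkdirs();
    System.out.println(sessionsOwnedBy(hs2a)); // [session-a]
    System.out.println(sessionsOwnedBy(hs2b)); // [session-b]
  }
}
```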

##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##########
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)},
+ * this will avoid the operation log being cleaned when session/operation is closed, refer to
+ * {@link HiveSessionImpl#close()}, so we can get the operation log for the optimization
+ * and investigating the problem of the operation handily for users or administrators.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * while the origin tree would like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *        - queryId (the operation log file)
+ * <p>
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner},
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can not be reached on the webui,
+ * and removes the log. If the operation log session directory has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to cleanup the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) {
+    this.operationManager = sessionManager.getOperationManager();
+    this.hiveConf = hiveConf;
+    this.sessionManager = sessionManager;
+    if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+        && hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+        && hiveConf.isWebUiQueryInfoCacheEnabled()) {
+      initHistoricOperationLogRootDir();
+      MAX_BYTES_TO_FETCH = HiveConf.getSizeVar(hiveConf,
+          HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+      if (HISTORIC_OPERATION_LOG_ROOT_DIR != null
+          && !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        cleaner = new OperationLogDirCleaner();
+        cleaner.start();
+      }
+    }
+  }
+
+  private void initHistoricOperationLogRootDir() {
+    String originalLogLoc = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION);
+    String historicLogLoc = originalLogLoc + HISTORIC_DIR_SUFFIX;
+    File operationLogRootDir = new File(historicLogLoc);
+
+    if (operationLogRootDir.exists() && !operationLogRootDir.isDirectory()) {
+      LOG.warn("The historic operation log root directory exists, but it is not a directory: " +
+          operationLogRootDir.getAbsolutePath());
+      return;
+    }
+
+    if (!operationLogRootDir.exists()) {
+      if (!operationLogRootDir.mkdirs()) {
+        LOG.warn("Unable to create historic operation log root directory: " +
+            operationLogRootDir.getAbsolutePath());
+        return;
+      }
+    }
+    HISTORIC_OPERATION_LOG_ROOT_DIR = historicLogLoc;
+  }
+
+  public static OperationLog createOperationLog(Operation operation, QueryState queryState) {
+    HiveSession session = operation.getParentSession();
+    File parentFile = session.getOperationLogSessionDir();
+    boolean isHistoricLogEnabled = HISTORIC_OPERATION_LOG_ROOT_DIR != null;
+    if (isHistoricLogEnabled && operation instanceof SQLOperation) {
+      String sessionId = session.getSessionHandle().getHandleIdentifier().toString();
+      parentFile = new File(HISTORIC_OPERATION_LOG_ROOT_DIR + "/" + sessionId);
+      if (!parentFile.exists()) {
+        if (!parentFile.mkdirs()) {
+          LOG.warn("Unable to create the historic operation log session dir: " + parentFile +
+              ", fall back to the original operation log session dir.");
+          parentFile = session.getOperationLogSessionDir();
+          isHistoricLogEnabled = false;
+        }
+      } else if (!parentFile.isDirectory()) {
+        LOG.warn("The historic operation log session dir: " + parentFile + " exists, but it's not a directory, " +
+            "fall back to the original operation log session dir.");
+        parentFile = session.getOperationLogSessionDir();
+        isHistoricLogEnabled = false;
+      }
+    }
+
+    OperationHandle opHandle = operation.getHandle();
+    File operationLogFile = new File(parentFile, queryState.getQueryId());
+    OperationLog operationLog;
+    HiveConf.setBoolVar(queryState.getConf(),
+        HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED, isHistoricLogEnabled);
+    if (isHistoricLogEnabled) {
+      // dynamically setting the log location to route the operation log
+      HiveConf.setVar(queryState.getConf(),
+          HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LOG_LOCATION, HISTORIC_OPERATION_LOG_ROOT_DIR);
+      if (HiveConf.getBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_IN_TEST)) {
+        HiveConf.setBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false);
+      }
+      LOG.info("The operation log location changes from {} to {}.", new File(session.getOperationLogSessionDir(),
+          queryState.getQueryId()), operationLogFile);
+    }
+    operationLog = new OperationLog(opHandle.toString(), operationLogFile, queryState.getConf());
+    return operationLog;
+  }
+
+  private Set<String> getLiveSessions() {
+    Collection<HiveSession> hiveSessions = sessionManager.getSessions();
+    Set<String> liveSessions = new HashSet<>();
+    for (HiveSession session : hiveSessions) {
+      liveSessions.add(session.getSessionHandle().getHandleIdentifier().toString());
+    }
+    return liveSessions;
+  }
+
+  private Set<String> getHistoricSessions(String logRootDir) {
+    File logDir = new File(logRootDir);
+    Set<String> results = new HashSet<>();
+    if (logDir.exists() && logDir.isDirectory()) {
+      File[] subFiles = logDir.listFiles();

Review comment:
       Also plan for concurrent file listings and removals
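A minimal sketch of one way a cleaner pass could tolerate concurrent listings and removals, following the rule in the class javadoc (remove a session directory only when it holds no operation logs and the session is dead). `HistoricLogDirSweeper` and its methods are hypothetical, and the set of live session ids is assumed to come from something like `getLiveSessions()` in the diff. A listing that fails because the root disappeared yields an empty result instead of an exception, and the delete relies on the file system refusing to remove a non-empty directory, so a log created between the listing and the delete is not lost.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: one cleanup pass over the historic log root that tolerates concurrent changes. */
public final class HistoricLogDirSweeper {

  /** Lists session directory names; returns an empty set if the root vanished or is unreadable. */
  static Set<String> listSessionDirs(Path root) {
    Set<String> names = new HashSet<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(root)) {
      for (Path p : stream) {
        if (Files.isDirectory(p)) {
          names.add(p.getFileName().toString());
        }
      }
    } catch (IOException | DirectoryIteratorException e) {
      // Root removed or unreadable mid-listing: treat as "nothing to clean" instead of failing.
    }
    return names;
  }

  /** Removes empty session dirs whose session is no longer live; returns how many were removed. */
  static int removeDeadEmptySessionDirs(Path root, Set<String> liveSessions) {
    int removed = 0;
    for (String session : listSessionDirs(root)) {
      if (liveSessions.contains(session)) {
        continue; // a live session may still write new operation logs
      }
      try {
        // Deleting a directory fails if it is not empty, so a log file written
        // concurrently by another thread is never removed here.
        if (Files.deleteIfExists(root.resolve(session))) {
          removed++;
        }
      } catch (DirectoryNotEmptyException e) {
        // Still holds logs: leave it for a later pass.
      } catch (IOException e) {
        // Other I/O problem (e.g. permissions): skip this entry and keep sweeping.
      }
    }
    return removed;
  }
}
```

This does not close every window: a writer that has already seen `parentFile.exists()` return true could still lose the directory before creating its file, so the writer side may also need a retry or a shared lock.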

##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationManager.java
##########
@@ -85,10 +79,7 @@ public synchronized void init(HiveConf hiveConf) {
     LogDivertAppender.registerRoutingAppender(hiveConf);
     LogDivertAppenderForTest.registerRoutingAppenderIfInTest(hiveConf);
 
-    if (hiveConf.isWebUiQueryInfoCacheEnabled()) {
-      historicalQueryInfos = new QueryInfoCache(
-        hiveConf.getIntVar(ConfVars.HIVE_SERVER2_WEBUI_MAX_HISTORIC_QUERIES));
-    }
+    this.queryInfoCache = new QueryInfoCache(hiveConf);

Review comment:
       Do we need `QueryInfoCache` when we do not use the WebUI?
   Would it be possible to use `Optional` for it?
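A minimal sketch of the `Optional` suggestion, assuming the cache should exist only when `isWebUiQueryInfoCacheEnabled()` is true, as in the removed lines. `OptionalCacheSketch` and `QueryInfoCacheStub` are hypothetical stand-ins, not Hive classes; the real `QueryInfoCache` constructor and methods may differ.

```java
import java.util.Optional;

/** Hypothetical sketch: hold the cache in an Optional so it exists only when the WebUI needs it. */
public final class OptionalCacheSketch {

  /** Stand-in for Hive's QueryInfoCache. */
  static final class QueryInfoCacheStub {
    private int recorded;
    void add(String queryId) { recorded++; }
    int size() { return recorded; }
  }

  private final Optional<QueryInfoCacheStub> queryInfoCache;

  OptionalCacheSketch(boolean webUiQueryInfoCacheEnabled) {
    // Mirrors the removed `if (hiveConf.isWebUiQueryInfoCacheEnabled())` guard.
    this.queryInfoCache = webUiQueryInfoCacheEnabled
        ? Optional.of(new QueryInfoCacheStub())
        : Optional.empty();
  }

  /** Callers need no null checks: this is a no-op when the cache is absent. */
  void onQueryFinished(String queryId) {
    queryInfoCache.ifPresent(cache -> cache.add(queryId));
  }

  Optional<QueryInfoCacheStub> getQueryInfoCache() {
    return queryInfoCache;
  }
}
```

Some style guides reserve `Optional` for return types; keeping a nullable field and exposing only an `Optional`-returning getter is a common alternative.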

##########
File path: service/src/java/org/apache/hive/service/cli/session/SessionManager.java
##########
@@ -281,6 +284,7 @@ private void initOperationLogRootDir() {
         LOG.warn("Failed to schedule cleanup HS2 operation logging root dir: " +
             operationLogRootDir.getAbsolutePath(), e);
       }
+      logManager = new OperationLogManager(this, hiveConf);

Review comment:
       Do we need the `OperationLogManager` if we do not use async log removal? Could we use `Optional` for this?
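Along the same lines, a hypothetical factory could return `Optional.empty()` when historic logs are off, so neither the manager object nor its cleaner thread is created. The three flags mirror the conditions checked in the `OperationLogManager` constructor in the diff; `LogManagerFactorySketch` and `LogManagerStub` are illustrative names only, and the additional `HISTORIC_OPERATION_LOG_ROOT_DIR != null` check is omitted for brevity.

```java
import java.util.Optional;

/** Hypothetical sketch: create the log manager only when historic operation logs can be kept. */
public final class LogManagerFactorySketch {

  /** Stand-in for OperationLogManager; the real constructor takes a SessionManager and HiveConf. */
  static final class LogManagerStub {
    private boolean cleanerStarted;
    void startCleaner() { cleanerStarted = true; }
    boolean isCleanerStarted() { return cleanerStarted; }
  }

  /**
   * Same conditions the OperationLogManager constructor checks: historic logs enabled,
   * operation logging enabled, and the WebUI query info cache enabled.
   */
  static Optional<LogManagerStub> create(boolean historicLogEnabled,
                                         boolean operationLoggingEnabled,
                                         boolean webUiQueryInfoCacheEnabled,
                                         boolean inTest) {
    if (!(historicLogEnabled && operationLoggingEnabled && webUiQueryInfoCacheEnabled)) {
      return Optional.empty(); // nothing to manage: no object, no cleaner thread
    }
    LogManagerStub manager = new LogManagerStub();
    if (!inTest) {
      manager.startCleaner(); // the diff also skips the cleaner under HIVE_IN_TEST
    }
    return Optional.of(manager);
  }
}
```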




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633429536



##########
File path: service/src/java/org/apache/hive/service/cli/operation/OperationManager.java
##########
@@ -85,10 +79,7 @@ public synchronized void init(HiveConf hiveConf) {
     LogDivertAppender.registerRoutingAppender(hiveConf);
     LogDivertAppenderForTest.registerRoutingAppenderIfInTest(hiveConf);
 
-    if (hiveConf.isWebUiQueryInfoCacheEnabled()) {
-      historicalQueryInfos = new QueryInfoCache(
-        hiveConf.getIntVar(ConfVars.HIVE_SERVER2_WEBUI_MAX_HISTORIC_QUERIES));
-    }
+    this.queryInfoCache = new QueryInfoCache(hiveConf);

Review comment:
       Do we need `QueryInfoCache` when we do not use the WebUI?
   Would it be possible to use `Optional` for it?






[GitHub] [hive] dengzhhu653 commented on a change in pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633107448



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java
##########
@@ -73,10 +74,10 @@ public OperationLog(String name, File file, HiveConf hiveConf) {
       opLoggingLevel = LoggingLevel.UNKNOWN;
     }
 
+    isRemoveLogs = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS);

Review comment:
       Done: `HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED` is now checked when Hive is not running in test mode. Thanks for the review!
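A minimal sketch of the decision this reply describes, assuming that in tests the `HIVE_TESTING_REMOVE_LOGS` value decides, and that outside tests the log file is kept exactly when the historic flag is on. `RemoveLogsPolicySketch` and its boolean parameters are hypothetical stand-ins for the `HiveConf` lookups; the merged `OperationLog` code may differ.

```java
/** Hypothetical sketch of how OperationLog could decide whether to delete its file on close. */
public final class RemoveLogsPolicySketch {

  /**
   * In tests, HIVE_TESTING_REMOVE_LOGS decides (createOperationLog in the diff sets it to false
   * when routing a log to the historic dir). Outside tests, logs are kept whenever
   * HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED is true, so the WebUI can still show them.
   */
  static boolean shouldRemoveLogs(boolean inTest, boolean testingRemoveLogs, boolean historicLogEnabled) {
    if (inTest) {
      return testingRemoveLogs;
    }
    return !historicLogEnabled;
  }
}
```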






[GitHub] [hive] dengzhhu653 closed pull request #1998: HIVE-24802: Show operation log at webui

Posted by GitBox <gi...@apache.org>.
dengzhhu653 closed pull request #1998:
URL: https://github.com/apache/hive/pull/1998

