You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/05/02 22:58:17 UTC

[GitHub] [hive] scarlin-cloudera opened a new pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

scarlin-cloudera opened a new pull request #2238:
URL: https://github.com/apache/hive/pull/2238


   Testing this fix, please do not review yet.
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kishendas commented on a change in pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
kishendas commented on a change in pull request #2238:
URL: https://github.com/apache/hive/pull/2238#discussion_r630589315



##########
File path: service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
##########
@@ -408,18 +400,10 @@ private synchronized void acquireAfterOpLock(boolean userAccess) {
     // set the thread name with the logging prefix.
     sessionState.updateThreadName();
 
-    // If the thread local Hive is different from sessionHive, it means, the previous query execution in
-    // master thread has re-created Hive object due to changes in MS related configurations in sessionConf.
-    // So, it is necessary to reset sessionHive object based on new sessionConf. Here, we cannot,
-    // directly set sessionHive with thread local Hive because if the previous command was REPL LOAD, then
-    // the config changes lives only within command execution not in session level.
-    // So, the safer option is to invoke Hive.get() which decides if to reuse Thread local Hive or re-create it.
-    if (Hive.getThreadLocal() != sessionHive) {
-      try {
-        setSessionHive();
-      } catch (HiveSQLException e) {
-        throw new RuntimeException(e);
-      }
+    try {

Review comment:
       Please avoid formatting only changes.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1869,7 +1874,10 @@ public void close() throws IOException {
       unCacheDataNucleusClassLoaders();
     } finally {
       // removes the threadlocal variables, closes underlying HMS connection
-      Hive.closeCurrent();
+      if (hiveDb != null) {
+        hiveDb.close(true);
+        hiveDb = null;

Review comment:
       Why do you have to set it to null ?

##########
File path: service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
##########
@@ -240,21 +240,13 @@ protected int processCmd(String cmd) {
    * @throws HiveSQLException
    */
   private void setSessionHive() throws HiveSQLException {
-    Hive newSessionHive;
     try {
-      newSessionHive = Hive.get(getHiveConf());
-
-      // HMS connections from sessionHive shouldn't be closed by any query execution thread when it
-      // recreates the Hive object. It is allowed to be closed only when session is closed/released.
-      newSessionHive.setAllowClose(false);
+      sessionHive = sessionState.getHiveDb();
     } catch (HiveException e) {
-      throw new HiveSQLException("Failed to get metastore connection", e);

Review comment:
       Can you not just do - LOG.error("Failed to retrieve Hive object", e); ?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1869,7 +1874,10 @@ public void close() throws IOException {
       unCacheDataNucleusClassLoaders();
     } finally {
       // removes the threadlocal variables, closes underlying HMS connection
-      Hive.closeCurrent();
+      if (hiveDb != null) {
+        hiveDb.close(true);

Review comment:
       Will it not fail this because of hiveDb.setAllowClose(false); elsewhere ?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -2210,6 +2218,18 @@ public void endScope(String queryId) {
   public Map<Object, Object> getQueryCache(String queryId) {
     return cache.get(queryId);
   }
+
+  public Hive getHiveDb() throws HiveException {
+    if (hiveDb == null) {
+      hiveDb = Hive.createHiveForSession(sessionConf);
+      // Need to setAllowClose to false. For legacy reasons, the Hive object is stored
+      // in thread local storage. If allowClose is true, the session can get closed when
+      // the thread goes away which is not desirable when the Hive object is used across
+      // different queries in the session.
+      hiveDb.setAllowClose(false);

Review comment:
       What happens if we try to close ? Will it throw exception or it will be a no-op ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] scarlin-cloudera commented on pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
scarlin-cloudera commented on pull request #2238:
URL: https://github.com/apache/hive/pull/2238#issuecomment-831599192


   Looks like it passed tests, so I hope people can review this now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kishendas commented on pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
kishendas commented on pull request #2238:
URL: https://github.com/apache/hive/pull/2238#issuecomment-839289414


   Can you please fill these ? 
   
   What changes were proposed in this pull request?
   
   Why are the changes needed?
   
   Does this PR introduce any user-facing change?
   
   How was this patch tested?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] scarlin-cloudera commented on a change in pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
scarlin-cloudera commented on a change in pull request #2238:
URL: https://github.com/apache/hive/pull/2238#discussion_r632672635



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1869,7 +1874,10 @@ public void close() throws IOException {
       unCacheDataNucleusClassLoaders();
     } finally {
       // removes the threadlocal variables, closes underlying HMS connection
-      Hive.closeCurrent();
+      if (hiveDb != null) {
+        hiveDb.close(true);
+        hiveDb = null;

Review comment:
       Just for an extra safety purpose.  It's probably not necessary.
   
   But I figured as part of the contract, if hiveDb is set, then the connection is open.  There is a "start" method and a "close" method.  If the session ever got reopened, then we would expect the hiveDb value to be null.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] scarlin-cloudera commented on pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
scarlin-cloudera commented on pull request #2238:
URL: https://github.com/apache/hive/pull/2238#issuecomment-843421092


   I tried to reproduce this here, but I could not.
   
   Interestingly, not only could I not reproduce this, but I also put in some debugging statements and the thrift threads seem to be tied to the thrift client. Not sure why this doesn't happen in my environment. 
   
   I still think it's worth pushing this fix though since there's really no reason a thrift thread needs to be attached to the client. 
   
   I'm wondering if this is related to HIVE-23925, but I tried running this 20x in my environment without my fix and it passed each time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] scarlin-cloudera commented on a change in pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
scarlin-cloudera commented on a change in pull request #2238:
URL: https://github.com/apache/hive/pull/2238#discussion_r632676371



##########
File path: service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
##########
@@ -408,18 +400,10 @@ private synchronized void acquireAfterOpLock(boolean userAccess) {
     // set the thread name with the logging prefix.
     sessionState.updateThreadName();
 
-    // If the thread local Hive is different from sessionHive, it means, the previous query execution in
-    // master thread has re-created Hive object due to changes in MS related configurations in sessionConf.
-    // So, it is necessary to reset sessionHive object based on new sessionConf. Here, we cannot,
-    // directly set sessionHive with thread local Hive because if the previous command was REPL LOAD, then
-    // the config changes lives only within command execution not in session level.
-    // So, the safer option is to invoke Hive.get() which decides if to reuse Thread local Hive or re-create it.
-    if (Hive.getThreadLocal() != sessionHive) {
-      try {
-        setSessionHive();
-      } catch (HiveSQLException e) {
-        throw new RuntimeException(e);
-      }
+    try {

Review comment:
       Not sure I understand this comment.  The "if" clause went away since we do not depend on anything with thread local storage to setSessionHive.  The format changed only because the "if" statement went away.  I guess I could leave it indented an extra 2 spaces...is that what you're suggesting?

##########
File path: service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
##########
@@ -240,21 +240,13 @@ protected int processCmd(String cmd) {
    * @throws HiveSQLException
    */
   private void setSessionHive() throws HiveSQLException {
-    Hive newSessionHive;
     try {
-      newSessionHive = Hive.get(getHiveConf());
-
-      // HMS connections from sessionHive shouldn't be closed by any query execution thread when it
-      // recreates the Hive object. It is allowed to be closed only when session is closed/released.
-      newSessionHive.setAllowClose(false);
+      sessionHive = sessionState.getHiveDb();
     } catch (HiveException e) {
-      throw new HiveSQLException("Failed to get metastore connection", e);

Review comment:
       Done

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -2210,6 +2218,18 @@ public void endScope(String queryId) {
   public Map<Object, Object> getQueryCache(String queryId) {
     return cache.get(queryId);
   }
+
+  public Hive getHiveDb() throws HiveException {
+    if (hiveDb == null) {
+      hiveDb = Hive.createHiveForSession(sessionConf);
+      // Need to setAllowClose to false. For legacy reasons, the Hive object is stored
+      // in thread local storage. If allowClose is true, the session can get closed when
+      // the thread goes away which is not desirable when the Hive object is used across
+      // different queries in the session.
+      hiveDb.setAllowClose(false);

Review comment:
       I'm not sure I understand this.  We are not closing here.  This is just a value which ensures that when the thread goes away, the session stored in thread local storage will not be closed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kishendas commented on a change in pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
kishendas commented on a change in pull request #2238:
URL: https://github.com/apache/hive/pull/2238#discussion_r632839815



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -2210,6 +2218,18 @@ public void endScope(String queryId) {
   public Map<Object, Object> getQueryCache(String queryId) {
     return cache.get(queryId);
   }
+
+  public Hive getHiveDb() throws HiveException {
+    if (hiveDb == null) {
+      hiveDb = Hive.createHiveForSession(sessionConf);
+      // Need to setAllowClose to false. For legacy reasons, the Hive object is stored
+      // in thread local storage. If allowClose is true, the session can get closed when
+      // the thread goes away which is not desirable when the Hive object is used across
+      // different queries in the session.
+      hiveDb.setAllowClose(false);

Review comment:
       Got it. I see that close() method has a check for allowClose(). 

##########
File path: service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
##########
@@ -408,18 +400,10 @@ private synchronized void acquireAfterOpLock(boolean userAccess) {
     // set the thread name with the logging prefix.
     sessionState.updateThreadName();
 
-    // If the thread local Hive is different from sessionHive, it means, the previous query execution in
-    // master thread has re-created Hive object due to changes in MS related configurations in sessionConf.
-    // So, it is necessary to reset sessionHive object based on new sessionConf. Here, we cannot,
-    // directly set sessionHive with thread local Hive because if the previous command was REPL LOAD, then
-    // the config changes lives only within command execution not in session level.
-    // So, the safer option is to invoke Hive.get() which decides if to reuse Thread local Hive or re-create it.
-    if (Hive.getThreadLocal() != sessionHive) {
-      try {
-        setSessionHive();
-      } catch (HiveSQLException e) {
-        throw new RuntimeException(e);
-      }
+    try {

Review comment:
       My bad. Did not see if block going away. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] scarlin-cloudera commented on a change in pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
scarlin-cloudera commented on a change in pull request #2238:
URL: https://github.com/apache/hive/pull/2238#discussion_r632669443



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1869,7 +1874,10 @@ public void close() throws IOException {
       unCacheDataNucleusClassLoaders();
     } finally {
       // removes the threadlocal variables, closes underlying HMS connection
-      Hive.closeCurrent();
+      if (hiveDb != null) {
+        hiveDb.close(true);

Review comment:
       That's correct.  The original design set the "allowClose" variable to false since there is cleanup logic when a thread dies that could close the session.  Since this is explicitly called at session close time, we can safely close the connection.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kishendas commented on pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
kishendas commented on pull request #2238:
URL: https://github.com/apache/hive/pull/2238#issuecomment-841530958


   How was this patch tested?
   
   We ran 10 beeline clients concurrently over a span of 6 hours to reproduce the original problem. With the fix, we were no longer able to reproduce.
   
   Since the testing was manual, is it possible to write a test with multiple clients/sessions which tests the logic that you have modified ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk merged pull request #2238: HIVE-25085: MetaStore Clients no longer shared across sessions.

Posted by GitBox <gi...@apache.org>.
kgyrtkirk merged pull request #2238:
URL: https://github.com/apache/hive/pull/2238


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org