Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/08/24 14:49:32 UTC

[GitHub] [hive] abstractdog opened a new pull request #1423: HIVE-24065: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

abstractdog opened a new pull request #1423:
URL: https://github.com/apache/hive/pull/1423


   Change-Id: I311f131c03392618cc2dac186e7e53a48ede1eb4
   
   
   ### What changes were proposed in this pull request?
   As the title suggests, the expensive bloom filter deserialization can be avoided by caching the deserialized bloom filters. This way, only one filter instance per daemon (or per container in container mode) is kept in memory.
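
   Below is a minimal sketch of the idea in plain Java. The first caller deserializes the filter, and every later caller using the same key gets the same instance back. `FilterCacheSketch`, its `retrieve` method and the key format are hypothetical. The `ConcurrentHashMap` stands in for the daemon-wide runtime cache and is not Hive's ObjectCache API. `java.util.BitSet` stands in for `BloomKFilter`.

   ```java
   import java.util.BitSet;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.atomic.AtomicInteger;
   import java.util.function.Supplier;

   // Hypothetical stand-in for a per-daemon runtime cache (not Hive's ObjectCache API).
   class FilterCacheSketch {
     private final Map<String, Object> cache = new ConcurrentHashMap<>();
     // Counts how often the expensive deserialization step actually ran.
     final AtomicInteger deserializations = new AtomicInteger();

     // Runs the loader at most once per key; later callers share the cached object.
     @SuppressWarnings("unchecked")
     <T> T retrieve(String key, Supplier<T> loader) {
       return (T) cache.computeIfAbsent(key, k -> loader.get());
     }

     BitSet getFilter(String key, byte[] serialized) {
       return retrieve(key, () -> {
         deserializations.incrementAndGet();
         // BitSet.valueOf stands in for BloomKFilter.deserialize(...)
         return BitSet.valueOf(serialized);
       });
     }
   }
   ```

   `ConcurrentHashMap.computeIfAbsent` applies the loader at most once per key, even under concurrent access. That is the property that leaves a single shared instance.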
   
   
   ### Why are the changes needed?
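   Performance improvement: without caching, every VectorInBloomFilterColDynamicValue instance deserializes its bloom filter on its own.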
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Tested on a cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #1423: HIVE-24065: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #1423:
URL: https://github.com/apache/hive/pull/1423#issuecomment-679861897


   @rbalamohan: could you please take a look? It's a simple patch, tested on a cluster.




[GitHub] [hive] kgyrtkirk commented on a change in pull request #1423: HIVE-24065: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1423:
URL: https://github.com/apache/hive/pull/1423#discussion_r476265302



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorInBloomFilterColDynamicValue.java
##########
@@ -100,26 +103,39 @@ public void init(Configuration conf) {
     default:
       throw new IllegalStateException("Unsupported type " + colVectorType);
     }
+
+    String queryId = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEQUERYID);
+    runtimeCache = ObjectCacheFactory.getCache(conf, queryId, false, true);
   }
 
-  private void initValue()  {
-    InputStream in = null;
+  private void initValue() {
     try {
-      Object val = bloomFilterDynamicValue.getValue();
-      if (val != null) {
-        BinaryObjectInspector boi = (BinaryObjectInspector) bloomFilterDynamicValue.getObjectInspector();
-        byte[] bytes = boi.getPrimitiveJavaObject(val);
-        in = new NonSyncByteArrayInputStream(bytes);
-        bloomFilter = BloomKFilter.deserialize(in);
-      } else {
-        bloomFilter = null;
-      }
-      initialized = true;
-    } catch (Exception err) {
-      throw new RuntimeException(err);
-    } finally {
-      IOUtils.closeStream(in);

Review comment:
       no... for a ByteArrayInputStream it's not needed
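
       Here is a small check of the reviewer's point for the JDK class. `java.io.ByteArrayInputStream.close()` is documented as having no effect, and the stream can still be read afterwards. It is an assumption, not verified here, that Hive's `NonSyncByteArrayInputStream` behaves the same way. The class name below is hypothetical.

       ```java
       import java.io.ByteArrayInputStream;
       import java.io.IOException;
       import java.io.UncheckedIOException;

       // Hypothetical demo class: closing a ByteArrayInputStream releases nothing
       // and does not prevent further reads.
       class ByteArrayCloseDemo {
         static int readAfterClose(byte[] data) {
           ByteArrayInputStream in = new ByteArrayInputStream(data);
           try {
             in.close(); // documented as a no-op for ByteArrayInputStream
           } catch (IOException e) {
             throw new UncheckedIOException(e);
           }
           return in.read(); // still returns the first byte (0-255), or -1 if empty
         }
       }
       ```

       Keeping the close anyway is harmless. It also stays correct if the stream type is ever swapped for one that holds real resources.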






[GitHub] [hive] abstractdog commented on pull request #1423: HIVE-24065: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #1423:
URL: https://github.com/apache/hive/pull/1423#issuecomment-683243881


   pushed to master




[GitHub] [hive] abstractdog commented on pull request #1423: HIVE-24065: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #1423:
URL: https://github.com/apache/hive/pull/1423#issuecomment-680849960


   Could you please take a look, @rbalamohan?




[GitHub] [hive] abstractdog commented on a change in pull request #1423: HIVE-24065: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #1423:
URL: https://github.com/apache/hive/pull/1423#discussion_r476289475



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorInBloomFilterColDynamicValue.java
##########

Review comment:
       added it back and force pushed
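
       Here is a hedged sketch of how a cached `initValue()` can keep the close. Deserialization runs inside try-with-resources, and only the resulting filter is cached. `InitValueSketch`, `RUNTIME_CACHE` and the `cacheKey` parameter are hypothetical names. `BitSet` stands in for `BloomKFilter`, and the map stands in for the runtime cache. The sketch assumes Java 9+ for `InputStream.readAllBytes()`.

       ```java
       import java.io.ByteArrayInputStream;
       import java.io.IOException;
       import java.io.InputStream;
       import java.io.UncheckedIOException;
       import java.util.BitSet;
       import java.util.Map;
       import java.util.concurrent.ConcurrentHashMap;

       // Hypothetical sketch: null check, cached deserialization, and the stream close
       // kept via try-with-resources. Not the actual Hive implementation.
       class InitValueSketch {
         static final Map<String, BitSet> RUNTIME_CACHE = new ConcurrentHashMap<>();

         BitSet bloomFilter;
         boolean initialized;

         void initValue(String cacheKey, byte[] serialized) {
           if (serialized == null) {
             bloomFilter = null; // no dynamic value: nothing to deserialize or cache
           } else {
             // deserialize() runs only for the first caller with this key
             bloomFilter = RUNTIME_CACHE.computeIfAbsent(cacheKey, k -> deserialize(serialized));
           }
           initialized = true;
         }

         private static BitSet deserialize(byte[] bytes) {
           // try-with-resources closes the stream even if deserialization fails
           try (InputStream in = new ByteArrayInputStream(bytes)) {
             return BitSet.valueOf(in.readAllBytes());
           } catch (IOException e) {
             throw new UncheckedIOException(e);
           }
         }
       }
       ```

       A null dynamic value is deliberately not cached, since `ConcurrentHashMap` cannot hold null values. Only real filters are shared across instances.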
   
   






[GitHub] [hive] abstractdog commented on a change in pull request #1423: HIVE-24065: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #1423:
URL: https://github.com/apache/hive/pull/1423#discussion_r476270406



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorInBloomFilterColDynamicValue.java
##########

Review comment:
       good catch, I think it's needed, or at least I removed it accidentally






[GitHub] [hive] abstractdog closed pull request #1423: HIVE-24065: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

Posted by GitBox <gi...@apache.org>.
abstractdog closed pull request #1423:
URL: https://github.com/apache/hive/pull/1423


   




[GitHub] [hive] kgyrtkirk commented on a change in pull request #1423: HIVE-24065: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1423:
URL: https://github.com/apache/hive/pull/1423#discussion_r476264312



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorInBloomFilterColDynamicValue.java
##########

Review comment:
       I don't see this close in the new implementation... isn't it needed?



