You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/07/14 04:34:56 UTC

[GitHub] [hive] rbalamohan opened a new pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

rbalamohan opened a new pull request #1250:
URL: https://github.com/apache/hive/pull/1250


   https://issues.apache.org/jira/browse/HIVE-23843
   
   TestVectorGroupByOperator has been modified to run with and without hive.vectorized.groupby.agg.enable.lrucache.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] rbalamohan edited a comment on pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
rbalamohan edited a comment on pull request #1250:
URL: https://github.com/apache/hive/pull/1250#issuecomment-657962495


   Observed good improvement in Q67 and few other queries.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] rbalamohan commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r457792912



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -358,6 +360,34 @@ public void close(boolean aborted) throws HiveException {
      */
     private long numRowsCompareHashAggr;
 
+    /**
+     * To track current memory usage.
+     */
+    private long currMemUsed;
+
+    /**
+     * Whether to make use of LRUCache for map aggr buffers or not.
+     */
+    private boolean lruCache;
+
+    class LRUCache extends LinkedHashMap<KeyWrapper, VectorAggregationBufferRow> {

Review comment:
       LinkedHashMap provides this semantics and already maintains the double linked list internally. Invokes removeEldestEntry on need basis.  However, planning to make it a lot more simpler in next iteration of the patch.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk closed pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
kgyrtkirk closed pull request #1250:
URL: https://github.com/apache/hive/pull/1250


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r454341433



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -358,6 +360,34 @@ public void close(boolean aborted) throws HiveException {
      */
     private long numRowsCompareHashAggr;
 
+    /**
+     * To track current memory usage.
+     */
+    private long currMemUsed;
+
+    /**
+     * Whether to make use of LRUCache for map aggr buffers or not.
+     */
+    private boolean lruCache;
+
+    class LRUCache extends LinkedHashMap<KeyWrapper, VectorAggregationBufferRow> {

Review comment:
       this doesn't look like an `LRU` cache to me...it removes the eldest entry; in case of `lru` you should move the accessed entry to the head of the list
   
   I think guava has some lru cache implementation - might worth a look




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459243033



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       @ashutoshc this conversation was still not resolved - I was waiting for a response; I think we could have improved further on this patch just by changing it a little bit.
   
   @rbalamohan  we are batch removing from the cache elements here; which does not happen in regular LRU stuff.
   
   if we have {{K}} cache slots; and start the stream with an element which is there for say {{N*K}} times ; that will raise the bar to retain a new cache element during flush to {{N}}.
   
   I think the counters of the retained entries should be reset to 0 at least - it will increase it's effectiveness and neutralize long-term memory effects - like the above




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r457944846



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       the new patch flattens an LRU logic into the operator itself - I rather liked the earlier pluggable approach better; since that could enable us to later plug-in LFRU which might be the best for this job...
   
   I believe that is because we might not want to plug in different kind of algos here...which is fine
   however I think we can shift toward an LFRU alike operation by penalitizing entries which are kept:
   * reset accesscount of kept entries to 0
   * or...by multiply it by .5 or something...(can be placed in conf)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r454349711



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -358,6 +360,34 @@ public void close(boolean aborted) throws HiveException {
      */
     private long numRowsCompareHashAggr;
 
+    /**
+     * To track current memory usage.
+     */
+    private long currMemUsed;
+
+    /**
+     * Whether to make use of LRUCache for map aggr buffers or not.
+     */
+    private boolean lruCache;
+
+    class LRUCache extends LinkedHashMap<KeyWrapper, VectorAggregationBufferRow> {
+
+      @Override
+      protected boolean removeEldestEntry(Map.Entry<KeyWrapper, VectorAggregationBufferRow> eldest) {
+        if (currMemUsed > maxHashTblMemory || size() > maxHtEntries || gcCanary.get() == null) {

Review comment:
       this method seems to have been polluted by the "isFull" logic - which is unexpected with this method name
   the "isFull" should be moved outside - and remove should only called when the condition is met

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -4065,6 +4065,9 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_VECTORIZATION_GROUPBY_MAXENTRIES("hive.vectorized.groupby.maxentries", 1000000,
         "Max number of entries in the vector group by aggregation hashtables. \n" +
         "Exceeding this will trigger a flush irrelevant of memory pressure condition."),
+    HIVE_VECTORIZATION_GROUPBY_ENABLE_LRU_FOR_AGGR(

Review comment:
       instead of introducing a boolean toggle; add a mode switch (default/lru/etc)

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -420,35 +460,56 @@ public void doProcessBatch(VectorizedRowBatch batch, boolean isFirstGroupingSet,
       //Flush if memory limits were reached
       // We keep flushing until the memory is under threshold
       int preFlushEntriesCount = numEntriesHashTable;
-      while (shouldFlush(batch)) {
-        flush(false);
 
-        if(gcCanary.get() == null) {
-          gcCanaryFlushes++;
-          gcCanary = new SoftReference<Object>(new Object());
-        }
+      if (!lruCache) {
+        while (shouldFlush(batch)) {
+          flush(false);
+
+          if(gcCanary.get() == null) {
+            gcCanaryFlushes++;
+            gcCanary = new SoftReference<Object>(new Object());
+          }
 
-        //Validate that some progress is being made
-        if (!(numEntriesHashTable < preFlushEntriesCount)) {
-          if (LOG.isDebugEnabled()) {
-            LOG.debug(String.format("Flush did not progress: %d entries before, %d entries after",
-                preFlushEntriesCount,
-                numEntriesHashTable));
+          //Validate that some progress is being made
+          if (!(numEntriesHashTable < preFlushEntriesCount)) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug(String.format("Flush did not progress: %d entries before, %d entries after",
+                  preFlushEntriesCount,
+                  numEntriesHashTable));
+            }
+            break;
           }
-          break;
+          preFlushEntriesCount = numEntriesHashTable;
         }
-        preFlushEntriesCount = numEntriesHashTable;
+      } else {
+        checkAndFlushLRU(batch);
       }
 
       if (sumBatchSize == 0 && 0 != batch.size) {
         // Sample the first batch processed for variable sizes.
         updateAvgVariableSize(batch);
+        currMemUsed = numEntriesHashTable * (fixedHashEntrySize + avgVariableSize);

Review comment:
       this is strange...there is a `currMemUsed` field an there is also a `currMemUsed` local variable in `shouldFlush` - they might cause things to me more interesting :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r457932155



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -358,6 +360,34 @@ public void close(boolean aborted) throws HiveException {
      */
     private long numRowsCompareHashAggr;
 
+    /**
+     * To track current memory usage.
+     */
+    private long currMemUsed;
+
+    /**
+     * Whether to make use of LRUCache for map aggr buffers or not.
+     */
+    private boolean lruCache;
+
+    class LRUCache extends LinkedHashMap<KeyWrapper, VectorAggregationBufferRow> {

Review comment:
       okay; but this is not an `LRU` cache; please rename the class/option/etc because it may only cause confusion




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] rbalamohan commented on pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
rbalamohan commented on pull request #1250:
URL: https://github.com/apache/hive/pull/1250#issuecomment-657962495


   Observed good improvement in Q67.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459540489



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       I don't think we should get to L1 cache stuff here; I'm ok with not having an LRU, we might not need to do that - let's stay on logical level:
   
   I'm saying that if in a round the hit count of an element is not reduced or reset to zero (when it's being kept for next round) those keys could retain their places for a long time because of very old cache hits - and they will keep their place in the cache; this could lead to a case in which all the new elements (n/2) added to the cache are evicted and why very old entries which are not even getting any hits will keep half of the cache filled.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] rbalamohan commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459175320



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       Yes, for now there is no plan to make this pluggable. This is mainly for the rollup queries, where it works best with LRU. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] rbalamohan commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459378925



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       For rollup, it should be the other way around. If retained entries are reset to zero, it could potentially get evicted in next iteration polluting the map. I guess the confusion is with strict LRU impl here. Intent is not to implement strict LRU, and evict one at time (earlier impl with LinkedHashMap followed that, but ends up with L1 cache misses and ends up with heavy object tracking). Current implementation is light weight, but does not change the pattern of evicting 10% on entries. Just adds logic to retain heavily accessed entries. 
   
   Potential further optimization is to reuse the evicted entities and reuse them for pooling (e.g for every keywrapper gets cloned internally via copyKey() in the map. This causes high mem pressure on certain queries). This can be added as an additional optimization in follow up jira.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] rbalamohan commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459813384



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       https://issues.apache.org/jira/browse/HIVE-23917




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] rbalamohan commented on pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
rbalamohan commented on pull request #1250:
URL: https://github.com/apache/hive/pull/1250#issuecomment-661564023


   Simplified and revised the patch. Q67 @ 10 TB scale shows good improvement (2050+ seconds --> 1500+ seconds) in internal cluster.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] rbalamohan commented on a change in pull request #1250: HIVE-23843: Improve key evictions in VectorGroupByOperator

Posted by GitBox <gi...@apache.org>.
rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459811176



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       >> keys could retain their places for a long time because of very old cache hits - and they will keep their place in the cache
   
   This depends on incoming data and would be the worst case scenario similar to earlier implementation.  However, there is a corner case (again depending on data) that large number of entries in the map exceeds the average threshold which could prevent 10% flushing limit. Adding the reset would help preventing this. I will create a follow up ticket on this.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org