Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/03/31 11:45:05 UTC

[GitHub] [hive] kgyrtkirk opened a new pull request #964: HIVE-23095 ndv 70

kgyrtkirk opened a new pull request #964: HIVE-23095 ndv 70
URL: https://github.com/apache/hive/pull/964
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70

Posted by GitBox <gi...@apache.org>.
prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70
URL: https://github.com/apache/hive/pull/964#discussion_r403398547
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
 ##########
 @@ -148,8 +148,12 @@ public int encodeHash(long hashcode) {
     }
   }
 
-  public int getSize() {
-    return sparseMap.size() + tempListIdx;
+  public boolean isSizeGreaterThan(int s) {
+    if (sparseMap.size() + tempListIdx > s) {
+      mergeTempListToSparseMap();
 
 Review comment:
   The tempList array was added for insertion performance: it quickly buffers up to 1024 elements, at the cost of slight misestimation when the temp list contains duplicates. This change negates that benefit by merging the temp list into the sparse map more frequently. We might as well add entries to the sparse map directly instead of to tempList, so that getSize() becomes a constant-time operation. The sparse register switches to dense after roughly 3200 elements, so the sparse map will not be a hot loop. I would recommend removing tempList from the sparse register; that way there will be less branching.
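
The suggestion above (insert straight into the sparse map so that the size check is constant time) could look roughly like the following. This is a minimal sketch, not the Hive class: `DirectSparseRegisterSketch`, `set`, and the use of `java.util.HashMap` are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a sparse register with no temp buffer.
final class DirectSparseRegisterSketch {
  // register index -> longest run length seen for that index
  private final Map<Integer, Byte> sparseMap = new HashMap<>();

  // Insert directly into the map, keeping the larger run length.
  void set(int idx, byte runLength) {
    sparseMap.merge(idx, runLength, (a, b) -> a >= b ? a : b);
  }

  // Constant time: there is no temp list to merge first.
  int getSize() {
    return sparseMap.size();
  }

  boolean isSizeGreaterThan(int s) {
    return sparseMap.size() > s;
  }

  public static void main(String[] args) {
    DirectSparseRegisterSketch r = new DirectSparseRegisterSketch();
    r.set(5, (byte) 2);
    r.set(5, (byte) 1); // duplicate index, does not grow the map
    r.set(7, (byte) 3);
    System.out.println(r.getSize());
  }
}
```

With no buffer, duplicate indexes never inflate the size, so the check needs no merge step and has one fewer branch.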



[GitHub] [hive] prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70

Posted by GitBox <gi...@apache.org>.
prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70
URL: https://github.com/apache/hive/pull/964#discussion_r405679222
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
 ##########
 @@ -148,8 +148,12 @@ public int encodeHash(long hashcode) {
     }
   }
 
-  public int getSize() {
-    return sparseMap.size() + tempListIdx;
+  public boolean isSizeGreaterThan(int s) {
+    if (sparseMap.size() + tempListIdx > s) {
+      mergeTempListToSparseMap();
 
 Review comment:
   Also, can we remove the fastutil dependency? For such a small encoding switch threshold, I don't think fastutil's fast hashmap implementation is worth it.



[GitHub] [hive] kgyrtkirk commented on a change in pull request #964: HIVE-23095 ndv 70

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #964: HIVE-23095 ndv 70
URL: https://github.com/apache/hive/pull/964#discussion_r404221711
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
 ##########
 @@ -148,8 +148,12 @@ public int encodeHash(long hashcode) {
     }
   }
 
-  public int getSize() {
-    return sparseMap.size() + tempListIdx;
+  public boolean isSizeGreaterThan(int s) {
+    if (sparseMap.size() + tempListIdx > s) {
+      mergeTempListToSparseMap();
 
 Review comment:
   We are using a "sizeOptimized" version, which switches to dense at around 130 elements.
   
   I'll reply about the removal etc. on the Jira.



[GitHub] [hive] kgyrtkirk commented on a change in pull request #964: HIVE-23095 ndv 70

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #964: HIVE-23095 ndv 70
URL: https://github.com/apache/hive/pull/964#discussion_r405482181
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
 ##########
 @@ -148,8 +148,12 @@ public int encodeHash(long hashcode) {
     }
   }
 
-  public int getSize() {
-    return sparseMap.size() + tempListIdx;
+  public boolean isSizeGreaterThan(int s) {
+    if (sparseMap.size() + tempListIdx > s) {
+      mergeTempListToSparseMap();
 
 Review comment:
   We are using:
   * sizeOptimized => p=10
   * bitpacking, which is enabled by default
   The formula for the threshold in this case is:
   ```
   2**p * 6/8/5 = ~150
   ```
   https://github.com/apache/hive/blob/d91cc0cd84b7d0ecc0f29d44b109b46e21194eec/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L116
   
   I also found that a bit low, but that's what it is.
   
   All changes are here; this conversation is on a diff that is "outdated".
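
The threshold formula quoted in this comment can be checked with a few lines of Java. This is a minimal sketch, assuming integer arithmetic applied left to right; the names `SwitchThresholdSketch` and `threshold` are hypothetical, and the linked HyperLogLog.java line remains the authoritative version.

```java
// Hypothetical helper that evaluates 2**p * 6/8/5 with integer division.
final class SwitchThresholdSketch {
  static int threshold(int p) {
    int m = 1 << p;           // number of registers, 2**p
    return ((m * 6) / 8) / 5; // 6 bits per register, 8 bits per byte, divided by 5
  }

  public static void main(String[] args) {
    System.out.println(threshold(10)); // prints 153
    System.out.println(threshold(14)); // prints 2457
  }
}
```

Under these assumptions p=10 gives 153, in line with the "~150" above, while p=14 gives 2457.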
   



[GitHub] [hive] prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70

Posted by GitBox <gi...@apache.org>.
prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70
URL: https://github.com/apache/hive/pull/964#discussion_r404355453
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
 ##########
 @@ -148,8 +148,12 @@ public int encodeHash(long hashcode) {
     }
   }
 
-  public int getSize() {
-    return sparseMap.size() + tempListIdx;
+  public boolean isSizeGreaterThan(int s) {
+    if (sparseMap.size() + tempListIdx > s) {
+      mergeTempListToSparseMap();
 
 Review comment:
   ~130 elements? Did the encoding switch threshold change?
   Can you please push the commit to the GitHub PR for easier review?



[GitHub] [hive] prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70

Posted by GitBox <gi...@apache.org>.
prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70
URL: https://github.com/apache/hive/pull/964#discussion_r405677422
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
 ##########
 @@ -148,8 +148,12 @@ public int encodeHash(long hashcode) {
     }
   }
 
-  public int getSize() {
-    return sparseMap.size() + tempListIdx;
+  public boolean isSizeGreaterThan(int s) {
+    if (sparseMap.size() + tempListIdx > s) {
+      mergeTempListToSparseMap();
 
 Review comment:
   Ah, OK. I did the math with p=14 and forgot that we switched to 10 in Hive.



[GitHub] [hive] prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70

Posted by GitBox <gi...@apache.org>.
prasanthj commented on a change in pull request #964: HIVE-23095 ndv 70
URL: https://github.com/apache/hive/pull/964#discussion_r403398823
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
 ##########
 @@ -195,7 +199,7 @@ public void extractLowBitsTo(HLLRegister dest) {
       byte lr = entry.getValue(); // this can be a max of 65, never > 127
       if (lr != 0) {
         // should be a no-op for sparse
-        dest.add((long) ((1 << (p + lr - 1)) | idx));
+        dest.add((1 << (p + lr - 1)) | idx);
 
 Review comment:
   nit: just noticing this. To be safe with respect to overflow, can you make it `1L << (p + lr - 1)`?
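
The overflow concern can be illustrated in isolation. This is a minimal sketch with hypothetical names (`ShiftOverflowSketch`, `intShift`, `longShift`) and illustrative values for `p`, `lr`, and `idx` that are not taken from the PR. In Java an `int` shift uses only the low five bits of the shift distance, so a shift of 32 or more wraps around before the result is widened to `long`.

```java
// Hypothetical demo of int versus long shifting.
final class ShiftOverflowSketch {
  // The shift happens in int arithmetic; the result is widened to long afterwards.
  static long intShift(int p, int lr, int idx) {
    return (1 << (p + lr - 1)) | idx;
  }

  // The literal 1L makes the shift happen in long arithmetic.
  static long longShift(int p, int lr, int idx) {
    return (1L << (p + lr - 1)) | idx;
  }

  public static void main(String[] args) {
    int p = 10, lr = 25, idx = 3;                // shift distance is 34
    System.out.println(intShift(p, lr, idx));   // prints 7, because 34 wraps to 2
    System.out.println(longShift(p, lr, idx));  // prints 17179869187
  }
}
```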


