You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/21 07:37:56 UTC

[GitHub] [spark] cloud-fan commented on a diff in pull request #38722: [SPARK-41200][CORE] BytesToBytesMap's longArray size can be up to MAX_CAPACITY

cloud-fan commented on code in PR #38722:
URL: https://github.com/apache/spark/pull/38722#discussion_r1027662257


##########
core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java:
##########
@@ -812,9 +812,7 @@ public boolean append(Object kbase, long koff, int klen, Object vbase, long voff
 
         // If the map has reached its growth threshold, try to grow it.
         if (numKeys >= growthThreshold) {
-          // We use two array entries per key, so the array size is twice the capacity.
-          // We should compare the current capacity of the array, instead of its size.
-          if (longArray.size() / 2 < MAX_CAPACITY) {
+          if (longArray.size() < MAX_CAPACITY) {

Review Comment:
   I think we need to read the code of this entire class to understand the capacity. According to the code comment
   ```
     /**
      * The maximum number of keys that BytesToBytesMap supports. The hash table has to be
      * power-of-2-sized and its backing Java array can contain at most (1 &lt;&lt; 30) elements,
      * since that's the largest power-of-2 that's less than Integer.MAX_VALUE. We need two long array
      * entries per key, giving us a maximum capacity of (1 &lt;&lt; 29).
      */
     public static final int MAX_CAPACITY = (1 << 29);
   ```
   
   This is to restrict the number of keys, and `longArray.size() / 2` is the number of keys. The code here should be correct.
   
   I also saw code like this
   ```
   private void allocate(int capacity) {
       assert (capacity >= 0);
       capacity = Math.max((int) Math.min(MAX_CAPACITY, ByteArrayMethods.nextPowerOf2(capacity)), 64);
       assert (capacity <= MAX_CAPACITY);
       longArray = allocateArray(capacity * 2L);
       longArray.zeroOut();
   
       this.growthThreshold = (int) (capacity * loadFactor);
       this.mask = capacity - 1;
     }
   ```
   
   `longArray` is 2x capacity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org