Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2018/08/03 11:42:59 UTC

[GitHub] spark pull request #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggrega...

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21931#discussion_r207519280
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala ---
    @@ -83,7 +84,7 @@ class VectorizedHashMapGenerator(
            |  private ${classOf[ColumnarBatch].getName} batch;
            |  private ${classOf[MutableColumnarRow].getName} aggBufferRow;
            |  private int[] buckets;
    -       |  private int capacity = 1 << 16;
    +       |  private int capacity = $maxCapacity;
    --- End diff --
    
    We can see the following code at L226. If a user specifies a `2^n` value (e.g. 1024), it works correctly. What happens if a user specifies a non-`2^n` value (e.g. 127)?
    ```
    idx = (idx + 1) & (numBuckets - 1);
    ```
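
    The concern can be illustrated with a small standalone sketch (hypothetical, not Spark code; the class and method names are made up for illustration). With the masked probe step, a non-power-of-two bucket count leaves buckets unreachable and can even trap the probe at a single index, because `numBuckets - 1` is no longer an all-ones bit mask:
    ```java
    import java.util.HashSet;
    import java.util.Set;

    // Sketch: counts how many distinct buckets the masked probe
    // `idx = (idx + 1) & (numBuckets - 1)` can ever visit, starting from 0.
    public class MaskProbeDemo {
        static int reachableBuckets(int numBuckets) {
            Set<Integer> seen = new HashSet<>();
            int idx = 0;
            for (int i = 0; i < numBuckets; i++) {
                seen.add(idx);
                idx = (idx + 1) & (numBuckets - 1);
            }
            return seen.size();
        }

        public static void main(String[] args) {
            // Power of two: mask is all ones, probe cycles over every bucket.
            System.out.println(reachableBuckets(1024)); // 1024
            // Non power of two: 126 = 0b1111110, so (0 + 1) & 126 == 0
            // and the probe is stuck at bucket 0 forever.
            System.out.println(reachableBuckets(127));  // 1
        }
    }
    ```
    So a non-`2^n` capacity would not just degrade probing; it can make the linear probe loop forever on a collision.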


---
