Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2018/08/03 11:42:59 UTC
[GitHub] spark pull request #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggrega...
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21931#discussion_r207519280
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala ---
@@ -83,7 +84,7 @@ class VectorizedHashMapGenerator(
| private ${classOf[ColumnarBatch].getName} batch;
| private ${classOf[MutableColumnarRow].getName} aggBufferRow;
| private int[] buckets;
- | private int capacity = 1 << 16;
+ | private int capacity = $maxCapacity;
--- End diff --
We can see the following code at L226. If a user specifies a `2^n` value (e.g. 1024), it is functionally correct. What happens if a user specifies a non-`2^n` value (e.g. 127)?
```
idx = (idx + 1) & (numBuckets - 1);
```
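To make the concern concrete, here is a minimal standalone sketch, not the generated code itself. It assumes `numBuckets` takes the configured value directly; the class and method names (`ProbeMaskDemo`, `reachableSlots`, `isPowerOfTwo`) are hypothetical and exist only for illustration. It counts how many distinct slots the probe step above can reach from a given starting index.

```java
import java.util.BitSet;

// Hypothetical demo class, not part of Spark.
class ProbeMaskDemo {
    // Applies the probe step numBuckets times and counts the distinct slots visited.
    static int reachableSlots(int numBuckets, int start) {
        BitSet seen = new BitSet(numBuckets);
        int idx = start;
        for (int i = 0; i < numBuckets; i++) {
            seen.set(idx);
            // Same step as in the generated code: the mask only acts as a
            // modulo when numBuckets is a power of two.
            idx = (idx + 1) & (numBuckets - 1);
        }
        return seen.cardinality();
    }

    // One possible guard (an assumption, not what the PR does): reject
    // values that are not powers of two before generating the map.
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(reachableSlots(1024, 0)); // every slot is visited
        System.out.println(reachableSlots(127, 0));  // probing never leaves slot 0
        System.out.println(isPowerOfTwo(127));
    }
}
```

With 127 the mask is 126 (binary `1111110`), so the lowest bit is always cleared: `(0 + 1) & 126` is 0 again and the probe never advances to another slot. A lookup that collides on such a slot would keep re-checking the same bucket instead of walking the table.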