You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/12/03 16:12:27 UTC
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/19869
[SPARK-22677][SQL] cleanup whole stage codegen for hash aggregate
## What changes were proposed in this pull request?
The `HashAggregateExec` whole stage codegen path is a little messy and hard to understand, this code cleans it up a little bit, especially for the fast hash map part.
## How was this patch tested?
existing tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark hash-agg
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19869.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19869
----
commit 174f4ec2ea000de01da6f494db366dc3ff58ccc4
Author: Wenchen Fan <we...@databricks.com>
Date: 2017-11-24T12:43:37Z
cleanup whole stage codegen for hash aggregate
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84432/testReport)** for PR 19869 at commit [`b1acd35`](https://github.com/apache/spark/commit/b1acd35d24ea449a97434b54759cf8ed441661f2).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19869
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154580311
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -596,7 +596,7 @@ case class HashAggregateExec(
ctx.addMutableState(fastHashMapClassName, fastHashMapTerm,
s"$fastHashMapTerm = new $fastHashMapClassName();")
ctx.addMutableState(
- classOf[java.util.Iterator[ColumnarRow]].getName,
+ s"java.util.Iterator<${classOf[ColumnarRow]}>",
--- End diff --
```scala
scala> s"java.util.Iterator<${classOf[ColumnarRow]}>"
res2: String = java.util.Iterator<class org.apache.spark.sql.execution.vectorized.ColumnarRow>
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84435/testReport)** for PR 19869 at commit [`f0320d5`](https://github.com/apache/spark/commit/f0320d5f7b3406334a5250ecc2d9c5248142f34d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84416/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/19869
LGTM for super minor comments...
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19869
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154562413
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -672,48 +668,56 @@ case class HashAggregateExec(
def outputFromRowBasedMap: String = {
s"""
- while ($iterTermForFastHashMap.next()) {
- $numOutput.add(1);
- UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
- UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
- $outputFunc($keyTerm, $bufferTerm);
-
- if (shouldStop()) return;
- }
- $fastHashMapTerm.close();
- """
+ |while ($iterTermForFastHashMap.next()) {
--- End diff --
super nit: can we also drop unnecessary spaces in the head from this file? e.g.,
```
s"""
| private void $doAgg() throws java.io.IOException {
```
```
s"""
|private void $doAgg() throws java.io.IOException {
```
https://github.com/cloud-fan/spark/blob/9b8ae3d6635c5ed0323bf088e20d0de55dd1c098/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L233
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84416/testReport)** for PR 19869 at commit [`479fd8d`](https://github.com/apache/spark/commit/479fd8d4cb6ba6f0126d996362125746dfeac8f2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154528431
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -672,48 +668,56 @@ case class HashAggregateExec(
def outputFromRowBasedMap: String = {
s"""
- while ($iterTermForFastHashMap.next()) {
- $numOutput.add(1);
- UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
- UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
- $outputFunc($keyTerm, $bufferTerm);
-
- if (shouldStop()) return;
- }
- $fastHashMapTerm.close();
- """
+ |while ($iterTermForFastHashMap.next()) {
+ | UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
+ | UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
+ | $outputFunc($keyTerm, $bufferTerm);
+ |
+ | if (shouldStop()) return;
+ |}
+ |$fastHashMapTerm.close();
+ """.stripMargin
}
// Iterate over the aggregate rows and convert them from ColumnarRow to UnsafeRow
def outputFromVectorizedMap: String = {
val row = ctx.freshName("fastHashMapRow")
ctx.currentVars = null
ctx.INPUT_ROW = row
- val generateKeyRow = GenerateUnsafeProjection.createCode(ctx,
- groupingKeySchema.toAttributes.zipWithIndex
+ val generateKeyRow = GenerateUnsafeProjection.createCode(ctx,
+ groupingKeySchema.toAttributes.zipWithIndex
.map { case (attr, i) => BoundReference(i, attr.dataType, attr.nullable) }
- )
- val generateBufferRow = GenerateUnsafeProjection.createCode(ctx,
- bufferSchema.toAttributes.zipWithIndex
- .map { case (attr, i) =>
- BoundReference(groupingKeySchema.length + i, attr.dataType, attr.nullable) })
- s"""
- | while ($iterTermForFastHashMap.hasNext()) {
- | $numOutput.add(1);
- | org.apache.spark.sql.execution.vectorized.ColumnarRow $row =
- | (org.apache.spark.sql.execution.vectorized.ColumnarRow)
- | $iterTermForFastHashMap.next();
- | ${generateKeyRow.code}
- | ${generateBufferRow.code}
- | $outputFunc(${generateKeyRow.value}, ${generateBufferRow.value});
- |
- | if (shouldStop()) return;
- | }
- |
- | $fastHashMapTerm.close();
- """.stripMargin
+ )
+ val generateBufferRow = GenerateUnsafeProjection.createCode(ctx,
+ bufferSchema.toAttributes.zipWithIndex.map { case (attr, i) =>
+ BoundReference(groupingKeySchema.length + i, attr.dataType, attr.nullable)
+ })
+ val columnarRowCls = classOf[ColumnarRow].getName
+ s"""
+ |while ($iterTermForFastHashMap.hasNext()) {
+ | $columnarRowCls $row = ($columnarRowCls) $iterTermForFastHashMap.next();
+ | ${generateKeyRow.code}
+ | ${generateBufferRow.code}
+ | $outputFunc(${generateKeyRow.value}, ${generateBufferRow.value});
+ |
+ | if (shouldStop()) return;
+ |}
+ |
+ |$fastHashMapTerm.close();
+ """.stripMargin
}
+ def outputFromRegularHashMap: String = {
+ s"""
+ |while ($iterTerm.next()) {
--- End diff --
moved from https://github.com/apache/spark/pull/19869/files#diff-2eb948516b5beaeb746aadac27fbd5b4L731
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154528409
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -672,48 +668,56 @@ case class HashAggregateExec(
def outputFromRowBasedMap: String = {
s"""
- while ($iterTermForFastHashMap.next()) {
- $numOutput.add(1);
- UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
- UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
- $outputFunc($keyTerm, $bufferTerm);
-
- if (shouldStop()) return;
- }
- $fastHashMapTerm.close();
- """
+ |while ($iterTermForFastHashMap.next()) {
+ | UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
+ | UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
+ | $outputFunc($keyTerm, $bufferTerm);
+ |
+ | if (shouldStop()) return;
+ |}
+ |$fastHashMapTerm.close();
+ """.stripMargin
}
// Iterate over the aggregate rows and convert them from ColumnarRow to UnsafeRow
def outputFromVectorizedMap: String = {
val row = ctx.freshName("fastHashMapRow")
ctx.currentVars = null
ctx.INPUT_ROW = row
- val generateKeyRow = GenerateUnsafeProjection.createCode(ctx,
- groupingKeySchema.toAttributes.zipWithIndex
--- End diff --
The indentation was wrong previously.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19869
thanks, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84435/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154561232
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -621,34 +622,30 @@ case class HashAggregateExec(
val iterTerm = ctx.freshName("mapIter")
ctx.addMutableState(classOf[KVIterator[UnsafeRow, UnsafeRow]].getName, iterTerm)
- def generateGenerateCode(): String = {
- if (isFastHashMapEnabled) {
- if (isVectorizedHashMapEnabled) {
- s"""
- | ${fastHashMapGenerator.asInstanceOf[VectorizedHashMapGenerator].generate()}
- """.stripMargin
- } else {
- s"""
- | ${fastHashMapGenerator.asInstanceOf[RowBasedHashMapGenerator].generate()}
- """.stripMargin
- }
- } else ""
+ if (isFastHashMapEnabled) {
+ val generatedMap = if (isVectorizedHashMapEnabled) {
+ fastHashMapGenerator.asInstanceOf[VectorizedHashMapGenerator].generate()
+ } else {
+ fastHashMapGenerator.asInstanceOf[RowBasedHashMapGenerator].generate()
+ }
+ ctx.addInnerClass(generatedMap)
}
- ctx.addInnerClass(generateGenerateCode())
val doAgg = ctx.freshName("doAggregateWithKeys")
val peakMemory = metricTerm(ctx, "peakMemory")
val spillSize = metricTerm(ctx, "spillSize")
val avgHashProbe = metricTerm(ctx, "avgHashProbe")
+ val finishFashHashMap = if (isFastHashMapEnabled) {
--- End diff --
nit: can we merge this branch with the the branch above (line 625) for hashmap stuffs?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154569741
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -768,12 +762,8 @@ case class HashAggregateExec(
// generate hash code for key
val hashExpr = Murmur3Hash(groupingExpressions, 42)
- ctx.currentVars = input
val hashEval = BindReferences.bindReference(hashExpr, child.output).genCode(ctx)
- val inputAttr = aggregateBufferAttributes ++ child.output
- ctx.currentVars = new Array[ExprCode](aggregateBufferAttributes.length) ++ input
-
val (checkFallbackForGeneratedHashMap, checkFallbackForBytesToBytesMap, resetCounter,
incCounter) = if (testFallbackStartsAt.isDefined) {
--- End diff --
I'll have another PR for this part, will reformat it at that time
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84403/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84416/testReport)** for PR 19869 at commit [`479fd8d`](https://github.com/apache/spark/commit/479fd8d4cb6ba6f0126d996362125746dfeac8f2).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154556638
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -672,48 +668,56 @@ case class HashAggregateExec(
def outputFromRowBasedMap: String = {
s"""
- while ($iterTermForFastHashMap.next()) {
- $numOutput.add(1);
- UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
- UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
- $outputFunc($keyTerm, $bufferTerm);
-
- if (shouldStop()) return;
- }
- $fastHashMapTerm.close();
- """
+ |while ($iterTermForFastHashMap.next()) {
+ | UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
+ | UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
+ | $outputFunc($keyTerm, $bufferTerm);
+ |
+ | if (shouldStop()) return;
+ |}
+ |$fastHashMapTerm.close();
+ """.stripMargin
}
// Iterate over the aggregate rows and convert them from ColumnarRow to UnsafeRow
def outputFromVectorizedMap: String = {
val row = ctx.freshName("fastHashMapRow")
--- End diff --
ah good catch!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84435/testReport)** for PR 19869 at commit [`f0320d5`](https://github.com/apache/spark/commit/f0320d5f7b3406334a5250ecc2d9c5248142f34d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84432/testReport)** for PR 19869 at commit [`b1acd35`](https://github.com/apache/spark/commit/b1acd35d24ea449a97434b54759cf8ed441661f2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84410/testReport)** for PR 19869 at commit [`9b8ae3d`](https://github.com/apache/spark/commit/9b8ae3d6635c5ed0323bf088e20d0de55dd1c098).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154566445
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -882,45 +851,65 @@ case class HashAggregateExec(
|${evaluateVariables(unsafeRowBufferEvals)}
|// update unsafe row buffer
|${updateUnsafeRowBuffer.mkString("\n").trim}
- """.stripMargin
+ """.stripMargin
}
+ val updateRowInHashMap: String = {
+ if (isFastHashMapEnabled) {
+ ctx.INPUT_ROW = fastRowBuffer
+ val boundUpdateExpr = updateExpr.map(BindReferences.bindReference(_, inputAttr))
+ val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
+ val effectiveCodes = subExprs.codes.mkString("\n")
+ val fastRowEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
+ boundUpdateExpr.map(_.genCode(ctx))
+ }
+ val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
+ val dt = updateExpr(i).dataType
+ ctx.updateColumn(
+ fastRowBuffer, dt, i, ev, updateExpr(i).nullable, isVectorizedHashMapEnabled)
+ }
+
+ // If fast hash map is on, we first generate code to update row in fast hash map, if the
+ // previous loop up hit fast hash map. Otherwise, update row in regular hash map.
+ s"""
+ |if ($fastRowBuffer != null) {
+ | // common sub-expressions
+ | $effectiveCodes
+ | // evaluate aggregate function
+ | ${evaluateVariables(fastRowEvals)}
+ | // update fast row
+ | ${updateFastRow.mkString("\n").trim}
+ |} else {
+ | $updateRowInRegularHashMap
+ |}
+ """.stripMargin
+ } else {
+ updateRowInRegularHashMap
+ }
+ }
+
+ val declareFastRowBuffer: String = if (isFastHashMapEnabled) {
+ val rowType = if (isVectorizedHashMapEnabled) {
+ classOf[MutableColumnarRow].getName
+ } else {
+ "UnsafeRow"
+ }
+ s"$rowType $fastRowBuffer = null;"
+ } else ""
// We try to do hash map based in-memory aggregation first. If there is not enough memory (the
// hash map will return null for new key), we spill the hash map to disk to free memory, then
// continue to do in-memory aggregation and spilling until all the rows had been processed.
// Finally, sort the spilled aggregate buffers by key, and merge them together for same key.
s"""
UnsafeRow $unsafeRowBuffer = null;
--- End diff --
nit: How about this for just consistency?
```
s"""
$declareRowBuffer
$findOrInsertHashMap
$incCounter
$updateRowInHashMap
"""
```
Then,
```
val declareRowBuffer: String = if (isFastHashMapEnabled) {
val rowType = if (isVectorizedHashMapEnabled) {
classOf[MutableColumnarRow].getName
} else {
"UnsafeRow"
}
s"""
|UnsafeRow $unsafeRowBuffer = null;
|$rowType $fastRowBuffer = null;
""".stripMargin
} else {
s"UnsafeRow $unsafeRowBuffer = null;"
}
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84419/testReport)** for PR 19869 at commit [`0589e7d`](https://github.com/apache/spark/commit/0589e7d8b57aa71c72dd052f687a4706fb0c5567).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84432/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154576368
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -573,94 +574,84 @@ case class HashAggregateExec(
enableTwoLevelHashMap(ctx)
} else {
sqlContext.getConf("spark.sql.codegen.aggregate.map.vectorized.enable", null) match {
- case "true" => logWarning("Two level hashmap is disabled but vectorized hashmap is " +
- "enabled.")
- case null | "" | "false" => None
+ case "true" =>
+ logWarning("Two level hashmap is disabled but vectorized hashmap is enabled.")
+ case _ =>
}
}
- fastHashMapTerm = ctx.freshName("fastHashMap")
- val fastHashMapClassName = ctx.freshName("FastHashMap")
- val fastHashMapGenerator =
- if (isVectorizedHashMapEnabled) {
- new VectorizedHashMapGenerator(ctx, aggregateExpressions,
- fastHashMapClassName, groupingKeySchema, bufferSchema)
- } else {
- new RowBasedHashMapGenerator(ctx, aggregateExpressions,
- fastHashMapClassName, groupingKeySchema, bufferSchema)
- }
val thisPlan = ctx.addReferenceObj("plan", this)
- // Create a name for iterator from vectorized HashMap
+ // Create a name for the iterator from the fast hash map.
val iterTermForFastHashMap = ctx.freshName("fastHashMapIter")
if (isFastHashMapEnabled) {
+ // Generates the fast hash map class and creates the fash hash map term.
+ fastHashMapTerm = ctx.freshName("fastHashMap")
+ val fastHashMapClassName = ctx.freshName("FastHashMap")
if (isVectorizedHashMapEnabled) {
+ val generatedMap = new VectorizedHashMapGenerator(ctx, aggregateExpressions,
+ fastHashMapClassName, groupingKeySchema, bufferSchema).generate()
+ ctx.addInnerClass(generatedMap)
+
ctx.addMutableState(fastHashMapClassName, fastHashMapTerm,
s"$fastHashMapTerm = new $fastHashMapClassName();")
ctx.addMutableState(
- "java.util.Iterator<org.apache.spark.sql.execution.vectorized.ColumnarRow>",
+ classOf[java.util.Iterator[ColumnarRow]].getName,
--- End diff --
Is this as same as before?
```scala
scala> classOf[java.util.Iterator[Int]].getName
res2: String = java.util.Iterator
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84403/testReport)** for PR 19869 at commit [`174f4ec`](https://github.com/apache/spark/commit/174f4ec2ea000de01da6f494db366dc3ff58ccc4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154528578
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -882,45 +851,65 @@ case class HashAggregateExec(
|${evaluateVariables(unsafeRowBufferEvals)}
|// update unsafe row buffer
|${updateUnsafeRowBuffer.mkString("\n").trim}
- """.stripMargin
+ """.stripMargin
}
+ val updateRowInHashMap: String = {
+ if (isFastHashMapEnabled) {
+ ctx.INPUT_ROW = fastRowBuffer
+ val boundUpdateExpr = updateExpr.map(BindReferences.bindReference(_, inputAttr))
+ val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
+ val effectiveCodes = subExprs.codes.mkString("\n")
+ val fastRowEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
+ boundUpdateExpr.map(_.genCode(ctx))
+ }
+ val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
+ val dt = updateExpr(i).dataType
+ ctx.updateColumn(
+ fastRowBuffer, dt, i, ev, updateExpr(i).nullable, isVectorizedHashMapEnabled)
+ }
+
+ // If fast hash map is on, we first generate code to update row in fast hash map, if the
+ // previous loop up hit fast hash map. Otherwise, update row in regular hash map.
+ s"""
+ |if ($fastRowBuffer != null) {
+ | // common sub-expressions
+ | $effectiveCodes
+ | // evaluate aggregate function
+ | ${evaluateVariables(fastRowEvals)}
+ | // update fast row
+ | ${updateFastRow.mkString("\n").trim}
+ |} else {
+ | $updateRowInRegularHashMap
+ |}
+ """.stripMargin
+ } else {
+ updateRowInRegularHashMap
--- End diff --
Previously we always declare the `fastRowBuffer` and have the `if (fastRowBuffer != null)` check. Now we don't generate then if fast hash map is not enabled.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154534333
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -672,48 +668,56 @@ case class HashAggregateExec(
def outputFromRowBasedMap: String = {
s"""
- while ($iterTermForFastHashMap.next()) {
- $numOutput.add(1);
- UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
- UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
- $outputFunc($keyTerm, $bufferTerm);
-
- if (shouldStop()) return;
- }
- $fastHashMapTerm.close();
- """
+ |while ($iterTermForFastHashMap.next()) {
+ | UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
+ | UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
+ | $outputFunc($keyTerm, $bufferTerm);
+ |
+ | if (shouldStop()) return;
+ |}
+ |$fastHashMapTerm.close();
+ """.stripMargin
}
// Iterate over the aggregate rows and convert them from ColumnarRow to UnsafeRow
def outputFromVectorizedMap: String = {
val row = ctx.freshName("fastHashMapRow")
--- End diff --
nit: Is it better to fix indentations for these three lines, too?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84419/testReport)** for PR 19869 at commit [`0589e7d`](https://github.com/apache/spark/commit/0589e7d8b57aa71c72dd052f687a4706fb0c5567).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154569858
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -768,12 +762,8 @@ case class HashAggregateExec(
// generate hash code for key
val hashExpr = Murmur3Hash(groupingExpressions, 42)
- ctx.currentVars = input
val hashEval = BindReferences.bindReference(hashExpr, child.output).genCode(ctx)
- val inputAttr = aggregateBufferAttributes ++ child.output
- ctx.currentVars = new Array[ExprCode](aggregateBufferAttributes.length) ++ input
-
val (checkFallbackForGeneratedHashMap, checkFallbackForBytesToBytesMap, resetCounter,
incCounter) = if (testFallbackStartsAt.isDefined) {
--- End diff --
ok, thanks
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154528498
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -784,86 +774,65 @@ case class HashAggregateExec(
("true", "true", "", "")
}
- // We first generate code to probe and update the fast hash map. If the probe is
- // successful the corresponding fast row buffer will hold the mutable row
- val findOrInsertFastHashMap: Option[String] = {
+ val findOrInsertRegularHashMap: String =
+ s"""
+ |// generate grouping key
+ |${unsafeRowKeyCode.code.trim}
+ |${hashEval.code.trim}
+ |if ($checkFallbackForBytesToBytesMap) {
+ | // try to get the buffer from hash map
+ | $unsafeRowBuffer =
+ | $hashMapTerm.getAggregationBufferFromUnsafeRow($unsafeRowKeys, ${hashEval.value});
+ |}
+ |// Can't allocate buffer from the hash map. Spill the map and fallback to sort-based
+ |// aggregation after processing all input rows.
+ |if ($unsafeRowBuffer == null) {
+ | if ($sorterTerm == null) {
+ | $sorterTerm = $hashMapTerm.destructAndCreateExternalSorter();
+ | } else {
+ | $sorterTerm.merge($hashMapTerm.destructAndCreateExternalSorter());
+ | }
+ | $resetCounter
+ | // the hash map had be spilled, it should have enough memory now,
+ | // try to allocate buffer again.
+ | $unsafeRowBuffer = $hashMapTerm.getAggregationBufferFromUnsafeRow(
+ | $unsafeRowKeys, ${hashEval.value});
+ | if ($unsafeRowBuffer == null) {
+ | // failed to allocate the first page
+ | throw new OutOfMemoryError("No enough memory for aggregation");
+ | }
+ |}
+ """.stripMargin
+
+ val findOrInsertHashMap: String = {
if (isFastHashMapEnabled) {
- Option(
- s"""
- |
- |if ($checkFallbackForGeneratedHashMap) {
- | ${fastRowKeys.map(_.code).mkString("\n")}
- | if (${fastRowKeys.map("!" + _.isNull).mkString(" && ")}) {
- | $fastRowBuffer = $fastHashMapTerm.findOrInsert(
- | ${fastRowKeys.map(_.value).mkString(", ")});
- | }
- |}
- """.stripMargin)
+ // If fast hash map is on, we first generate code to probe and update the fast hash map.
+ // If the probe is successful the corresponding fast row buffer will hold the mutable row.
+ s"""
+ |if ($checkFallbackForGeneratedHashMap) {
--- End diff --
moved from https://github.com/apache/spark/pull/19869/files#diff-2eb948516b5beaeb746aadac27fbd5b4L794
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84410/testReport)** for PR 19869 at commit [`9b8ae3d`](https://github.com/apache/spark/commit/9b8ae3d6635c5ed0323bf088e20d0de55dd1c098).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84419/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19869
**[Test build #84403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84403/testReport)** for PR 19869 at commit [`174f4ec`](https://github.com/apache/spark/commit/174f4ec2ea000de01da6f494db366dc3ff58ccc4).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154565659
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -768,12 +762,8 @@ case class HashAggregateExec(
// generate hash code for key
val hashExpr = Murmur3Hash(groupingExpressions, 42)
- ctx.currentVars = input
val hashEval = BindReferences.bindReference(hashExpr, child.output).genCode(ctx)
- val inputAttr = aggregateBufferAttributes ++ child.output
- ctx.currentVars = new Array[ExprCode](aggregateBufferAttributes.length) ++ input
-
val (checkFallbackForGeneratedHashMap, checkFallbackForBytesToBytesMap, resetCounter,
incCounter) = if (testFallbackStartsAt.isDefined) {
--- End diff --
nit: need some indents in the head?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154528386
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -444,6 +444,7 @@ case class HashAggregateExec(
val funcName = ctx.freshName("doAggregateWithKeysOutput")
val keyTerm = ctx.freshName("keyTerm")
val bufferTerm = ctx.freshName("bufferTerm")
+ val numOutput = metricTerm(ctx, "numOutputRows")
--- End diff --
update the `numOutputRows` in the result function instead of doing it for both fast hash map and regular hash map.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154528455
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -784,86 +774,65 @@ case class HashAggregateExec(
("true", "true", "", "")
}
- // We first generate code to probe and update the fast hash map. If the probe is
- // successful the corresponding fast row buffer will hold the mutable row
- val findOrInsertFastHashMap: Option[String] = {
+ val findOrInsertRegularHashMap: String =
--- End diff --
moved from https://github.com/apache/spark/pull/19869/files#diff-2eb948516b5beaeb746aadac27fbd5b4L833
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19869
cc @juliuszsompolski @kiszk @viirya @maropu
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19869
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84410/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19869#discussion_r154682180
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -596,7 +596,7 @@ case class HashAggregateExec(
ctx.addMutableState(fastHashMapClassName, fastHashMapTerm,
s"$fastHashMapTerm = new $fastHashMapClassName();")
ctx.addMutableState(
- classOf[java.util.Iterator[ColumnarRow]].getName,
+ s"java.util.Iterator<${classOf[ColumnarRow]}>",
--- End diff --
damn...
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org