You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/08/02 09:28:48 UTC
[GitHub] spark pull request #21966: simplify codegen of array_except
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/21966
simplify codegen of array_except
## What changes were proposed in this pull request?
simplify the codegen:
1. only do real codegen if the type can be specialized by the hash set
2. change the null handling. Before: track the nullElementIndex, and create a new ArrayData to insert the null in the middle. After: track the nullElementIndex, put a null placeholder in the ArrayBuilder, at the end create ArrayData from ArrayBuilder directly.
## How was this patch tested?
existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark minor2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21966.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21966
----
commit 313ad294ec07d54f06db92f635b6bb9a76f17343
Author: Wenchen Fan <we...@...>
Date: 2018-08-02T09:18:21Z
simplify codegen of array_except
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21966: [SPARK-23915][SQL][followup] Add array_except fun...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21966
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21966
**[Test build #94116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94116/testReport)** for PR 21966 at commit [`16b9949`](https://github.com/apache/spark/commit/16b9949285c7133b89b3e6624cd8f5684abd3df5).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1743/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/21966
Thanks! merging to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21966
**[Test build #94116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94116/testReport)** for PR 21966 at commit [`16b9949`](https://github.com/apache/spark/commit/16b9949285c7133b89b3e6624cd8f5684abd3df5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/21966
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: simplify codegen of array_except
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21966
cc @kiszk @ueshin
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21966
**[Test build #93986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93986/testReport)** for PR 21966 at commit [`313ad29`](https://github.com/apache/spark/commit/313ad294ec07d54f06db92f635b6bb9a76f17343).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1732/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94116/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94099/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21966: [SPARK-23915][SQL][followup] Add array_except fun...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21966#discussion_r207302021
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -4077,81 +4078,84 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val arrayData = classOf[ArrayData].getName
val i = ctx.freshName("i")
- val pos = ctx.freshName("pos")
val value = ctx.freshName("value")
- val hsValue = ctx.freshName("hsValue")
val size = ctx.freshName("size")
- if (elementTypeSupportEquals) {
- val ptName = CodeGenerator.primitiveTypeName(elementType)
- val unsafeArray = ctx.freshName("unsafeArray")
- val (postFix, openHashElementType, hsJavaTypeName, genHsValue,
- getter, setter, javaTypeName, primitiveTypeName, arrayDataBuilder) =
- elementType match {
- case ByteType | ShortType | IntegerType =>
- ("$mcI$sp", "Int", "int", s"(int) $value",
- s"get$ptName($i)", s"set$ptName($pos, $value)",
- CodeGenerator.javaType(elementType), ptName,
- s"""
- |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
- |${ev.value} = $unsafeArray;
- """.stripMargin)
- case LongType | FloatType | DoubleType =>
- val signature = elementType match {
- case LongType => "$mcJ$sp"
- case FloatType => "$mcF$sp"
- case DoubleType => "$mcD$sp"
- }
- (signature, CodeGenerator.boxedType(elementType),
- CodeGenerator.javaType(elementType), value,
- s"get$ptName($i)", s"set$ptName($pos, $value)",
- CodeGenerator.javaType(elementType), ptName,
- s"""
- |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
- |${ev.value} = $unsafeArray;
- """.stripMargin)
- case _ =>
- val genericArrayData = classOf[GenericArrayData].getName
- val et = ctx.addReferenceObj("elementType", elementType)
- ("", "Object", "Object", value,
- s"get($i, $et)", s"update($pos, $value)", "Object", "Ref",
- s"${ev.value} = new $genericArrayData(new Object[$size]);")
- }
+ val canUseSpecializedHashSet = elementType match {
+ case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType => true
+ case _ => false
+ }
+ if (canUseSpecializedHashSet) {
+ val jt = CodeGenerator.javaType(elementType)
+ val ptName = CodeGenerator.primitiveTypeName(jt)
+
+ def genGetValue(array: String): String =
+ CodeGenerator.getValue(array, elementType, i)
+
+ val (hsPostFix, hsTypeName) = elementType match {
+ // we cast byte/short to int when writing to the hash set.
+ case ByteType | ShortType | IntegerType => ("$mcI$sp", "Int")
+ case LongType => ("$mcJ$sp", ptName)
+ case FloatType => ("$mcF$sp", ptName)
+ case DoubleType => ("$mcD$sp", ptName)
+ }
+
+ // we cast byte/short to int when writing to the hash set.
+ val hsValueCast = elementType match {
+ case ByteType | ShortType => "(int) "
+ case _ => ""
+ }
nullSafeCodeGen(ctx, ev, (array1, array2) => {
val notFoundNullElement = ctx.freshName("notFoundNullElement")
val nullElementIndex = ctx.freshName("nullElementIndex")
val builder = ctx.freshName("builder")
- val array = ctx.freshName("array")
val openHashSet = classOf[OpenHashSet[_]].getName
- val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$openHashElementType()"
- val hs = ctx.freshName("hs")
+ val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
+ val hashSet = ctx.freshName("hashSet")
val genericArrayData = classOf[GenericArrayData].getName
val arrayBuilder = "scala.collection.mutable.ArrayBuilder"
- val arrayBuilderClass = s"$arrayBuilder$$of$primitiveTypeName"
- val arrayBuilderClassTag = if (primitiveTypeName != "Ref") {
- s"scala.reflect.ClassTag$$.MODULE$$.$primitiveTypeName()"
- } else {
- s"scala.reflect.ClassTag$$.MODULE$$.AnyRef()"
- }
+ val arrayBuilderClass = s"$arrayBuilder$$of$ptName"
+ val arrayBuilderClassTag = s"scala.reflect.ClassTag$$.MODULE$$.$ptName()"
- def withArray2NullCheck(body: String) =
- if (right.dataType.asInstanceOf[ArrayType].containsNull) {
- s"""
- |if ($array2.isNullAt($i)) {
- | $notFoundNullElement = false;
- |} else {
- | $body
- |}
+ def withArray2NullCheck(body: String): String =
+ if (left.dataType.asInstanceOf[ArrayType].containsNull) {
--- End diff --
Is it better to use the following structure to make `else` clause common?
```
if (right.dataType.asInstanceOf[ArrayType].containsNull) {
if (left.dataType.asInstanceOf[ArrayType].containsNull) {
...
} else {
...
}
} else {
body
}
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21966
**[Test build #94099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94099/testReport)** for PR 21966 at commit [`16b9949`](https://github.com/apache/spark/commit/16b9949285c7133b89b3e6624cd8f5684abd3df5).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21966: [SPARK-23915][SQL][followup] Add array_except fun...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21966#discussion_r207289175
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -4077,81 +4078,84 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val arrayData = classOf[ArrayData].getName
val i = ctx.freshName("i")
- val pos = ctx.freshName("pos")
val value = ctx.freshName("value")
- val hsValue = ctx.freshName("hsValue")
val size = ctx.freshName("size")
- if (elementTypeSupportEquals) {
- val ptName = CodeGenerator.primitiveTypeName(elementType)
- val unsafeArray = ctx.freshName("unsafeArray")
- val (postFix, openHashElementType, hsJavaTypeName, genHsValue,
- getter, setter, javaTypeName, primitiveTypeName, arrayDataBuilder) =
- elementType match {
- case ByteType | ShortType | IntegerType =>
- ("$mcI$sp", "Int", "int", s"(int) $value",
- s"get$ptName($i)", s"set$ptName($pos, $value)",
- CodeGenerator.javaType(elementType), ptName,
- s"""
- |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
- |${ev.value} = $unsafeArray;
- """.stripMargin)
- case LongType | FloatType | DoubleType =>
- val signature = elementType match {
- case LongType => "$mcJ$sp"
- case FloatType => "$mcF$sp"
- case DoubleType => "$mcD$sp"
- }
- (signature, CodeGenerator.boxedType(elementType),
- CodeGenerator.javaType(elementType), value,
- s"get$ptName($i)", s"set$ptName($pos, $value)",
- CodeGenerator.javaType(elementType), ptName,
- s"""
- |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
- |${ev.value} = $unsafeArray;
- """.stripMargin)
- case _ =>
- val genericArrayData = classOf[GenericArrayData].getName
- val et = ctx.addReferenceObj("elementType", elementType)
- ("", "Object", "Object", value,
- s"get($i, $et)", s"update($pos, $value)", "Object", "Ref",
- s"${ev.value} = new $genericArrayData(new Object[$size]);")
- }
+ val canUseSpecializedHashSet = elementType match {
+ case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType => true
+ case _ => false
+ }
+ if (canUseSpecializedHashSet) {
+ val jt = CodeGenerator.javaType(elementType)
+ val ptName = CodeGenerator.primitiveTypeName(jt)
+
+ def genGetValue(array: String): String =
+ CodeGenerator.getValue(array, elementType, i)
+
+ val (hsPostFix, hsTypeName) = elementType match {
+ // we cast byte/short to int when writing to the hash set.
+ case ByteType | ShortType | IntegerType => ("$mcI$sp", "Int")
+ case LongType => ("$mcJ$sp", ptName)
+ case FloatType => ("$mcF$sp", ptName)
+ case DoubleType => ("$mcD$sp", ptName)
+ }
+
+ // we cast byte/short to int when writing to the hash set.
+ val hsValueCast = elementType match {
+ case ByteType | ShortType => "(int) "
+ case _ => ""
+ }
nullSafeCodeGen(ctx, ev, (array1, array2) => {
val notFoundNullElement = ctx.freshName("notFoundNullElement")
val nullElementIndex = ctx.freshName("nullElementIndex")
val builder = ctx.freshName("builder")
- val array = ctx.freshName("array")
val openHashSet = classOf[OpenHashSet[_]].getName
- val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$openHashElementType()"
- val hs = ctx.freshName("hs")
+ val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
+ val hashSet = ctx.freshName("hashSet")
val genericArrayData = classOf[GenericArrayData].getName
val arrayBuilder = "scala.collection.mutable.ArrayBuilder"
- val arrayBuilderClass = s"$arrayBuilder$$of$primitiveTypeName"
- val arrayBuilderClassTag = if (primitiveTypeName != "Ref") {
- s"scala.reflect.ClassTag$$.MODULE$$.$primitiveTypeName()"
- } else {
- s"scala.reflect.ClassTag$$.MODULE$$.AnyRef()"
- }
+ val arrayBuilderClass = s"$arrayBuilder$$of$ptName"
+ val arrayBuilderClassTag = s"scala.reflect.ClassTag$$.MODULE$$.$ptName()"
- def withArray2NullCheck(body: String) =
- if (right.dataType.asInstanceOf[ArrayType].containsNull) {
- s"""
- |if ($array2.isNullAt($i)) {
- | $notFoundNullElement = false;
- |} else {
- | $body
- |}
+ def withArray2NullCheck(body: String): String =
+ if (left.dataType.asInstanceOf[ArrayType].containsNull) {
+ if (right.dataType.asInstanceOf[ArrayType].containsNull) {
+ s"""
+ |if ($array2.isNullAt($i)) {
+ | $notFoundNullElement = false;
+ |} else {
+ | $body
+ |}
""".stripMargin
+ } else {
+ body
+ }
} else {
- body
+ // if array1's element is not nullable, we don't need to track the null element index.
+ if (right.dataType.asInstanceOf[ArrayType].containsNull) {
+ s"""
+ |if (!$array2.isNullAt($i)) {
+ | $body
+ |}
+ """.stripMargin
+ } else {
+ body
+ }
}
- val array2Body =
+
+ val writeArray2ToHashSet = withArray2NullCheck(
s"""
- |$javaTypeName $value = $array2.$getter;
- |$hsJavaTypeName $hsValue = $genHsValue;
- |$hs.add$postFix($hsValue);
- """.stripMargin
+ |$jt $value = ${genGetValue(array2)};
+ |$hashSet.add$hsPostFix($hsValueCast$value);
+ """.stripMargin)
+
+ // When hitting a null vale, put a null holder in the ArrayBuilder. Finally we will
+ // convert ArrayBuilder to ArrayData and setNull on the slot with null holder.
+ val nullValueHolder = elementType match {
+ case ByteType => "(byte) 0"
+ case ShortType => "(short ) 0"
--- End diff --
nit: extra space after `short`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21966
**[Test build #93986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93986/testReport)** for PR 21966 at commit [`313ad29`](https://github.com/apache/spark/commit/313ad294ec07d54f06db92f635b6bb9a76f17343).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/21966
LGTM except some comments
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21966: [SPARK-23915][SQL][followup] Add array_except fun...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21966#discussion_r207302241
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -4077,81 +4078,84 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val arrayData = classOf[ArrayData].getName
val i = ctx.freshName("i")
- val pos = ctx.freshName("pos")
val value = ctx.freshName("value")
- val hsValue = ctx.freshName("hsValue")
val size = ctx.freshName("size")
- if (elementTypeSupportEquals) {
- val ptName = CodeGenerator.primitiveTypeName(elementType)
- val unsafeArray = ctx.freshName("unsafeArray")
- val (postFix, openHashElementType, hsJavaTypeName, genHsValue,
- getter, setter, javaTypeName, primitiveTypeName, arrayDataBuilder) =
- elementType match {
- case ByteType | ShortType | IntegerType =>
- ("$mcI$sp", "Int", "int", s"(int) $value",
- s"get$ptName($i)", s"set$ptName($pos, $value)",
- CodeGenerator.javaType(elementType), ptName,
- s"""
- |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
- |${ev.value} = $unsafeArray;
- """.stripMargin)
- case LongType | FloatType | DoubleType =>
- val signature = elementType match {
- case LongType => "$mcJ$sp"
- case FloatType => "$mcF$sp"
- case DoubleType => "$mcD$sp"
- }
- (signature, CodeGenerator.boxedType(elementType),
- CodeGenerator.javaType(elementType), value,
- s"get$ptName($i)", s"set$ptName($pos, $value)",
- CodeGenerator.javaType(elementType), ptName,
- s"""
- |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
- |${ev.value} = $unsafeArray;
- """.stripMargin)
- case _ =>
- val genericArrayData = classOf[GenericArrayData].getName
- val et = ctx.addReferenceObj("elementType", elementType)
- ("", "Object", "Object", value,
- s"get($i, $et)", s"update($pos, $value)", "Object", "Ref",
- s"${ev.value} = new $genericArrayData(new Object[$size]);")
- }
+ val canUseSpecializedHashSet = elementType match {
+ case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType => true
+ case _ => false
+ }
+ if (canUseSpecializedHashSet) {
+ val jt = CodeGenerator.javaType(elementType)
+ val ptName = CodeGenerator.primitiveTypeName(jt)
+
+ def genGetValue(array: String): String =
+ CodeGenerator.getValue(array, elementType, i)
+
+ val (hsPostFix, hsTypeName) = elementType match {
+ // we cast byte/short to int when writing to the hash set.
+ case ByteType | ShortType | IntegerType => ("$mcI$sp", "Int")
+ case LongType => ("$mcJ$sp", ptName)
+ case FloatType => ("$mcF$sp", ptName)
+ case DoubleType => ("$mcD$sp", ptName)
+ }
+
+ // we cast byte/short to int when writing to the hash set.
+ val hsValueCast = elementType match {
+ case ByteType | ShortType => "(int) "
+ case _ => ""
+ }
nullSafeCodeGen(ctx, ev, (array1, array2) => {
val notFoundNullElement = ctx.freshName("notFoundNullElement")
val nullElementIndex = ctx.freshName("nullElementIndex")
val builder = ctx.freshName("builder")
- val array = ctx.freshName("array")
val openHashSet = classOf[OpenHashSet[_]].getName
- val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$openHashElementType()"
- val hs = ctx.freshName("hs")
+ val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
+ val hashSet = ctx.freshName("hashSet")
val genericArrayData = classOf[GenericArrayData].getName
val arrayBuilder = "scala.collection.mutable.ArrayBuilder"
- val arrayBuilderClass = s"$arrayBuilder$$of$primitiveTypeName"
- val arrayBuilderClassTag = if (primitiveTypeName != "Ref") {
- s"scala.reflect.ClassTag$$.MODULE$$.$primitiveTypeName()"
- } else {
- s"scala.reflect.ClassTag$$.MODULE$$.AnyRef()"
- }
+ val arrayBuilderClass = s"$arrayBuilder$$of$ptName"
+ val arrayBuilderClassTag = s"scala.reflect.ClassTag$$.MODULE$$.$ptName()"
- def withArray2NullCheck(body: String) =
- if (right.dataType.asInstanceOf[ArrayType].containsNull) {
- s"""
- |if ($array2.isNullAt($i)) {
- | $notFoundNullElement = false;
- |} else {
- | $body
- |}
+ def withArray2NullCheck(body: String): String =
+ if (left.dataType.asInstanceOf[ArrayType].containsNull) {
+ if (right.dataType.asInstanceOf[ArrayType].containsNull) {
+ s"""
+ |if ($array2.isNullAt($i)) {
+ | $notFoundNullElement = false;
+ |} else {
+ | $body
+ |}
""".stripMargin
+ } else {
+ body
+ }
} else {
- body
+ // if array1's element is not nullable, we don't need to track the null element index.
+ if (right.dataType.asInstanceOf[ArrayType].containsNull) {
+ s"""
+ |if (!$array2.isNullAt($i)) {
+ | $body
+ |}
+ """.stripMargin
+ } else {
+ body
+ }
}
- val array2Body =
+
+ val writeArray2ToHashSet = withArray2NullCheck(
s"""
- |$javaTypeName $value = $array2.$getter;
- |$hsJavaTypeName $hsValue = $genHsValue;
- |$hs.add$postFix($hsValue);
- """.stripMargin
+ |$jt $value = ${genGetValue(array2)};
+ |$hashSet.add$hsPostFix($hsValueCast$value);
+ """.stripMargin)
+
+ // When hitting a null vale, put a null holder in the ArrayBuilder. Finally we will
--- End diff --
nit: `vale` -> `value`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1643/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21966
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93986/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21966
**[Test build #94099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94099/testReport)** for PR 21966 at commit [`16b9949`](https://github.com/apache/spark/commit/16b9949285c7133b89b3e6624cd8f5684abd3df5).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org