You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2018/09/17 05:39:03 UTC
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/22439
[SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreateArrayData method
## What changes were proposed in this pull request?
This PR makes `GenArrayData.genCodeToCreateArrayData` method simple by using `ArrayData.createArrayData` method.
Before this PR, `genCodeToCreateArrayData` method was complicated
* Generated a temporary Java array to create `ArrayData`
* Had separate code generation path to assign values for `GenericArrayData` and `UnsafeArrayData`
After this PR, the method
* Directly generates `GenericArrayData` or `UnsafeArrayData` without a temporary array
* Has only code generation path to assign values
## How was this patch tested?
Existing UTs
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kiszk/spark SPARK-25444
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22439.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22439
----
commit 24fbf742fdd8490f57d29325100036e556847c77
Author: Kazuaki Ishizaki <is...@...>
Date: 2018-09-17T05:28:00Z
initial commit
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22439
**[Test build #96120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96120/testReport)** for PR 22439 at commit [`24fbf74`](https://github.com/apache/spark/commit/24fbf742fdd8490f57d29325100036e556847c77).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r218090048
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +74,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
- * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
+ * @param elementsExpr concatenated set of [[Expression]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
- elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ elementsExpr: Seq[Expression],
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsExpr.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
- val isNullAssignment = if (!isMapKey) {
- s"$arrayName[$i] = null;"
- } else {
- "throw new RuntimeException(\"Cannot use null as map key!\");"
- }
- eval.code + s"""
- if (${eval.isNull}) {
- $isNullAssignment
- } else {
- $arrayName[$i] = ${eval.value};
- }
- """
- }
- val assignmentString = ctx.splitExpressionsWithCurrentInputs(
- expressions = assignments,
- funcName = "apply",
- extraArguments = ("Object[]", arrayName) :: Nil)
-
- (s"Object[] $arrayName = new Object[$numElements];",
- assignmentString,
- s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
- arrayDataName)
- } else {
- val arrayName = ctx.freshName("array")
- val unsafeArraySizeInBytes =
- UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
- ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
- val baseOffset = Platform.BYTE_ARRAY_OFFSET
-
- val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsExpr.zipWithIndex.map { case (expr, i) =>
+ val eval = expr.genCode(ctx)
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (!expr.nullable || eval.isNull == FalseLiteral) {
--- End diff --
I see, looks redudant
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22439
**[Test build #96144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96144/testReport)** for PR 22439 at commit [`fff3361`](https://github.com/apache/spark/commit/fff3361276519ae93c265a029290f5ba9f70945a).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r218030505
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
* @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsCode.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (eval.isNull == FalseLiteral) {
--- End diff --
Maybe we don't need `|| eval.isNull == FalseLiteral`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/22439
LGTM.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3146/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r217967158
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
* @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsCode.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (eval.isNull == FalseLiteral) {
--- End diff --
How about taking an argument `elements: Seq[Expression]` instead of `elementCodes: Seq[ExprCode]` and use `element.nullable` instead of `eval.isNull` if we want to skip null-check when the nullable is false?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r218048161
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +74,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
- * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
+ * @param elementsExpr concatenated set of [[Expression]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
- elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ elementsExpr: Seq[Expression],
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsExpr.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
- val isNullAssignment = if (!isMapKey) {
- s"$arrayName[$i] = null;"
- } else {
- "throw new RuntimeException(\"Cannot use null as map key!\");"
- }
- eval.code + s"""
- if (${eval.isNull}) {
- $isNullAssignment
- } else {
- $arrayName[$i] = ${eval.value};
- }
- """
- }
- val assignmentString = ctx.splitExpressionsWithCurrentInputs(
- expressions = assignments,
- funcName = "apply",
- extraArguments = ("Object[]", arrayName) :: Nil)
-
- (s"Object[] $arrayName = new Object[$numElements];",
- assignmentString,
- s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
- arrayDataName)
- } else {
- val arrayName = ctx.freshName("array")
- val unsafeArraySizeInBytes =
- UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
- ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
- val baseOffset = Platform.BYTE_ARRAY_OFFSET
-
- val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsExpr.zipWithIndex.map { case (expr, i) =>
+ val eval = expr.genCode(ctx)
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (!expr.nullable || eval.isNull == FalseLiteral) {
--- End diff --
Maybe we don't need `|| eval.isNull == FalseLiteral`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96144/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/22439
Thanks! merging to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96120/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22439
**[Test build #96133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96133/testReport)** for PR 22439 at commit [`2286b6e`](https://github.com/apache/spark/commit/2286b6e8667dfde38ef5a7a47633ac34e29b7c69).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r217965632
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
* @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsCode.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (eval.isNull == FalseLiteral) {
+ s"\n$setArrayElement\n"
+ } else {
val isNullAssignment = if (!isMapKey) {
- s"$arrayName[$i] = null;"
+ s"$arrayDataName.setNullAt($i);"
} else {
"throw new RuntimeException(\"Cannot use null as map key!\");"
}
- eval.code + s"""
- if (${eval.isNull}) {
- $isNullAssignment
- } else {
- $arrayName[$i] = ${eval.value};
- }
- """
- }
- val assignmentString = ctx.splitExpressionsWithCurrentInputs(
- expressions = assignments,
- funcName = "apply",
- extraArguments = ("Object[]", arrayName) :: Nil)
-
- (s"Object[] $arrayName = new Object[$numElements];",
- assignmentString,
- s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
- arrayDataName)
- } else {
- val arrayName = ctx.freshName("array")
- val unsafeArraySizeInBytes =
- UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
- ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
- val baseOffset = Platform.BYTE_ARRAY_OFFSET
-
- val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
- val isNullAssignment = if (!isMapKey) {
- s"$arrayDataName.setNullAt($i);"
+
+ if (eval.isNull == TrueLiteral) {
--- End diff --
Do we need this case?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r218032925
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
* @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsCode.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (eval.isNull == FalseLiteral) {
+ s"\n$setArrayElement\n"
+ } else {
val isNullAssignment = if (!isMapKey) {
- s"$arrayName[$i] = null;"
+ s"$arrayDataName.setNullAt($i);"
} else {
"throw new RuntimeException(\"Cannot use null as map key!\");"
}
- eval.code + s"""
- if (${eval.isNull}) {
- $isNullAssignment
- } else {
- $arrayName[$i] = ${eval.value};
- }
- """
- }
- val assignmentString = ctx.splitExpressionsWithCurrentInputs(
- expressions = assignments,
- funcName = "apply",
- extraArguments = ("Object[]", arrayName) :: Nil)
-
- (s"Object[] $arrayName = new Object[$numElements];",
- assignmentString,
- s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
- arrayDataName)
- } else {
- val arrayName = ctx.freshName("array")
- val unsafeArraySizeInBytes =
- UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
- ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
- val baseOffset = Platform.BYTE_ARRAY_OFFSET
-
- val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
- val isNullAssignment = if (!isMapKey) {
- s"$arrayDataName.setNullAt($i);"
+
+ if (eval.isNull == TrueLiteral) {
--- End diff --
I don't think we need the case because there are very few expressions that set `isNull` to `TrueLiteral`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r218029860
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
* @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsCode.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (eval.isNull == FalseLiteral) {
--- End diff --
I see. We will do `if (!element.nullable || eval.isNull == FalseLiteral) {`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22439
**[Test build #96120 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96120/testReport)** for PR 22439 at commit [`24fbf74`](https://github.com/apache/spark/commit/24fbf742fdd8490f57d29325100036e556847c77).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r217996706
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
* @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsCode.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (eval.isNull == FalseLiteral) {
+ s"\n$setArrayElement\n"
--- End diff --
It works functionally without `\n`. This is only for readability of generation code to avoid this output:
`arrayData.setInt(0, 0);arrayData.setInt(1, 1);...`
WDYT?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22439
**[Test build #96133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96133/testReport)** for PR 22439 at commit [`2286b6e`](https://github.com/apache/spark/commit/2286b6e8667dfde38ef5a7a47633ac34e29b7c69).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3157/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22439
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r217965854
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
* @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsCode.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (eval.isNull == FalseLiteral) {
+ s"\n$setArrayElement\n"
--- End diff --
nit: do we need to surround `setArrayElement` with `\n`s? Only `setArrayElement` should work?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96133/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22439
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3163/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22439#discussion_r217997181
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
private [sql] object GenArrayData {
/**
- * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+ * Return Java code pieces based on DataType and array size to allocate ArrayData class
*
* @param ctx a [[CodegenContext]]
* @param elementType data type of underlying array elements
* @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
* @param isMapKey if true, throw an exception when the element is null
- * @return (code pre-assignments, concatenated assignments to each array elements,
- * code post-assignments, arrayData name)
+ * @param functionName string to include in the error message
+ * @return (array allocation, concatenated assignments to each array elements, arrayData name)
*/
def genCodeToCreateArrayData(
ctx: CodegenContext,
elementType: DataType,
elementsCode: Seq[ExprCode],
- isMapKey: Boolean): (String, String, String, String) = {
+ isMapKey: Boolean,
+ functionName: String): (String, String, String) = {
val arrayDataName = ctx.freshName("arrayData")
- val numElements = elementsCode.length
+ val numElements = s"${elementsCode.length}L"
- if (!CodeGenerator.isPrimitiveType(elementType)) {
- val arrayName = ctx.freshName("arrayObject")
- val genericArrayClass = classOf[GenericArrayData].getName
+ val initialization = CodeGenerator.createArrayData(
+ arrayDataName, elementType, numElements, s" $functionName failed.")
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+ val setArrayElement = CodeGenerator.setArrayElement(
+ arrayDataName, elementType, i.toString, eval.value)
+
+ val assignment = if (eval.isNull == FalseLiteral) {
+ s"\n$setArrayElement\n"
+ } else {
val isNullAssignment = if (!isMapKey) {
- s"$arrayName[$i] = null;"
+ s"$arrayDataName.setNullAt($i);"
} else {
"throw new RuntimeException(\"Cannot use null as map key!\");"
}
- eval.code + s"""
- if (${eval.isNull}) {
- $isNullAssignment
- } else {
- $arrayName[$i] = ${eval.value};
- }
- """
- }
- val assignmentString = ctx.splitExpressionsWithCurrentInputs(
- expressions = assignments,
- funcName = "apply",
- extraArguments = ("Object[]", arrayName) :: Nil)
-
- (s"Object[] $arrayName = new Object[$numElements];",
- assignmentString,
- s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
- arrayDataName)
- } else {
- val arrayName = ctx.freshName("array")
- val unsafeArraySizeInBytes =
- UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
- ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
- val baseOffset = Platform.BYTE_ARRAY_OFFSET
-
- val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
- val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
- val isNullAssignment = if (!isMapKey) {
- s"$arrayDataName.setNullAt($i);"
+
+ if (eval.isNull == TrueLiteral) {
--- End diff --
Since I saw the following case, I added this condition to reduce the generated Java byte code size.
```
if (true) {
...
} else {
...
}
```
I am neutral on keeping or removing this.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22439
**[Test build #96144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96144/testReport)** for PR 22439 at commit [`fff3361`](https://github.com/apache/spark/commit/fff3361276519ae93c265a029290f5ba9f70945a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org