You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mn-mikke <gi...@git.apache.org> on 2018/05/04 18:50:21 UTC
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
GitHub user mn-mikke opened a pull request:
https://github.com/apache/spark/pull/21236
[SPARK-23935][SQL] Adding map_entries function
## What changes were proposed in this pull request?
This PR adds `map_entries` function that returns an unordered array of all entries in the given map.
## How was this patch tested?
New tests added into:
- `CollectionExpressionSuite`
- `DataFrameFunctionsSuite`
## CodeGen examples
### Primitive types
```
val df = Seq(Map(1 -> 5, 2 -> 6)).toDF("m")
df.filter('m.isNotNull).select(map_entries('m)).debugCodegen
```
Result:
```
/* 042 */ boolean project_isNull_0 = false;
/* 043 */
/* 044 */ ArrayData project_value_0 = null;
/* 045 */
/* 046 */ final int project_numElements_0 = inputadapter_value_0.numElements();
/* 047 */ final ArrayData project_keys_0 = inputadapter_value_0.keyArray();
/* 048 */ final ArrayData project_values_0 = inputadapter_value_0.valueArray();
/* 049 */
/* 050 */ final int project_structSize_0 = 24;
/* 051 */ final long project_byteArraySize_0 = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(project_numElements_0, 8 + project_structSize_0);
/* 052 */ final int project_structsOffset_0 = UnsafeArrayData.calculateHeaderPortionInBytes(project_numElements_0) + project_numElements_0 * 8;
/* 053 */ if (project_byteArraySize_0 > 2147483632) {
/* 054 */ final Object[] project_internalRowArray_0 = new Object[project_numElements_0];
/* 055 */ for (int z = 0; z < project_numElements_0; z++) {
/* 056 */ project_internalRowArray_0[z] = new org.apache.spark.sql.catalyst.expressions.GenericInternalRow(new Object[]{project_keys_0.getInt(z), project_values_0.getInt(z)});
/* 057 */ }
/* 058 */ project_value_0 = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_internalRowArray_0);
/* 059 */
/* 060 */ } else {
/* 061 */ final byte[] project_byteArray_0 = new byte[(int)project_byteArraySize_0];
/* 062 */ UnsafeArrayData project_unsafeArrayData_0 = new UnsafeArrayData();
/* 063 */ Platform.putLong(project_byteArray_0, 16, project_numElements_0);
/* 064 */ project_unsafeArrayData_0.pointTo(project_byteArray_0, 16, (int)project_byteArraySize_0);
/* 065 */ UnsafeRow project_unsafeRow_0 = new UnsafeRow(2);
/* 066 */ for (int z = 0; z < project_numElements_0; z++) {
/* 067 */ long offset = project_structsOffset_0 + z * project_structSize_0;
/* 068 */ project_unsafeArrayData_0.setLong(z, (offset << 32) + project_structSize_0);
/* 069 */ project_unsafeRow_0.pointTo(project_byteArray_0, 16 + offset, project_structSize_0);
/* 070 */ project_unsafeRow_0.setInt(0, project_keys_0.getInt(z));
/* 071 */ project_unsafeRow_0.setInt(1, project_values_0.getInt(z));
/* 072 */ }
/* 073 */ project_value_0 = project_unsafeArrayData_0;
/* 074 */ }
```
### Non-primitive types
```
val df = Seq(Map("a" -> "foo", "b" -> null)).toDF("m")
df.filter('m.isNotNull).select(map_entries('m)).debug
```
Result:
```
/* 042 */ boolean project_isNull_0 = false;
/* 043 */
/* 044 */ ArrayData project_value_0 = null;
/* 045 */
/* 046 */ final int project_numElements_0 = inputadapter_value_0.numElements();
/* 047 */ final ArrayData project_keys_0 = inputadapter_value_0.keyArray();
/* 048 */ final ArrayData project_values_0 = inputadapter_value_0.valueArray();
/* 049 */
/* 050 */ final Object[] project_internalRowArray_0 = new Object[project_numElements_0];
/* 051 */ for (int z = 0; z < project_numElements_0; z++) {
/* 052 */ project_internalRowArray_0[z] = new org.apache.spark.sql.catalyst.expressions.GenericInternalRow(new Object[]{project_keys_0.getUTF8String(z), project_values_0.getUTF8String(z)});
/* 053 */ }
/* 054 */ project_value_0 = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_internalRowArray_0);
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AbsaOSS/spark feature/array-api-map_entries-to-master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21236.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21236
----
commit 086e223e89ce0cf56145f4aa9a7aef5421a98810
Author: Marek Novotny <mn...@...>
Date: 2018-05-04T18:00:40Z
[SPARK-23935][SQL] Adding map_entries function
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r186377811
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +118,162 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val structSize = ctx.freshName("structSize")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+ val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+
+ val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
+ val valueAssignmentChecked = if (childDataType.valueContainsNull) {
+ s"""
+ |if ($values.isNullAt(z)) {
+ | $unsafeRow.setNullAt(1);
+ |} else {
+ | $valueAssignment
+ |}
+ """.stripMargin
+ } else {
+ valueAssignment
+ }
+
+ s"""
+ |final int $structSize = ${UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2};
+ |final long $byteArraySize = $calculateArraySize($numElements, $longSize + $structSize);
+ |final int $structsOffset = $calculateHeader($numElements) + $numElements * $longSize;
--- End diff --
We can move this into else-clause?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90887/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r187001475
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,161 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
--- End diff --
@ueshin Really good question. I'm eager to learn about the true purpose of the `DataType.defaultSize` function. Currently, it's used in this meaning at more places (e.g.`GenArrayData.genCodeToCreateArrayData` and `CodeGenerator.createUnsafeArray`.)
What about using `Long.BYTES` from Java 8 instead?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90720/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r186371562
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +118,162 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val structSize = ctx.freshName("structSize")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+ val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+
+ val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
+ val valueAssignmentChecked = if (childDataType.valueContainsNull) {
+ s"""
+ |if ($values.isNullAt(z)) {
+ | $unsafeRow.setNullAt(1);
+ |} else {
+ | $valueAssignment
+ |}
+ """.stripMargin
+ } else {
+ valueAssignment
+ }
+
+ s"""
+ |final int $structSize = ${UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2};
--- End diff --
We can calculate `structSize` beforehand and inline it?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r187689540
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,161 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
--- End diff --
IMHO, `8` is the better choice since it is not related to an element size of `long`.
To my best guess, it would be best to define a new constant.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90881/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r207247448
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -98,6 +98,9 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
if (expected.isNaN) result.isNaN else expected == result
case (result: Float, expected: Float) =>
if (expected.isNaN) result.isNaN else expected == result
+ case (result: UnsafeRow, expected: GenericInternalRow) =>
--- End diff --
Roger that, looks like Wenchen just did so. Thanks!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:
https://github.com/apache/spark/pull/21236
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90318/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/21236
Jenkins, retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90557/testReport)** for PR 21236 at commit [`56ff20a`](https://github.com/apache/spark/commit/56ff20ac977ca1a305e96a7582789e2e75e6718c).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90557/testReport)** for PR 21236 at commit [`56ff20a`](https://github.com/apache/spark/commit/56ff20ac977ca1a305e96a7582789e2e75e6718c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90887/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90888/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90557/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r187995110
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,161 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
--- End diff --
Oh OK, I misunderstood the comments. Thanks guys!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r187853879
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,162 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
+ val structSizeAsLong = structSize + "L"
+ val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+ val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+
+ val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
+ val valueAssignmentChecked = if (childDataType.valueContainsNull) {
+ s"""
+ |if ($values.isNullAt(z)) {
+ | $unsafeRow.setNullAt(1);
+ |} else {
+ | $valueAssignment
+ |}
+ """.stripMargin
+ } else {
+ valueAssignment
+ }
+
+ s"""
+ |final long $byteArraySize = $calculateArraySize($numElements, ${longSize + structSize});
+ |if ($byteArraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
+ | ${genCodeForAnyElements(ctx, keys, values, arrayData, numElements)}
+ |} else {
+ | final int $structsOffset = $calculateHeader($numElements) + $numElements * $longSize;
+ | final byte[] $data = new byte[(int)$byteArraySize];
+ | UnsafeArrayData $unsafeArrayData = new UnsafeArrayData();
+ | Platform.putLong($data, $baseOffset, $numElements);
+ | $unsafeArrayData.pointTo($data, $baseOffset, (int)$byteArraySize);
+ | UnsafeRow $unsafeRow = new UnsafeRow(2);
+ | for (int z = 0; z < $numElements; z++) {
+ | long offset = $structsOffset + z * $structSizeAsLong;
+ | $unsafeArrayData.setLong(z, (offset << 32) + $structSizeAsLong);
+ | $unsafeRow.pointTo($data, $baseOffset + offset, $structSize);
+ | $unsafeRow.set$keyTypeName(0, ${getKey(keys)});
+ | $valueAssignmentChecked
+ | }
+ | $arrayData = $unsafeArrayData;
+ |}
+ """.stripMargin
+ }
+
+ private def genCodeForAnyElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val genericArrayClass = classOf[GenericArrayData].getName
+ val rowClass = classOf[GenericInternalRow].getName
+ val data = ctx.freshName("internalRowArray")
+
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val getValueWithCheck = if (childDataType.valueContainsNull && isValuePrimitive) {
+ s"$values.isNullAt(z) ? null : (Object)${getValue(values)}"
+ } else {
+ getValue(values)
+ }
--- End diff --
nit: indent
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90596/testReport)** for PR 21236 at commit [`1bd0d5e`](https://github.com/apache/spark/commit/1bd0d5ea78908820140bd13c624a8e90f1e737fe).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r187852992
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,161 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
--- End diff --
This is not for the element size of arrays. I agree with @kiszk to use `8`.
Maybe we need to add a constant to represent the word size in `UnsafeRow` or somewhere in the future pr.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90367/testReport)** for PR 21236 at commit [`6aa90ef`](https://github.com/apache/spark/commit/6aa90efaa2c06bf0f0f150d7c39a143a96ba8a2d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r187854341
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,162 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
+ val structSizeAsLong = structSize + "L"
+ val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+ val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+
+ val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
+ val valueAssignmentChecked = if (childDataType.valueContainsNull) {
+ s"""
+ |if ($values.isNullAt(z)) {
+ | $unsafeRow.setNullAt(1);
+ |} else {
+ | $valueAssignment
+ |}
+ """.stripMargin
+ } else {
+ valueAssignment
+ }
+
+ s"""
+ |final long $byteArraySize = $calculateArraySize($numElements, ${longSize + structSize});
+ |if ($byteArraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
+ | ${genCodeForAnyElements(ctx, keys, values, arrayData, numElements)}
--- End diff --
Hmm, should we use this idiom for other array functions? WDYT?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90216/testReport)** for PR 21236 at commit [`086e223`](https://github.com/apache/spark/commit/086e223e89ce0cf56145f4aa9a7aef5421a98810).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:
https://github.com/apache/spark/pull/21236
@ueshin, @kiszk Thank you for valuable comments! Do you have any more?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90596/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90318/testReport)** for PR 21236 at commit [`d05ad9b`](https://github.com/apache/spark/commit/d05ad9be40064c61b05d838b3ba96b02267d5ee1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90367/testReport)** for PR 21236 at commit [`6aa90ef`](https://github.com/apache/spark/commit/6aa90efaa2c06bf0f0f150d7c39a143a96ba8a2d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21236
ok to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90881/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90221/testReport)** for PR 21236 at commit [`b9e2409`](https://github.com/apache/spark/commit/b9e240944e41674a25fd80e4c80b2f7c83af14ef).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r207078480
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -98,6 +98,9 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
if (expected.isNaN) result.isNaN else expected == result
case (result: Float, expected: Float) =>
if (expected.isNaN) result.isNaN else expected == result
+ case (result: UnsafeRow, expected: GenericInternalRow) =>
--- End diff --
@mn-mikke I was just looking over compiler warnings, and noticed it claims this case is never triggered. I think it's because it would always first match the (InternalRow, InternalRow) case above. Should it go before that then?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90216/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90720/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90720/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r187996983
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,162 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
+ val structSizeAsLong = structSize + "L"
+ val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+ val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+
+ val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
+ val valueAssignmentChecked = if (childDataType.valueContainsNull) {
+ s"""
+ |if ($values.isNullAt(z)) {
+ | $unsafeRow.setNullAt(1);
+ |} else {
+ | $valueAssignment
+ |}
+ """.stripMargin
+ } else {
+ valueAssignment
+ }
+
+ s"""
+ |final long $byteArraySize = $calculateArraySize($numElements, ${longSize + structSize});
+ |if ($byteArraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
+ | ${genCodeForAnyElements(ctx, keys, values, arrayData, numElements)}
--- End diff --
For now, I separated the logic that I can leverage for `map_from_entries` function. Moreover, I think it should be possible to replace `UnsafeArrayData.createUnsafeArray` with that logic, but will do it in a different PR.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:
https://github.com/apache/spark/pull/21236
cc @ueshin @gatorsmile
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90880/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90596/testReport)** for PR 21236 at commit [`1bd0d5e`](https://github.com/apache/spark/commit/1bd0d5ea78908820140bd13c624a8e90f1e737fe).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r187690481
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,161 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
+ val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+ val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+
+ val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
+ val valueAssignmentChecked = if (childDataType.valueContainsNull) {
+ s"""
+ |if ($values.isNullAt(z)) {
+ | $unsafeRow.setNullAt(1);
+ |} else {
+ | $valueAssignment
+ |}
+ """.stripMargin
+ } else {
+ valueAssignment
+ }
+
+ s"""
+ |final long $byteArraySize = $calculateArraySize($numElements, ${longSize + structSize});
+ |if ($byteArraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
+ | ${genCodeForAnyElements(ctx, keys, values, arrayData, numElements)}
+ |} else {
+ | final int $structsOffset = $calculateHeader($numElements) + $numElements * $longSize;
+ | final byte[] $data = new byte[(int)$byteArraySize];
+ | UnsafeArrayData $unsafeArrayData = new UnsafeArrayData();
+ | Platform.putLong($data, $baseOffset, $numElements);
+ | $unsafeArrayData.pointTo($data, $baseOffset, (int)$byteArraySize);
+ | UnsafeRow $unsafeRow = new UnsafeRow(2);
+ | for (int z = 0; z < $numElements; z++) {
+ | long offset = $structsOffset + z * $structSize;
--- End diff --
nit: `$structSize` -> `${$structSize}L`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90216/testReport)** for PR 21236 at commit [`086e223`](https://github.com/apache/spark/commit/086e223e89ce0cf56145f4aa9a7aef5421a98810).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes `
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:
https://github.com/apache/spark/pull/21236
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/21236
Jenkins, retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90880/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r186986884
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,161 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
--- End diff --
I'm wondering it is right to use `longSize` here?
I know the value is `8` and is same as the word size, but feel like the meaning is different?
cc @gatorsmile @cloud-fan
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/21236
Thanks! merging to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/21236
I'd retrigger the build for just checking again.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90881/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90888/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90318/testReport)** for PR 21236 at commit [`d05ad9b`](https://github.com/apache/spark/commit/d05ad9be40064c61b05d838b3ba96b02267d5ee1).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90888/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90887/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90367/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r207148777
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -98,6 +98,9 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
if (expected.isNaN) result.isNaN else expected == result
case (result: Float, expected: Float) =>
if (expected.isNaN) result.isNaN else expected == result
+ case (result: UnsafeRow, expected: GenericInternalRow) =>
--- End diff --
Hi @srowen,
```(InternalRow, InternalRow)``` case was introduced later in [21838](https://github.com/apache/spark/pull/21838) and covers the logic of the case with ```UnsafeRow```. So we can just remove the unreachable piece of code.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90221/testReport)** for PR 21236 at commit [`b9e2409`](https://github.com/apache/spark/commit/b9e240944e41674a25fd80e4c80b2f7c83af14ef).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21236
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90221/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21236
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:
https://github.com/apache/spark/pull/21236#discussion_r187813418
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +119,161 @@ case class MapValues(child: Expression)
override def prettyName: String = "map_values"
}
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+ [(1,"a"),(2,"b")]
+ """,
+ since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+ override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+ lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+ override def dataType: DataType = {
+ ArrayType(
+ StructType(
+ StructField("key", childDataType.keyType, false) ::
+ StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+ Nil),
+ false)
+ }
+
+ override protected def nullSafeEval(input: Any): Any = {
+ val childMap = input.asInstanceOf[MapData]
+ val keys = childMap.keyArray()
+ val values = childMap.valueArray()
+ val length = childMap.numElements()
+ val resultData = new Array[AnyRef](length)
+ var i = 0;
+ while (i < length) {
+ val key = keys.get(i, childDataType.keyType)
+ val value = values.get(i, childDataType.valueType)
+ val row = new GenericInternalRow(Array[Any](key, value))
+ resultData.update(i, row)
+ i += 1
+ }
+ new GenericArrayData(resultData)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ nullSafeCodeGen(ctx, ev, c => {
+ val numElements = ctx.freshName("numElements")
+ val keys = ctx.freshName("keys")
+ val values = ctx.freshName("values")
+ val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+ val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+ val code = if (isKeyPrimitive && isValuePrimitive) {
+ genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+ } else {
+ genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+ }
+ s"""
+ |final int $numElements = $c.numElements();
+ |final ArrayData $keys = $c.keyArray();
+ |final ArrayData $values = $c.valueArray();
+ |$code
+ """.stripMargin
+ })
+ }
+
+ private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+ private def getValue(varName: String) = {
+ CodeGenerator.getValue(varName, childDataType.valueType, "z")
+ }
+
+ private def genCodeForPrimitiveElements(
+ ctx: CodegenContext,
+ keys: String,
+ values: String,
+ arrayData: String,
+ numElements: String): String = {
+ val byteArraySize = ctx.freshName("byteArraySize")
+ val data = ctx.freshName("byteArray")
+ val unsafeRow = ctx.freshName("unsafeRow")
+ val unsafeArrayData = ctx.freshName("unsafeArrayData")
+ val structsOffset = ctx.freshName("structsOffset")
+ val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+ val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+ val baseOffset = Platform.BYTE_ARRAY_OFFSET
+ val longSize = LongType.defaultSize
+ val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
--- End diff --
@kiszk Thanks for your suggestion, but it seems to me that `LongType.defaultSize` could be used in this case. It seems that the purpose of `defaultSize` is not only the calculation of estimated data size in statistics. `GenerateUnsafeProjection.writeArrayToBuffer`, `InterpretedUnsafeProjection.getElementSize` and other parts utilize `defaultSize` in the same way.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21236
**[Test build #90880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90880/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org