You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mn-mikke <gi...@git.apache.org> on 2018/05/04 18:50:21 UTC

[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

GitHub user mn-mikke opened a pull request:

    https://github.com/apache/spark/pull/21236

    [SPARK-23935][SQL] Adding map_entries function

    ## What changes were proposed in this pull request?
    
    This PR adds `map_entries` function that returns an unordered array of all entries in the given map.
    
    ## How was this patch tested?
    
    New tests added into:
    - `CollectionExpressionSuite`
    - `DataFrameFunctionsSuite`
    
    ## CodeGen examples
    ### Primitive types
    ```
    val df = Seq(Map(1 -> 5, 2 -> 6)).toDF("m")
    df.filter('m.isNotNull).select(map_entries('m)).debugCodegen
    ```
    Result:
    ```
    /* 042 */         boolean project_isNull_0 = false;
    /* 043 */
    /* 044 */         ArrayData project_value_0 = null;
    /* 045 */
    /* 046 */         final int project_numElements_0 = inputadapter_value_0.numElements();
    /* 047 */         final ArrayData project_keys_0 = inputadapter_value_0.keyArray();
    /* 048 */         final ArrayData project_values_0 = inputadapter_value_0.valueArray();
    /* 049 */
    /* 050 */         final int project_structSize_0 = 24;
    /* 051 */         final long project_byteArraySize_0 = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(project_numElements_0, 8 + project_structSize_0);
    /* 052 */         final int project_structsOffset_0 = UnsafeArrayData.calculateHeaderPortionInBytes(project_numElements_0) + project_numElements_0 * 8;
    /* 053 */         if (project_byteArraySize_0 > 2147483632) {
    /* 054 */           final Object[] project_internalRowArray_0 = new Object[project_numElements_0];
    /* 055 */           for (int z = 0; z < project_numElements_0; z++) {
    /* 056 */             project_internalRowArray_0[z] = new org.apache.spark.sql.catalyst.expressions.GenericInternalRow(new Object[]{project_keys_0.getInt(z), project_values_0.getInt(z)});
    /* 057 */           }
    /* 058 */           project_value_0 = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_internalRowArray_0);
    /* 059 */
    /* 060 */         } else {
    /* 061 */           final byte[] project_byteArray_0 = new byte[(int)project_byteArraySize_0];
    /* 062 */           UnsafeArrayData project_unsafeArrayData_0 = new UnsafeArrayData();
    /* 063 */           Platform.putLong(project_byteArray_0, 16, project_numElements_0);
    /* 064 */           project_unsafeArrayData_0.pointTo(project_byteArray_0, 16, (int)project_byteArraySize_0);
    /* 065 */           UnsafeRow project_unsafeRow_0 = new UnsafeRow(2);
    /* 066 */           for (int z = 0; z < project_numElements_0; z++) {
    /* 067 */             long offset = project_structsOffset_0 + z * project_structSize_0;
    /* 068 */             project_unsafeArrayData_0.setLong(z, (offset << 32) + project_structSize_0);
    /* 069 */             project_unsafeRow_0.pointTo(project_byteArray_0, 16 + offset, project_structSize_0);
    /* 070 */             project_unsafeRow_0.setInt(0, project_keys_0.getInt(z));
    /* 071 */             project_unsafeRow_0.setInt(1, project_values_0.getInt(z));
    /* 072 */           }
    /* 073 */           project_value_0 = project_unsafeArrayData_0;
    /* 074 */         }
    ```
    ### Non-primitive types
    ```
    val df = Seq(Map("a" -> "foo", "b" -> null)).toDF("m")
    df.filter('m.isNotNull).select(map_entries('m)).debug
    ```
    Result:
    ```
    /* 042 */         boolean project_isNull_0 = false;
    /* 043 */
    /* 044 */         ArrayData project_value_0 = null;
    /* 045 */
    /* 046 */         final int project_numElements_0 = inputadapter_value_0.numElements();
    /* 047 */         final ArrayData project_keys_0 = inputadapter_value_0.keyArray();
    /* 048 */         final ArrayData project_values_0 = inputadapter_value_0.valueArray();
    /* 049 */
    /* 050 */         final Object[] project_internalRowArray_0 = new Object[project_numElements_0];
    /* 051 */         for (int z = 0; z < project_numElements_0; z++) {
    /* 052 */           project_internalRowArray_0[z] = new org.apache.spark.sql.catalyst.expressions.GenericInternalRow(new Object[]{project_keys_0.getUTF8String(z), project_values_0.getUTF8String(z)});
    /* 053 */         }
    /* 054 */         project_value_0 = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_internalRowArray_0);
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AbsaOSS/spark feature/array-api-map_entries-to-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21236
    
----
commit 086e223e89ce0cf56145f4aa9a7aef5421a98810
Author: Marek Novotny <mn...@...>
Date:   2018-05-04T18:00:40Z

    [SPARK-23935][SQL] Adding map_entries function

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r186377811
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +118,162 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val structSize = ctx.freshName("structSize")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +    val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +
    +    val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
    +    val valueAssignmentChecked = if (childDataType.valueContainsNull) {
    +      s"""
    +         |if ($values.isNullAt(z)) {
    +         |  $unsafeRow.setNullAt(1);
    +         |} else {
    +         |  $valueAssignment
    +         |}
    +       """.stripMargin
    +    } else {
    +      valueAssignment
    +    }
    +
    +    s"""
    +       |final int $structSize = ${UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2};
    +       |final long $byteArraySize = $calculateArraySize($numElements, $longSize + $structSize);
    +       |final int $structsOffset = $calculateHeader($numElements) + $numElements * $longSize;
    --- End diff --
    
    We can move this into else-clause?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90887/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r187001475
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,161 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    --- End diff --
    
    @ueshin Really good question. I'm eager to learn about the true purpose of the `DataType.defaultSize` function. Currently, it's used in this meaning at more places (e.g.`GenArrayData.genCodeToCreateArrayData` and `CodeGenerator.createUnsafeArray`.)
    
    What about using `Long.BYTES` from Java 8 instead?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90720/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r186371562
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +118,162 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val structSize = ctx.freshName("structSize")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +    val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +
    +    val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
    +    val valueAssignmentChecked = if (childDataType.valueContainsNull) {
    +      s"""
    +         |if ($values.isNullAt(z)) {
    +         |  $unsafeRow.setNullAt(1);
    +         |} else {
    +         |  $valueAssignment
    +         |}
    +       """.stripMargin
    +    } else {
    +      valueAssignment
    +    }
    +
    +    s"""
    +       |final int $structSize = ${UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2};
    --- End diff --
    
    We can calculate `structSize` beforehand and inline it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r187689540
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,161 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    --- End diff --
    
    IMHO, `8` is the better choice since it is not related to an element size of `long`.
    To my best guess, it would be best to define a new constant.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90881/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r207247448
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
    @@ -98,6 +98,9 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
             if (expected.isNaN) result.isNaN else expected == result
           case (result: Float, expected: Float) =>
             if (expected.isNaN) result.isNaN else expected == result
    +      case (result: UnsafeRow, expected: GenericInternalRow) =>
    --- End diff --
    
    Roger that, looks like Wenchen just did so. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90318/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Jenkins, retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90557/testReport)** for PR 21236 at commit [`56ff20a`](https://github.com/apache/spark/commit/56ff20ac977ca1a305e96a7582789e2e75e6718c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90557/testReport)** for PR 21236 at commit [`56ff20a`](https://github.com/apache/spark/commit/56ff20ac977ca1a305e96a7582789e2e75e6718c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90887/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90888/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90557/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r187995110
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,161 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    --- End diff --
    
    Oh OK, I misunderstood the comments. Thanks guys!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r187853879
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,162 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    +    val structSizeAsLong = structSize + "L"
    +    val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +    val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +
    +    val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
    +    val valueAssignmentChecked = if (childDataType.valueContainsNull) {
    +      s"""
    +         |if ($values.isNullAt(z)) {
    +         |  $unsafeRow.setNullAt(1);
    +         |} else {
    +         |  $valueAssignment
    +         |}
    +       """.stripMargin
    +    } else {
    +      valueAssignment
    +    }
    +
    +    s"""
    +       |final long $byteArraySize = $calculateArraySize($numElements, ${longSize + structSize});
    +       |if ($byteArraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
    +       |  ${genCodeForAnyElements(ctx, keys, values, arrayData, numElements)}
    +       |} else {
    +       |  final int $structsOffset = $calculateHeader($numElements) + $numElements * $longSize;
    +       |  final byte[] $data = new byte[(int)$byteArraySize];
    +       |  UnsafeArrayData $unsafeArrayData = new UnsafeArrayData();
    +       |  Platform.putLong($data, $baseOffset, $numElements);
    +       |  $unsafeArrayData.pointTo($data, $baseOffset, (int)$byteArraySize);
    +       |  UnsafeRow $unsafeRow = new UnsafeRow(2);
    +       |  for (int z = 0; z < $numElements; z++) {
    +       |    long offset = $structsOffset + z * $structSizeAsLong;
    +       |    $unsafeArrayData.setLong(z, (offset << 32) + $structSizeAsLong);
    +       |    $unsafeRow.pointTo($data, $baseOffset + offset, $structSize);
    +       |    $unsafeRow.set$keyTypeName(0, ${getKey(keys)});
    +       |    $valueAssignmentChecked
    +       |  }
    +       |  $arrayData = $unsafeArrayData;
    +       |}
    +     """.stripMargin
    +  }
    +
    +  private def genCodeForAnyElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val genericArrayClass = classOf[GenericArrayData].getName
    +    val rowClass = classOf[GenericInternalRow].getName
    +    val data = ctx.freshName("internalRowArray")
    +
    +    val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +    val getValueWithCheck = if (childDataType.valueContainsNull && isValuePrimitive) {
    +        s"$values.isNullAt(z) ? null : (Object)${getValue(values)}"
    +      } else {
    +        getValue(values)
    +      }
    --- End diff --
    
    nit: indent


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90596/testReport)** for PR 21236 at commit [`1bd0d5e`](https://github.com/apache/spark/commit/1bd0d5ea78908820140bd13c624a8e90f1e737fe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r187852992
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,161 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    --- End diff --
    
    This is not for the element size of arrays. I agree with @kiszk to use `8`.
    Maybe we need to add a constant to represent the word size in `UnsafeRow` or somewhere in the future pr.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90367/testReport)** for PR 21236 at commit [`6aa90ef`](https://github.com/apache/spark/commit/6aa90efaa2c06bf0f0f150d7c39a143a96ba8a2d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r187854341
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,162 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    +    val structSizeAsLong = structSize + "L"
    +    val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +    val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +
    +    val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
    +    val valueAssignmentChecked = if (childDataType.valueContainsNull) {
    +      s"""
    +         |if ($values.isNullAt(z)) {
    +         |  $unsafeRow.setNullAt(1);
    +         |} else {
    +         |  $valueAssignment
    +         |}
    +       """.stripMargin
    +    } else {
    +      valueAssignment
    +    }
    +
    +    s"""
    +       |final long $byteArraySize = $calculateArraySize($numElements, ${longSize + structSize});
    +       |if ($byteArraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
    +       |  ${genCodeForAnyElements(ctx, keys, values, arrayData, numElements)}
    --- End diff --
    
    Hmm, should we use this idiom for other array functions? WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90216/testReport)** for PR 21236 at commit [`086e223`](https://github.com/apache/spark/commit/086e223e89ce0cf56145f4aa9a7aef5421a98810).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    @ueshin, @kiszk Thank you for valuable comments! Do you have any more?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90596/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90318/testReport)** for PR 21236 at commit [`d05ad9b`](https://github.com/apache/spark/commit/d05ad9be40064c61b05d838b3ba96b02267d5ee1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90367/testReport)** for PR 21236 at commit [`6aa90ef`](https://github.com/apache/spark/commit/6aa90efaa2c06bf0f0f150d7c39a143a96ba8a2d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90881/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90221/testReport)** for PR 21236 at commit [`b9e2409`](https://github.com/apache/spark/commit/b9e240944e41674a25fd80e4c80b2f7c83af14ef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r207078480
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
    @@ -98,6 +98,9 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
             if (expected.isNaN) result.isNaN else expected == result
           case (result: Float, expected: Float) =>
             if (expected.isNaN) result.isNaN else expected == result
    +      case (result: UnsafeRow, expected: GenericInternalRow) =>
    --- End diff --
    
    @mn-mikke I was just looking over compiler warnings, and noticed it claims this case is never triggered. I think it's because it would always first match the (InternalRow, InternalRow) case above. Should it go before that then?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90216/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90720/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90720/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r187996983
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,162 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    +    val structSizeAsLong = structSize + "L"
    +    val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +    val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +
    +    val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
    +    val valueAssignmentChecked = if (childDataType.valueContainsNull) {
    +      s"""
    +         |if ($values.isNullAt(z)) {
    +         |  $unsafeRow.setNullAt(1);
    +         |} else {
    +         |  $valueAssignment
    +         |}
    +       """.stripMargin
    +    } else {
    +      valueAssignment
    +    }
    +
    +    s"""
    +       |final long $byteArraySize = $calculateArraySize($numElements, ${longSize + structSize});
    +       |if ($byteArraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
    +       |  ${genCodeForAnyElements(ctx, keys, values, arrayData, numElements)}
    --- End diff --
    
    For now, I separated the logic that I can leverage for `map_from_entries` function. Moreover, I think it should be possible to replace `UnsafeArrayData.createUnsafeArray` with that logic, but will do it in a different PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    cc @ueshin @gatorsmile


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90880/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90596/testReport)** for PR 21236 at commit [`1bd0d5e`](https://github.com/apache/spark/commit/1bd0d5ea78908820140bd13c624a8e90f1e737fe).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r187690481
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,161 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    +    val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +    val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
    +
    +    val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
    +    val valueAssignmentChecked = if (childDataType.valueContainsNull) {
    +      s"""
    +         |if ($values.isNullAt(z)) {
    +         |  $unsafeRow.setNullAt(1);
    +         |} else {
    +         |  $valueAssignment
    +         |}
    +       """.stripMargin
    +    } else {
    +      valueAssignment
    +    }
    +
    +    s"""
    +       |final long $byteArraySize = $calculateArraySize($numElements, ${longSize + structSize});
    +       |if ($byteArraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
    +       |  ${genCodeForAnyElements(ctx, keys, values, arrayData, numElements)}
    +       |} else {
    +       |  final int $structsOffset = $calculateHeader($numElements) + $numElements * $longSize;
    +       |  final byte[] $data = new byte[(int)$byteArraySize];
    +       |  UnsafeArrayData $unsafeArrayData = new UnsafeArrayData();
    +       |  Platform.putLong($data, $baseOffset, $numElements);
    +       |  $unsafeArrayData.pointTo($data, $baseOffset, (int)$byteArraySize);
    +       |  UnsafeRow $unsafeRow = new UnsafeRow(2);
    +       |  for (int z = 0; z < $numElements; z++) {
    +       |    long offset = $structsOffset + z * $structSize;
    --- End diff --
    
    nit: `$structSize` -> `${$structSize}L`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90216/testReport)** for PR 21236 at commit [`086e223`](https://github.com/apache/spark/commit/086e223e89ce0cf56145f4aa9a7aef5421a98810).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Jenkins, retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90880/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r186986884
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,161 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    --- End diff --
    
    I'm wondering it is right to use `longSize` here?
    I know the value is `8` and is same as the word size, but feel like the meaning is different?
    cc @gatorsmile @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Thanks! merging to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    I'd retrigger the build for just checking again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90881/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90888/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90318/testReport)** for PR 21236 at commit [`d05ad9b`](https://github.com/apache/spark/commit/d05ad9be40064c61b05d838b3ba96b02267d5ee1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90888/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90887/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90367/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r207148777
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
    @@ -98,6 +98,9 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
             if (expected.isNaN) result.isNaN else expected == result
           case (result: Float, expected: Float) =>
             if (expected.isNaN) result.isNaN else expected == result
    +      case (result: UnsafeRow, expected: GenericInternalRow) =>
    --- End diff --
    
    Hi @srowen,
    ```(InternalRow, InternalRow)``` case was introduced later in [21838](https://github.com/apache/spark/pull/21838) and covers the logic of the case with ```UnsafeRow```. So we can just remove the unreachable piece of code.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90221/testReport)** for PR 21236 at commit [`b9e2409`](https://github.com/apache/spark/commit/b9e240944e41674a25fd80e4c80b2f7c83af14ef).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90221/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21236


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21236#discussion_r187813418
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -118,6 +119,161 @@ case class MapValues(child: Expression)
       override def prettyName: String = "map_values"
     }
     
    +/**
    + * Returns an unordered array of all entries in the given map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
    +       [(1,"a"),(2,"b")]
    +  """,
    +  since = "2.4.0")
    +case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
    +
    +  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
    +
    +  override def dataType: DataType = {
    +    ArrayType(
    +      StructType(
    +        StructField("key", childDataType.keyType, false) ::
    +        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
    +        Nil),
    +      false)
    +  }
    +
    +  override protected def nullSafeEval(input: Any): Any = {
    +    val childMap = input.asInstanceOf[MapData]
    +    val keys = childMap.keyArray()
    +    val values = childMap.valueArray()
    +    val length = childMap.numElements()
    +    val resultData = new Array[AnyRef](length)
    +    var i = 0;
    +    while (i < length) {
    +      val key = keys.get(i, childDataType.keyType)
    +      val value = values.get(i, childDataType.valueType)
    +      val row = new GenericInternalRow(Array[Any](key, value))
    +      resultData.update(i, row)
    +      i += 1
    +    }
    +    new GenericArrayData(resultData)
    +  }
    +
    +  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val numElements = ctx.freshName("numElements")
    +      val keys = ctx.freshName("keys")
    +      val values = ctx.freshName("values")
    +      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
    +      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
    +      val code = if (isKeyPrimitive && isValuePrimitive) {
    +        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
    +      } else {
    +        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
    +      }
    +      s"""
    +         |final int $numElements = $c.numElements();
    +         |final ArrayData $keys = $c.keyArray();
    +         |final ArrayData $values = $c.valueArray();
    +         |$code
    +       """.stripMargin
    +    })
    +  }
    +
    +  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
    +
    +  private def getValue(varName: String) = {
    +    CodeGenerator.getValue(varName, childDataType.valueType, "z")
    +  }
    +
    +  private def genCodeForPrimitiveElements(
    +      ctx: CodegenContext,
    +      keys: String,
    +      values: String,
    +      arrayData: String,
    +      numElements: String): String = {
    +    val byteArraySize = ctx.freshName("byteArraySize")
    +    val data = ctx.freshName("byteArray")
    +    val unsafeRow = ctx.freshName("unsafeRow")
    +    val unsafeArrayData = ctx.freshName("unsafeArrayData")
    +    val structsOffset = ctx.freshName("structsOffset")
    +    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
    +    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
    +
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +    val longSize = LongType.defaultSize
    +    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2
    --- End diff --
    
    @kiszk Thanks for your suggestion, but it seems to me that `LongType.defaultSize` could be used in this case. It seems that the purpose of `defaultSize` is not only the calculation of estimated data size in statistics. `GenerateUnsafeProjection.writeArrayToBuffer`, `InterpretedUnsafeProjection.getElementSize` and other parts utilize `defaultSize` in the same way.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21236
  
    **[Test build #90880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90880/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org