Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2015/10/08 04:11:02 UTC

[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/9019

    [SPARK-10993] [SQL] Initial code generated encoder for product types

    This PR is a first cut at code-generating an encoder that takes a Scala `Product` type and converts it directly into the Tungsten binary format.  This is done through the addition of a new set of expressions that can be used to invoke methods on raw JVM objects, extracting fields and converting the results into the required format.  These can then be used directly in an `UnsafeProjection`, allowing us to leverage the existing encoding logic.
    
    According to some simple benchmarks, this can significantly speed up conversion (~4x).  However, replacing CatalystConverters is deferred to a later PR to keep this one at a reasonable size.
    
    ```scala
    case class SomeInts(a: Int, b: Int, c: Int, d: Int, e: Int)
    
    val data = SomeInts(1, 2, 3, 4, 5)
    val encoder = ProductEncoder[SomeInts]
    val converter = CatalystTypeConverters.createToCatalystConverter(ScalaReflection.schemaFor[SomeInts].dataType)
    
    
    (1 to 5).foreach { iter =>
      benchmark(s"converter $iter") {
        var i = 100000000
        while (i > 0) {
          val res = converter(data).asInstanceOf[InternalRow]
          assert(res.getInt(0) == 1)
          assert(res.getInt(1) == 2)
          i -= 1
        }
      }
    
      benchmark(s"encoder $iter") {
        var i = 100000000
        while (i > 0) {
          val res = encoder.toRow(data)
          assert(res.getInt(0) == 1)
          assert(res.getInt(1) == 2)
          i -= 1
        }
      }
    }
    ```
    
    Results:
    ```
    [info] converter 1: 7170ms
    [info] encoder 1: 1888ms
    [info] converter 2: 6763ms
    [info] encoder 2: 1824ms
    [info] converter 3: 6912ms
    [info] encoder 3: 1802ms
    [info] converter 4: 7131ms
    [info] encoder 4: 1798ms
    [info] converter 5: 7350ms
    [info] encoder 5: 1912ms
    ```
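
    For intuition about where the speedup comes from, here is a hand-written analogue of what the generated encoder effectively does for `SomeInts` (illustrative only, not the PR's generated code): fields are read directly and written at fixed offsets into a flat binary buffer, with no boxing and no generic tree traversal.  A real `UnsafeRow` also carries a null bitset and 8-byte slots, which this sketch omits.
    
    ```scala
    import java.nio.ByteBuffer
    
    case class SomeInts(a: Int, b: Int, c: Int, d: Int, e: Int)
    
    // Simplified stand-in for the generated code path: five non-nullable ints
    // written back-to-back at fixed offsets.
    def encodeSomeInts(v: SomeInts): ByteBuffer = {
      val buf = ByteBuffer.allocate(5 * 4)
      buf.putInt(v.a).putInt(v.b).putInt(v.c).putInt(v.d).putInt(v.e)
      buf.flip()
      buf
    }
    ```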

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark productEncoder

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9019.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9019
    
----
commit 768055df45d9a80ff39230352c3349d948444422
Author: Michael Armbrust <mi...@databricks.com>
Date:   2015-10-08T01:58:19Z

    [SPARK-10993] [SQL] Initial code generated encoder for product types

----



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9019



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146397983
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41565813
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -75,6 +76,242 @@ trait ScalaReflection {
        */
       private def localTypeOf[T: TypeTag]: `Type` = typeTag[T].in(mirror).tpe
     
    +  /**
    +   * Returns the Spark SQL DataType for a given Scala type.  Where this is not an exact mapping
    +   * to a native type, an ObjectType is returned. Special handling is also used for Arrays including
    +   * those that hold primitive types.
    +   */
    +  def dataTypeFor(tpe: `Type`): DataType = tpe match {
    --- End diff --
    
    What is the difference between this method and `def schemaFor(tpe: `Type`): Schema`?
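
    A hedged illustration of the distinction in question, based only on the code quoted above: `schemaFor` recursively converts a Scala type into a Catalyst schema, while `dataTypeFor` maps to a native Catalyst type only where an exact mapping exists and otherwise falls back to `ObjectType`, so downstream expressions can keep operating on the raw JVM object.
    
    ```scala
    // Inferred from the quoted dataTypeFor cases; not verified output.
    // dataTypeFor(localTypeOf[Int])       ==> IntegerType   (native mapping)
    // dataTypeFor(localTypeOf[Seq[Int]])  ==> ObjectType(classOf[scala.collection.Seq[_]])
    // schemaFor[Seq[Int]].dataType        ==> ArrayType(IntegerType, ...)  (fully converted)
    ```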



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146397989
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43376/
    Test FAILed.



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146393923
  
    Merged build started.



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41566709
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -75,6 +76,242 @@ trait ScalaReflection {
        */
       private def localTypeOf[T: TypeTag]: `Type` = typeTag[T].in(mirror).tpe
     
    +  /**
    +   * Returns the Spark SQL DataType for a given Scala type.  Where this is not an exact mapping
    +   * to a native type, an ObjectType is returned. Special handling is also used for Arrays including
    +   * those that hold primitive types.
    +   */
    +  def dataTypeFor(tpe: `Type`): DataType = tpe match {
    +    case t if t <:< definitions.IntTpe => IntegerType
    +    case t if t <:< definitions.LongTpe => LongType
    +    case t if t <:< definitions.DoubleTpe => DoubleType
    +    case t if t <:< definitions.FloatTpe => FloatType
    +    case t if t <:< definitions.ShortTpe => ShortType
    +    case t if t <:< definitions.ByteTpe => ByteType
    +    case t if t <:< definitions.BooleanTpe => BooleanType
    +    case t if t <:< localTypeOf[Array[Byte]] => BinaryType
    +    case _ =>
    +      val className: String = tpe.erasure.typeSymbol.asClass.fullName
    +      className match {
    +        case "scala.Array" =>
    +          val TypeRef(_, _, Seq(arrayType)) = tpe
    +          val cls = arrayType match {
    +            case t if t <:< definitions.IntTpe => classOf[Array[Int]]
    +            case t if t <:< definitions.LongTpe => classOf[Array[Long]]
    +            case t if t <:< definitions.DoubleTpe => classOf[Array[Double]]
    +            case t if t <:< definitions.FloatTpe => classOf[Array[Float]]
    +            case t if t <:< definitions.ShortTpe => classOf[Array[Short]]
    +            case t if t <:< definitions.ByteTpe => classOf[Array[Byte]]
    +            case t if t <:< definitions.BooleanTpe => classOf[Array[Boolean]]
    +            case other =>
    +              // There is probably a better way to do this, but I couldn't find it...
    +              val elementType = dataTypeFor(other).asInstanceOf[ObjectType].cls
    +              java.lang.reflect.Array.newInstance(elementType, 1).getClass
    +
    +          }
    +          ObjectType(cls)
    +        case other => ObjectType(Utils.classForName(className))
    +      }
    +  }
    +
    +  /** Returns expressions for extracting all the fields from the given type. */
    +  def extractorsFor[T : TypeTag](inputObject: Expression): Seq[Expression] = {
    +    ScalaReflectionLock.synchronized {
    +      extractorFor(inputObject, typeTag[T].tpe).asInstanceOf[CreateStruct].children
    +    }
    +  }
    +
    +  /** Helper for extracting internal fields from a case class. */
    +  protected def extractorFor(
    +      inputObject: Expression,
    +      tpe: `Type`): Expression = ScalaReflectionLock.synchronized {
    +    if (!inputObject.dataType.isInstanceOf[ObjectType]) {
    +      inputObject
    +    } else {
    +      tpe match {
    +        case t if t <:< localTypeOf[Option[_]] =>
    +          val TypeRef(_, _, Seq(optType)) = t
    +          optType match {
    +            // For primitive types we must manually unbox the value of the object.
    +            case t if t <:< definitions.IntTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Integer]), inputObject),
    +                "intValue",
    +                IntegerType)
    +            case t if t <:< definitions.LongTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Long]), inputObject),
    +                "longValue",
    +                LongType)
    +            case t if t <:< definitions.DoubleTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Double]), inputObject),
    +                "doubleValue",
    +                DoubleType)
    +            case t if t <:< definitions.FloatTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Float]), inputObject),
    +                "floatValue",
    +                FloatType)
    +            case t if t <:< definitions.ShortTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Short]), inputObject),
    +                "shortValue",
    +                ShortType)
    +            case t if t <:< definitions.ByteTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Byte]), inputObject),
    +                "byteValue",
    +                ByteType)
    +            case t if t <:< definitions.BooleanTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Boolean]), inputObject),
    +                "booleanValue",
    +                BooleanType)
    --- End diff --
    
    Do we need to handle String, Date, Timestamp, and Array[Byte] here?
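
    For context, a plausible handling with the expressions this PR introduces (a sketch that assumes only the `StaticInvoke` signature quoted elsewhere in this thread, not a verbatim excerpt from the PR) is a static call into the corresponding internal conversion, e.g. `UTF8String.fromString` for strings:
    
    ```scala
    import org.apache.spark.sql.catalyst.expressions.{Expression, StaticInvoke}
    import org.apache.spark.sql.types.StringType
    import org.apache.spark.unsafe.types.UTF8String
    
    // Hedged sketch: convert a String-typed input object into the internal
    // UTF8String representation via a static method call.
    def stringExtractor(inputObject: Expression): Expression =
      StaticInvoke(classOf[UTF8String], StringType, "fromString", Seq(inputObject))
    ```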



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41566743
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -75,6 +76,242 @@ trait ScalaReflection {
        */
       private def localTypeOf[T: TypeTag]: `Type` = typeTag[T].in(mirror).tpe
     
    +  /**
    +   * Returns the Spark SQL DataType for a given Scala type.  Where this is not an exact mapping
    +   * to a native type, an ObjectType is returned. Special handling is also used for Arrays including
    +   * those that hold primitive types.
    +   */
    +  def dataTypeFor(tpe: `Type`): DataType = tpe match {
    +    case t if t <:< definitions.IntTpe => IntegerType
    +    case t if t <:< definitions.LongTpe => LongType
    +    case t if t <:< definitions.DoubleTpe => DoubleType
    +    case t if t <:< definitions.FloatTpe => FloatType
    +    case t if t <:< definitions.ShortTpe => ShortType
    +    case t if t <:< definitions.ByteTpe => ByteType
    +    case t if t <:< definitions.BooleanTpe => BooleanType
    +    case t if t <:< localTypeOf[Array[Byte]] => BinaryType
    +    case _ =>
    +      val className: String = tpe.erasure.typeSymbol.asClass.fullName
    +      className match {
    +        case "scala.Array" =>
    +          val TypeRef(_, _, Seq(arrayType)) = tpe
    +          val cls = arrayType match {
    +            case t if t <:< definitions.IntTpe => classOf[Array[Int]]
    +            case t if t <:< definitions.LongTpe => classOf[Array[Long]]
    +            case t if t <:< definitions.DoubleTpe => classOf[Array[Double]]
    +            case t if t <:< definitions.FloatTpe => classOf[Array[Float]]
    +            case t if t <:< definitions.ShortTpe => classOf[Array[Short]]
    +            case t if t <:< definitions.ByteTpe => classOf[Array[Byte]]
    +            case t if t <:< definitions.BooleanTpe => classOf[Array[Boolean]]
    +            case other =>
    +              // There is probably a better way to do this, but I couldn't find it...
    +              val elementType = dataTypeFor(other).asInstanceOf[ObjectType].cls
    +              java.lang.reflect.Array.newInstance(elementType, 1).getClass
    +
    +          }
    +          ObjectType(cls)
    +        case other => ObjectType(Utils.classForName(className))
    +      }
    +  }
    +
    +  /** Returns expressions for extracting all the fields from the given type. */
    +  def extractorsFor[T : TypeTag](inputObject: Expression): Seq[Expression] = {
    +    ScalaReflectionLock.synchronized {
    +      extractorFor(inputObject, typeTag[T].tpe).asInstanceOf[CreateStruct].children
    +    }
    +  }
    +
    +  /** Helper for extracting internal fields from a case class. */
    +  protected def extractorFor(
    +      inputObject: Expression,
    +      tpe: `Type`): Expression = ScalaReflectionLock.synchronized {
    +    if (!inputObject.dataType.isInstanceOf[ObjectType]) {
    +      inputObject
    +    } else {
    +      tpe match {
    +        case t if t <:< localTypeOf[Option[_]] =>
    +          val TypeRef(_, _, Seq(optType)) = t
    +          optType match {
    +            // For primitive types we must manually unbox the value of the object.
    +            case t if t <:< definitions.IntTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Integer]), inputObject),
    +                "intValue",
    +                IntegerType)
    +            case t if t <:< definitions.LongTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Long]), inputObject),
    +                "longValue",
    +                LongType)
    +            case t if t <:< definitions.DoubleTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Double]), inputObject),
    +                "doubleValue",
    +                DoubleType)
    +            case t if t <:< definitions.FloatTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Float]), inputObject),
    +                "floatValue",
    +                FloatType)
    +            case t if t <:< definitions.ShortTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Short]), inputObject),
    +                "shortValue",
    +                ShortType)
    +            case t if t <:< definitions.ByteTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Byte]), inputObject),
    +                "byteValue",
    +                ByteType)
    +            case t if t <:< definitions.BooleanTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Boolean]), inputObject),
    +                "booleanValue",
    +                BooleanType)
    --- End diff --
    
    nvm



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41925843
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -0,0 +1,334 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import scala.language.existentials
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Invokes a static function, returning the result.  By default, any of the arguments being null
    + * will result in returning null instead of calling the function.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call
    + * @param functionName The name of the method to call.
    + * @param arguments An optional list of expressions to pass as arguments to the function.
    + * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    + *                      of calling the function.
    + */
    +case class StaticInvoke(
    +    staticObject: Any,
    +    dataType: DataType,
    +    functionName: String,
    +    arguments: Seq[Expression] = Nil,
    +    propagateNull: Boolean = true) extends Expression {
    +
    +  val objectName = staticObject match {
    +    case c: Class[_] => c.getName
    +    case other => other.getClass.getName.stripSuffix("$")
    +  }
    +  override def nullable: Boolean = true
    +  override def children: Seq[Expression] = Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val argGen = arguments.map(_.gen(ctx))
    +    val argString = argGen.map(_.value).mkString(", ")
    +
    +    if (propagateNull) {
    +      val objNullCheck = if (ctx.defaultValue(dataType) == "null") {
    +        s"${ev.isNull} = ${ev.value} == null;"
    +      } else {
    +        ""
    +      }
    +
    +      val argsNonNull = s"!(${argGen.map(_.isNull).mkString(" || ")})"
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        boolean ${ev.isNull} = true;
    +        $javaType ${ev.value} = ${ctx.defaultValue(dataType)};
    +
    +        if ($argsNonNull) {
    +          ${ev.value} = $objectName.$functionName($argString);
    +          $objNullCheck
    +        }
    +       """
    +    } else {
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        $javaType ${ev.value} = $objectName.$functionName($argString);
    +        final boolean ${ev.isNull} = ${ev.value} == null;
    +      """
    +    }
    +  }
    +}
    +
    +/**
    + * Calls the specified function on an object, optionally passing arguments.  If the `targetObject`
    + * expression evaluates to null then null will be returned.
    + *
    + * @param targetObject An expression that will return the object to call the method on.
    + * @param functionName The name of the method to call.
    + * @param dataType The expected return type of the function.
    + * @param arguments An optional list of expressions, whose evaluation will be passed to the function.
    + */
    +case class Invoke(
    +    targetObject: Expression,
    +    functionName: String,
    +    dataType: DataType,
    +    arguments: Seq[Expression] = Nil) extends Expression {
    +
    +  override def nullable: Boolean = true
    +  override def children: Seq[Expression] = targetObject :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val obj = targetObject.gen(ctx)
    +    val argGen = arguments.map(_.gen(ctx))
    +    val argString = argGen.map(_.value).mkString(", ")
    +
    +    // If the function can return null, we do an extra check to make sure our null bit is still set
    +    // correctly.
    +    val objNullCheck = if (ctx.defaultValue(dataType) == "null") {
    +      s"${ev.isNull} = ${ev.value} == null;"
    +    } else {
    +      ""
    +    }
    +
    +    s"""
    +      ${obj.code}
    +      ${argGen.map(_.code).mkString("\n")}
    +
    +      boolean ${ev.isNull} = ${obj.value} == null;
    +      $javaType ${ev.value} =
    +        ${ev.isNull} ?
    +        ${ctx.defaultValue(dataType)} : ($javaType) ${obj.value}.$functionName($argString);
    +      $objNullCheck
    +    """
    +  }
    +}
    +
    +/**
    + * Constructs a new instance of the given class, using the result of evaluating the specified
    + * expressions as arguments.
    + *
    + * @param cls The class to construct.
    + * @param arguments A list of expression to use as arguments to the constructor.
    + * @param propagateNull When true, if any of the arguments is null, then null will be returned
    + *                      instead of trying to construct the object.
    + * @param dataType The type of object being constructed, as a Spark SQL datatype.  This allows you
    + *                 to manually specify the type when the object in question is a valid internal
    + *                 representation (i.e. ArrayData) instead of an object.
    + */
    +case class NewInstance(
    +    cls: Class[_],
    +    arguments: Seq[Expression],
    +    propagateNull: Boolean = true,
    +    dataType: DataType) extends Expression {
    +  private val className = cls.getName
    +
    +  override def nullable: Boolean = propagateNull
    +
    +  override def children: Seq[Expression] = arguments
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val argGen = arguments.map(_.gen(ctx))
    +    val argString = argGen.map(_.value).mkString(", ")
    +
    +    if (propagateNull) {
    +      val objNullCheck = if (ctx.defaultValue(dataType) == "null") {
    +        s"${ev.isNull} = ${ev.value} == null;"
    +      } else {
    +        ""
    +      }
    +
    +      val argsNonNull = s"!(${argGen.map(_.isNull).mkString(" || ")})"
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        boolean ${ev.isNull} = true;
    +        $javaType ${ev.value} = ${ctx.defaultValue(dataType)};
    +
    +        if ($argsNonNull) {
    +          ${ev.value} = new $className($argString);
    +          ${ev.isNull} = false;
    +        }
    +       """
    +    } else {
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        $javaType ${ev.value} = new $className($argString);
    +        final boolean ${ev.isNull} = ${ev.value} == null;
    +      """
    +    }
    +  }
    +}
    +
    +/**
    + * Given an expression that returns an object of type `Option[_]`, this expression unwraps the
    + * option into the specified Spark SQL datatype.  In the case of `None`, the nullbit is set instead.
    + *
    + * @param dataType The expected unwrapped option type.
    + * @param child An expression that returns an `Option`
    + */
    +case class UnwrapOption(
    +    dataType: DataType,
    +    child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def nullable: Boolean = true
    +
    +  override def children: Seq[Expression] = Nil
    +
    +  override def inputTypes: Seq[AbstractDataType] = ObjectType :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val inputObject = child.gen(ctx)
    +
    +    s"""
    +      ${inputObject.code}
    +
    +      boolean ${ev.isNull} = ${inputObject.value} == null || ${inputObject.value}.isEmpty();
    +      $javaType ${ev.value} =
    +        ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($javaType)${inputObject.value}.get();
    +    """
    +  }
    +}
    +
    +case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends Expression {
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String =
    +    throw new UnsupportedOperationException("Only calling gen() is supported.")
    +
    +  override def children: Seq[Expression] = Nil
    +  override def gen(ctx: CodeGenContext): GeneratedExpressionCode =
    +    GeneratedExpressionCode(code = "", value = value, isNull = isNull)
    +
    +  override def nullable: Boolean = false
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +}
    +
    +/**
    + * Applies the given expression to every element of a collection of items, returning the result
    + * as an ArrayType.  This is similar to a typical map operation, but where the lambda function
    + * is expressed using catalyst expressions.
    + *
    + * The following collection ObjectTypes are currently supported: Seq, Array
    + *
    + * @param function A function that returns an expression, given an attribute that can be used
    + *                 to access the current value.  This is done as a lambda function so that
    + *                 a unique attribute reference can be provided for each expression (thus allowing
    + *                 us to nest multiple MapObjects calls).
    + * @param inputData An expression that, when evaluated, returns a collection object.
    + * @param elementType The type of element in the collection, expressed as a DataType.
    + */
    +case class MapObjects(
    +    function: AttributeReference => Expression,
    +    inputData: Expression,
    +    elementType: DataType) extends Expression {
    +
    +  private val loopAttribute = AttributeReference("loopVar", elementType)()
    +  private val completeFunction = function(loopAttribute)
    +
    +  private val (lengthFunction, itemAccessor) = inputData.dataType match {
    +    case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    +      (".size()", (i: String) => s".apply($i)")
    +    case ObjectType(cls) if cls.isArray =>
    +      (".length", (i: String) => s"[$i]")
    +  }
    +
    +  override def nullable: Boolean = true
    +
    +  override def children: Seq[Expression] = completeFunction :: inputData :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported")
    +
    +  override def dataType: DataType = ArrayType(completeFunction.dataType)
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val elementJavaType = ctx.javaType(elementType)
    +    val genInputData = inputData.gen(ctx)
    +
    +    // Variables to hold the element that is currently being processed.
    +    val loopValue = ctx.freshName("loopValue")
    +    val loopIsNull = ctx.freshName("loopIsNull")
    +
    +    val loopVariable = LambdaVariable(loopValue, loopIsNull, elementType)
    +    val boundFunction = completeFunction transform {
    +      case a: AttributeReference if a == loopAttribute => loopVariable
    +    }
    --- End diff --
    
    Exactly, this is how we link whatever code the inner expression generates into the loop.
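
    A self-contained sketch of that linking step, with simplified stand-in types rather than Spark's actual expression classes: the lambda's placeholder attribute is substituted with a `LambdaVariable` whose generated "code" is just the name of the loop-local variable, so whatever Java the inner expression emits reads the current element directly.
    
    ```scala
    sealed trait Expr { def gen: String }
    // Stand-in for the unbound AttributeReference used as the loop attribute.
    case class Placeholder(name: String) extends Expr {
      def gen: String = sys.error(s"unbound placeholder: $name")
    }
    // Stand-in for LambdaVariable: its "generated code" is just a variable name.
    case class LoopVar(javaName: String) extends Expr { def gen: String = javaName }
    case class Call(target: Expr, method: String) extends Expr {
      def gen: String = s"${target.gen}.$method()"
    }
    
    // Stand-in for the `completeFunction transform { ... }` step above.
    def bind(e: Expr, from: Expr, to: Expr): Expr = e match {
      case x if x == from => to
      case Call(t, m)     => Call(bind(t, from, to), m)
      case other          => other
    }
    
    val loopAttr = Placeholder("loopVar")
    val bound = bind(Call(loopAttr, "intValue"), loopAttr, LoopVar("loopValue0"))
    assert(bound.gen == "loopValue0.intValue()") // inner code now reads the loop variable
    ```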



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146685224
  
    Overall looks good! Left a few clarification questions.



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41916319
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -75,6 +76,242 @@ trait ScalaReflection {
        */
       private def localTypeOf[T: TypeTag]: `Type` = typeTag[T].in(mirror).tpe
     
    +  /**
    +   * Returns the Spark SQL DataType for a given Scala type.  Where this is not an exact mapping
    +   * to a native type, an ObjectType is returned. Special handling is also used for Arrays including
    +   * those that hold primitive types.
    +   */
    +  def dataTypeFor(tpe: `Type`): DataType = tpe match {
    +    case t if t <:< definitions.IntTpe => IntegerType
    +    case t if t <:< definitions.LongTpe => LongType
    +    case t if t <:< definitions.DoubleTpe => DoubleType
    +    case t if t <:< definitions.FloatTpe => FloatType
    +    case t if t <:< definitions.ShortTpe => ShortType
    +    case t if t <:< definitions.ByteTpe => ByteType
    +    case t if t <:< definitions.BooleanTpe => BooleanType
    +    case t if t <:< localTypeOf[Array[Byte]] => BinaryType
    +    case _ =>
    +      val className: String = tpe.erasure.typeSymbol.asClass.fullName
    +      className match {
    +        case "scala.Array" =>
    +          val TypeRef(_, _, Seq(arrayType)) = tpe
    +          val cls = arrayType match {
    +            case t if t <:< definitions.IntTpe => classOf[Array[Int]]
    +            case t if t <:< definitions.LongTpe => classOf[Array[Long]]
    +            case t if t <:< definitions.DoubleTpe => classOf[Array[Double]]
    +            case t if t <:< definitions.FloatTpe => classOf[Array[Float]]
    +            case t if t <:< definitions.ShortTpe => classOf[Array[Short]]
    +            case t if t <:< definitions.ByteTpe => classOf[Array[Byte]]
    +            case t if t <:< definitions.BooleanTpe => classOf[Array[Boolean]]
    +            case other =>
    +              // There is probably a better way to do this, but I couldn't find it...
    +              val elementType = dataTypeFor(other).asInstanceOf[ObjectType].cls
    +              java.lang.reflect.Array.newInstance(elementType, 1).getClass
    +
    +          }
    +          ObjectType(cls)
    +        case other => ObjectType(Utils.classForName(className))
    +      }
    +  }
    +
    +  /** Returns expressions for extracting all the fields from the given type. */
    +  def extractorsFor[T : TypeTag](inputObject: Expression): Seq[Expression] = {
    +    ScalaReflectionLock.synchronized {
    +      extractorFor(inputObject, typeTag[T].tpe).asInstanceOf[CreateStruct].children
    +    }
    +  }
    +
    +  /** Helper for extracting internal fields from a case class. */
    +  protected def extractorFor(
    +      inputObject: Expression,
    +      tpe: `Type`): Expression = ScalaReflectionLock.synchronized {
    +    if (!inputObject.dataType.isInstanceOf[ObjectType]) {
    +      inputObject
    +    } else {
    +      tpe match {
    +        case t if t <:< localTypeOf[Option[_]] =>
    +          val TypeRef(_, _, Seq(optType)) = t
    +          optType match {
    +            // For primitive types we must manually unbox the value of the object.
    +            case t if t <:< definitions.IntTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Integer]), inputObject),
    +                "intValue",
    +                IntegerType)
    +            case t if t <:< definitions.LongTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Long]), inputObject),
    +                "longValue",
    +                LongType)
    +            case t if t <:< definitions.DoubleTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Double]), inputObject),
    +                "doubleValue",
    +                DoubleType)
    +            case t if t <:< definitions.FloatTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Float]), inputObject),
    +                "floatValue",
    +                FloatType)
    +            case t if t <:< definitions.ShortTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Short]), inputObject),
    +                "shortValue",
    +                ShortType)
    +            case t if t <:< definitions.ByteTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Byte]), inputObject),
    +                "byteValue",
    +                ByteType)
    +            case t if t <:< definitions.BooleanTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Boolean]), inputObject),
    +                "booleanValue",
    +                BooleanType)
    +
    +            // For non-primitives, we can just extract the object from the Option and then recurse.
    +            case other =>
    +              val className: String = optType.erasure.typeSymbol.asClass.fullName
    +              val classObj = Utils.classForName(className)
    +              val optionObjectType = ObjectType(classObj)
    +
    +              val unwrapped = UnwrapOption(optionObjectType, inputObject)
    +              expressions.If(
    +                IsNull(unwrapped),
    +                expressions.Literal.create(null, schemaFor(optType).dataType),
    +                extractorFor(unwrapped, optType))
    +          }
    +
    +        case t if t <:< localTypeOf[Product] =>
    +          val formalTypeArgs = t.typeSymbol.asClass.typeParams
    +          val TypeRef(_, _, actualTypeArgs) = t
    +          val constructorSymbol = t.member(nme.CONSTRUCTOR)
    +          val params = if (constructorSymbol.isMethod) {
    +            constructorSymbol.asMethod.paramss
    +          } else {
    +            // Find the primary constructor, and use its parameter ordering.
    +            val primaryConstructorSymbol: Option[Symbol] =
    +              constructorSymbol.asTerm.alternatives.find(s =>
    +                s.isMethod && s.asMethod.isPrimaryConstructor)
    +
    +            if (primaryConstructorSymbol.isEmpty) {
    +              sys.error("Internal SQL error: Product object did not have a primary constructor.")
    +            } else {
    +              primaryConstructorSymbol.get.asMethod.paramss
    +            }
    +          }
    +
    +          CreateStruct(params.head.map { p =>
    --- End diff --
    
    Haha, this was copied from code that I copied from a Stack Overflow article a long time ago.



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41583304
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -0,0 +1,334 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import scala.language.existentials
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Invokes a static function, returning the result.  By default, any of the arguments being null
    + * will result in returning null instead of calling the function.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call
    + * @param functionName The name of the method to call.
    + * @param arguments An optional list of expressions to pass as arguments to the function.
    + * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    + *                      of calling the function.
    + */
    +case class StaticInvoke(
    +    staticObject: Any,
    +    dataType: DataType,
    +    functionName: String,
    +    arguments: Seq[Expression] = Nil,
    +    propagateNull: Boolean = true) extends Expression {
    --- End diff --
    
    maybe `extend LeafExpression`?
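
    For reference, the idea behind the suggestion, shown with simplified stand-ins rather than Spark's actual hierarchy: a leaf base trait fixes `children` to `Nil`, so expressions like `StaticInvoke` would not need to override it themselves.
    
    ```scala
    trait Expr { def children: Seq[Expr] }
    // A leaf expression has no children by definition.
    trait LeafExpr extends Expr { final override def children: Seq[Expr] = Nil }
    // A StaticInvoke-like node inherits children = Nil instead of overriding it.
    case class StaticInvokeLike(objectName: String, functionName: String) extends LeafExpr
    ```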



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146397969
  
      [Test build #43376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43376/console) for PR 9019 at commit [`768055d`](https://github.com/apache/spark/commit/768055df45d9a80ff39230352c3349d948444422).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HasWeightCol(Params):`
      * `class IsotonicRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,`
      * `class IsotonicRegressionModel(JavaModel):`
      * `trait Encoder[T] `
      * `case class ClassEncoder[T](`
      * `case class Average(child: Expression) extends DeclarativeAggregate `
      * `case class Count(child: Expression) extends DeclarativeAggregate `
      * `case class First(child: Expression) extends DeclarativeAggregate `
      * `case class Last(child: Expression) extends DeclarativeAggregate `
      * `case class Max(child: Expression) extends DeclarativeAggregate `
      * `case class Min(child: Expression) extends DeclarativeAggregate `
      * `abstract class StddevAgg(child: Expression) extends DeclarativeAggregate `
      * `case class Sum(child: Expression) extends DeclarativeAggregate `
      * `case class StaticInvoke(`
      * `case class Invoke(`
      * `case class NewInstance(`
      * `case class UnwrapOption(`
      * `case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends Expression `
      * `case class MapObjects(`
      * `case class RoundRobinPartitioning(numPartitions: Int) extends Partitioning `
      * `case class Coalesce(numPartitions: Int, child: SparkPlan) extends UnaryNode `




[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41567674
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -75,6 +76,242 @@ trait ScalaReflection {
        */
       private def localTypeOf[T: TypeTag]: `Type` = typeTag[T].in(mirror).tpe
     
    +  /**
    +   * Returns the Spark SQL DataType for a given Scala type.  Where this is not an exact mapping
    +   * to a native type, an ObjectType is returned. Special handling is also used for Arrays including
    +   * those that hold primitive types.
    +   */
    +  def dataTypeFor(tpe: `Type`): DataType = tpe match {
    +    case t if t <:< definitions.IntTpe => IntegerType
    +    case t if t <:< definitions.LongTpe => LongType
    +    case t if t <:< definitions.DoubleTpe => DoubleType
    +    case t if t <:< definitions.FloatTpe => FloatType
    +    case t if t <:< definitions.ShortTpe => ShortType
    +    case t if t <:< definitions.ByteTpe => ByteType
    +    case t if t <:< definitions.BooleanTpe => BooleanType
    +    case t if t <:< localTypeOf[Array[Byte]] => BinaryType
    +    case _ =>
    +      val className: String = tpe.erasure.typeSymbol.asClass.fullName
    +      className match {
    +        case "scala.Array" =>
    +          val TypeRef(_, _, Seq(arrayType)) = tpe
    +          val cls = arrayType match {
    +            case t if t <:< definitions.IntTpe => classOf[Array[Int]]
    +            case t if t <:< definitions.LongTpe => classOf[Array[Long]]
    +            case t if t <:< definitions.DoubleTpe => classOf[Array[Double]]
    +            case t if t <:< definitions.FloatTpe => classOf[Array[Float]]
    +            case t if t <:< definitions.ShortTpe => classOf[Array[Short]]
    +            case t if t <:< definitions.ByteTpe => classOf[Array[Byte]]
    +            case t if t <:< definitions.BooleanTpe => classOf[Array[Boolean]]
    +            case other =>
    +              // There is probably a better way to do this, but I couldn't find it...
    +              val elementType = dataTypeFor(other).asInstanceOf[ObjectType].cls
    +              java.lang.reflect.Array.newInstance(elementType, 1).getClass
    +
    +          }
    +          ObjectType(cls)
    +        case other => ObjectType(Utils.classForName(className))
    +      }
    +  }
    +
    +  /** Returns expressions for extracting all the fields from the given type. */
    +  def extractorsFor[T : TypeTag](inputObject: Expression): Seq[Expression] = {
    +    ScalaReflectionLock.synchronized {
    +      extractorFor(inputObject, typeTag[T].tpe).asInstanceOf[CreateStruct].children
    +    }
    +  }
    +
    +  /** Helper for extracting internal fields from a case class. */
    +  protected def extractorFor(
    +      inputObject: Expression,
    +      tpe: `Type`): Expression = ScalaReflectionLock.synchronized {
    +    if (!inputObject.dataType.isInstanceOf[ObjectType]) {
    +      inputObject
    +    } else {
    +      tpe match {
    +        case t if t <:< localTypeOf[Option[_]] =>
    +          val TypeRef(_, _, Seq(optType)) = t
    +          optType match {
    +            // For primitive types we must manually unbox the value of the object.
    +            case t if t <:< definitions.IntTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Integer]), inputObject),
    +                "intValue",
    +                IntegerType)
    +            case t if t <:< definitions.LongTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Long]), inputObject),
    +                "longValue",
    +                LongType)
    +            case t if t <:< definitions.DoubleTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Double]), inputObject),
    +                "doubleValue",
    +                DoubleType)
    +            case t if t <:< definitions.FloatTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Float]), inputObject),
    +                "floatValue",
    +                FloatType)
    +            case t if t <:< definitions.ShortTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Short]), inputObject),
    +                "shortValue",
    +                ShortType)
    +            case t if t <:< definitions.ByteTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Byte]), inputObject),
    +                "byteValue",
    +                ByteType)
    +            case t if t <:< definitions.BooleanTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Boolean]), inputObject),
    +                "booleanValue",
    +                BooleanType)
    +
    +            // For non-primitives, we can just extract the object from the Option and then recurse.
    +            case other =>
    +              val className: String = optType.erasure.typeSymbol.asClass.fullName
    +              val classObj = Utils.classForName(className)
    +              val optionObjectType = ObjectType(classObj)
    +
    +              val unwrapped = UnwrapOption(optionObjectType, inputObject)
    +              expressions.If(
    +                IsNull(unwrapped),
    +                expressions.Literal.create(null, schemaFor(optType).dataType),
    +                extractorFor(unwrapped, optType))
    +          }
    +
    +        case t if t <:< localTypeOf[Product] =>
    +          val formalTypeArgs = t.typeSymbol.asClass.typeParams
    +          val TypeRef(_, _, actualTypeArgs) = t
    +          val constructorSymbol = t.member(nme.CONSTRUCTOR)
    +          val params = if (constructorSymbol.isMethod) {
    +            constructorSymbol.asMethod.paramss
    +          } else {
    +            // Find the primary constructor, and use its parameter ordering.
    +            val primaryConstructorSymbol: Option[Symbol] =
    +              constructorSymbol.asTerm.alternatives.find(s =>
    +                s.isMethod && s.asMethod.isPrimaryConstructor)
    +
    +            if (primaryConstructorSymbol.isEmpty) {
    +              sys.error("Internal SQL error: Product object did not have a primary constructor.")
    +            } else {
    +              primaryConstructorSymbol.get.asMethod.paramss
    +            }
    +          }
    +
    +          CreateStruct(params.head.map { p =>
    --- End diff --
    
    Maybe explain why we need to use `params.head`?
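
    For anyone reading along, a hedged illustration of why there can be more than one parameter list (written with the newer `termNames`/`paramLists` spellings of the `nme.CONSTRUCTOR`/`paramss` calls in the diff): the constructor yields one list per parameter list, and only the first carries the case-class fields that belong in the schema.
    
    ```scala
    import scala.reflect.runtime.universe._
    
    // A case class can (rarely) declare extra parameter lists; only the first
    // one defines the fields that matter for the schema.
    case class Curried(a: Int, b: String)(c: Double)
    
    val ctor = typeOf[Curried].member(termNames.CONSTRUCTOR).asMethod
    println(ctor.paramLists.map(_.map(_.name)))  // List(List(a, b), List(c))
    println(ctor.paramLists.head.map(_.name))    // List(a, b) -- what CreateStruct uses
    ```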



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146404765
  
    Merged build started.



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146404752
  
    Merged build triggered.



[GitHub] spark pull request: [SPARK-10993] [SQL] Initial code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41568528
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -75,6 +76,242 @@ trait ScalaReflection {
        */
       private def localTypeOf[T: TypeTag]: `Type` = typeTag[T].in(mirror).tpe
     
    +  /**
    +   * Returns the Spark SQL DataType for a given Scala type.  Where this is not an exact mapping
    +   * to a native type, an ObjectType is returned. Special handling is also used for Arrays including
    +   * those that hold primitive types.
    +   */
    +  def dataTypeFor(tpe: `Type`): DataType = tpe match {
    +    case t if t <:< definitions.IntTpe => IntegerType
    +    case t if t <:< definitions.LongTpe => LongType
    +    case t if t <:< definitions.DoubleTpe => DoubleType
    +    case t if t <:< definitions.FloatTpe => FloatType
    +    case t if t <:< definitions.ShortTpe => ShortType
    +    case t if t <:< definitions.ByteTpe => ByteType
    +    case t if t <:< definitions.BooleanTpe => BooleanType
    +    case t if t <:< localTypeOf[Array[Byte]] => BinaryType
    +    case _ =>
    +      val className: String = tpe.erasure.typeSymbol.asClass.fullName
    +      className match {
    +        case "scala.Array" =>
    +          val TypeRef(_, _, Seq(arrayType)) = tpe
    +          val cls = arrayType match {
    +            case t if t <:< definitions.IntTpe => classOf[Array[Int]]
    +            case t if t <:< definitions.LongTpe => classOf[Array[Long]]
    +            case t if t <:< definitions.DoubleTpe => classOf[Array[Double]]
    +            case t if t <:< definitions.FloatTpe => classOf[Array[Float]]
    +            case t if t <:< definitions.ShortTpe => classOf[Array[Short]]
    +            case t if t <:< definitions.ByteTpe => classOf[Array[Byte]]
    +            case t if t <:< definitions.BooleanTpe => classOf[Array[Boolean]]
    +            case other =>
    +              // There is probably a better way to do this, but I couldn't find it...
    +              val elementType = dataTypeFor(other).asInstanceOf[ObjectType].cls
    +              java.lang.reflect.Array.newInstance(elementType, 1).getClass
    +
    +          }
    +          ObjectType(cls)
    +        case other => ObjectType(Utils.classForName(className))
    +      }
    +  }
    +
    +  /** Returns expressions for extracting all the fields from the given type. */
    +  def extractorsFor[T : TypeTag](inputObject: Expression): Seq[Expression] = {
    +    ScalaReflectionLock.synchronized {
    +      extractorFor(inputObject, typeTag[T].tpe).asInstanceOf[CreateStruct].children
    +    }
    +  }
    +
    +  /** Helper for extracting internal fields from a case class. */
    +  protected def extractorFor(
    +      inputObject: Expression,
    +      tpe: `Type`): Expression = ScalaReflectionLock.synchronized {
    +    if (!inputObject.dataType.isInstanceOf[ObjectType]) {
    +      inputObject
    +    } else {
    +      tpe match {
    +        case t if t <:< localTypeOf[Option[_]] =>
    +          val TypeRef(_, _, Seq(optType)) = t
    +          optType match {
    +            // For primitive types we must manually unbox the value of the object.
    +            case t if t <:< definitions.IntTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Integer]), inputObject),
    +                "intValue",
    +                IntegerType)
    +            case t if t <:< definitions.LongTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Long]), inputObject),
    +                "longValue",
    +                LongType)
    +            case t if t <:< definitions.DoubleTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Double]), inputObject),
    +                "doubleValue",
    +                DoubleType)
    +            case t if t <:< definitions.FloatTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Float]), inputObject),
    +                "floatValue",
    +                FloatType)
    +            case t if t <:< definitions.ShortTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Short]), inputObject),
    +                "shortValue",
    +                ShortType)
    +            case t if t <:< definitions.ByteTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Byte]), inputObject),
    +                "byteValue",
    +                ByteType)
    +            case t if t <:< definitions.BooleanTpe =>
    +              Invoke(
    +                UnwrapOption(ObjectType(classOf[java.lang.Boolean]), inputObject),
    +                "booleanValue",
    +                BooleanType)
    +
    +            // For non-primitives, we can just extract the object from the Option and then recurse.
    +            case other =>
    +              val className: String = optType.erasure.typeSymbol.asClass.fullName
    +              val classObj = Utils.classForName(className)
    +              val optionObjectType = ObjectType(classObj)
    +
    +              val unwrapped = UnwrapOption(optionObjectType, inputObject)
    +              expressions.If(
    +                IsNull(unwrapped),
    +                expressions.Literal.create(null, schemaFor(optType).dataType),
    +                extractorFor(unwrapped, optType))
    +          }
    +
    +        case t if t <:< localTypeOf[Product] =>
    +          val formalTypeArgs = t.typeSymbol.asClass.typeParams
    +          val TypeRef(_, _, actualTypeArgs) = t
    +          val constructorSymbol = t.member(nme.CONSTRUCTOR)
    +          val params = if (constructorSymbol.isMethod) {
    +            constructorSymbol.asMethod.paramss
    +          } else {
    +            // Find the primary constructor, and use its parameter ordering.
    +            val primaryConstructorSymbol: Option[Symbol] =
    +              constructorSymbol.asTerm.alternatives.find(s =>
    +                s.isMethod && s.asMethod.isPrimaryConstructor)
    +
    +            if (primaryConstructorSymbol.isEmpty) {
    +              sys.error("Internal SQL error: Product object did not have a primary constructor.")
    +            } else {
    +              primaryConstructorSymbol.get.asMethod.paramss
    +            }
    +          }
    +
    +          CreateStruct(params.head.map { p =>
    +            val fieldName = p.name.toString
    +            val fieldType = p.typeSignature.substituteTypes(formalTypeArgs, actualTypeArgs)
    +            val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType))
    +            extractorFor(fieldValue, fieldType)
    +          })
    +
    +        case t if t <:< localTypeOf[Array[_]] =>
    +          val TypeRef(_, _, Seq(elementType)) = t
    +          val elementDataType = dataTypeFor(elementType)
    +          val Schema(dataType, nullable) = schemaFor(elementType)
    +
    +          if (!elementDataType.isInstanceOf[AtomicType]) {
    +            MapObjects(extractorFor(_, elementType), inputObject, elementDataType)
    +          } else {
    +            NewInstance(
    +              classOf[GenericArrayData],
    +              inputObject :: Nil,
    +              dataType = ArrayType(dataType, nullable))
    +          }
    +
    +        case t if t <:< localTypeOf[Seq[_]] =>
    +          val TypeRef(_, _, Seq(elementType)) = t
    +          val elementDataType = dataTypeFor(elementType)
    +          val Schema(dataType, nullable) = schemaFor(elementType)
    +
    +          if (!elementDataType.isInstanceOf[AtomicType]) {
    +            MapObjects(extractorFor(_, elementType), inputObject, elementDataType)
    +          } else {
    +            NewInstance(
    +              classOf[GenericArrayData],
    +              inputObject :: Nil,
    +              dataType = ArrayType(dataType, nullable))
    +          }
    +
    +        case t if t <:< localTypeOf[Map[_, _]] =>
    +          val TypeRef(_, _, Seq(keyType, valueType)) = t
    +          val Schema(keyDataType, _) = schemaFor(keyType)
    +          val Schema(valueDataType, valueNullable) = schemaFor(valueType)
    +
    +          val rawMap = inputObject
    +          val keys =
    +            NewInstance(
    +              classOf[GenericArrayData],
    +              Invoke(rawMap, "keys", ObjectType(classOf[scala.collection.GenIterable[_]])) :: Nil,
    +              dataType = ObjectType(classOf[ArrayData]))
    +          val values =
    +            NewInstance(
    +              classOf[GenericArrayData],
    +              Invoke(rawMap, "values", ObjectType(classOf[scala.collection.GenIterable[_]])) :: Nil,
    +              dataType = ObjectType(classOf[ArrayData]))
    +          NewInstance(
    +            classOf[ArrayBasedMapData],
    +            keys :: values :: Nil,
    +            dataType = MapType(keyDataType, valueDataType, valueNullable))
    --- End diff --
    
    Do we need to use `MapObjects` for non-primitive types?
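    (For illustration, a hedged sketch of the case being asked about; `Point` is a made-up type: a map whose values are themselves `Product`s cannot be handed to `GenericArrayData` as-is, since each value would first need to be converted to an `InternalRow`.)

    ```scala
    // Hypothetical: the values below are Products, so storing them would
    // require per-element conversion (e.g. via MapObjects) rather than a
    // direct new GenericArrayData(values).
    case class Point(x: Int, y: Int)
    val m: Map[String, Point] = Map("origin" -> Point(0, 0))
    ```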


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41925685
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -0,0 +1,334 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import scala.language.existentials
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Invokes a static function, returning the result.  By default, any of the arguments being null
    + * will result in returning null instead of calling the function.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call
    + * @param functionName The name of the method to call.
    + * @param arguments An optional list of expressions to pass as arguments to the function.
    + * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    + *                      of calling the function.
    + */
    +case class StaticInvoke(
    +    staticObject: Any,
    +    dataType: DataType,
    +    functionName: String,
    +    arguments: Seq[Expression] = Nil,
    +    propagateNull: Boolean = true) extends Expression {
    --- End diff --
    
    It's not a `LeafExpression`, as there are children (fixed in the follow-up).  It's not `Unevaluable`, but instead *only* supports codegen.


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41583796
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -0,0 +1,334 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import scala.language.existentials
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Invokes a static function, returning the result.  By default, any of the arguments being null
    + * will result in returning null instead of calling the function.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call
    + * @param functionName The name of the method to call.
    + * @param arguments An optional list of expressions to pass as arguments to the function.
    + * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    + *                      of calling the function.
    + */
    +case class StaticInvoke(
    +    staticObject: Any,
    +    dataType: DataType,
    +    functionName: String,
    +    arguments: Seq[Expression] = Nil,
    +    propagateNull: Boolean = true) extends Expression {
    +
    +  val objectName = staticObject match {
    +    case c: Class[_] => c.getName
    +    case other => other.getClass.getName.stripSuffix("$")
    +  }
    +  override def nullable: Boolean = true
    +  override def children: Seq[Expression] = Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val argGen = arguments.map(_.gen(ctx))
    +    val argString = argGen.map(_.value).mkString(", ")
    +
    +    if (propagateNull) {
    +      val objNullCheck = if (ctx.defaultValue(dataType) == "null") {
    +        s"${ev.isNull} = ${ev.value} == null;"
    +      } else {
    +        ""
    +      }
    +
    +      val argsNonNull = s"!(${argGen.map(_.isNull).mkString(" || ")})"
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        boolean ${ev.isNull} = true;
    +        $javaType ${ev.value} = ${ctx.defaultValue(dataType)};
    +
    +        if ($argsNonNull) {
    +          ${ev.value} = $objectName.$functionName($argString);
    +          $objNullCheck
    +        }
    +       """
    +    } else {
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        final boolean ${ev.isNull} = ${ev.value} == null;
    --- End diff --
    
    `ev.value` is defined on the next line; I guess we should move this line down.
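    A hedged sketch of the suggested reordering, reusing the names from the quoted diff (this is a fragment of that `genCode` body, not standalone code):

    ```scala
    s"""
      ${argGen.map(_.code).mkString("\n")}

      // Declare ev.value first, then derive the null bit from it.
      $javaType ${ev.value} = $objectName.$functionName($argString);
      final boolean ${ev.isNull} = ${ev.value} == null;
    """
    ```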


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146421116
  
      [Test build #43378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43378/console) for   PR 9019 at commit [`89d35cb`](https://github.com/apache/spark/commit/89d35cb340bbbafc8b948a3406c356cc9c5e5d57).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait Encoder[T] `
      * `case class ClassEncoder[T](`
      * `case class StaticInvoke(`
      * `case class Invoke(`
      * `case class NewInstance(`
      * `case class UnwrapOption(`
      * `case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends Expression `
      * `case class MapObjects(`



[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41583685
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -0,0 +1,334 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import scala.language.existentials
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Invokes a static function, returning the result.  By default, any of the arguments being null
    + * will result in returning null instead of calling the function.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call
    + * @param functionName The name of the method to call.
    + * @param arguments An optional list of expressions to pass as arguments to the function.
    + * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    + *                      of calling the function.
    + */
    +case class StaticInvoke(
    +    staticObject: Any,
    +    dataType: DataType,
    +    functionName: String,
    +    arguments: Seq[Expression] = Nil,
    +    propagateNull: Boolean = true) extends Expression {
    --- End diff --
    
    Ah, you remind me that we have `Unevaluable`, which is used for expressions that do not support code-gen.


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146690958
  
    Since this is the infrastructural work for encoders/decoders, let's merge this and use a follow-up PR to address the comments, so other people can start to play with it.


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146393911
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41566470
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -75,6 +76,242 @@ trait ScalaReflection {
        */
       private def localTypeOf[T: TypeTag]: `Type` = typeTag[T].in(mirror).tpe
     
    +  /**
    +   * Returns the Spark SQL DataType for a given scala type.  Where this is not an exact mapping
    +   * to a native type, an ObjectType is returned. Special handling is also used for Arrays including
    +   * those that hold primitive types.
    +   */
    +  def dataTypeFor(tpe: `Type`): DataType = tpe match {
    +    case t if t <:< definitions.IntTpe => IntegerType
    +    case t if t <:< definitions.LongTpe => LongType
    +    case t if t <:< definitions.DoubleTpe => DoubleType
    +    case t if t <:< definitions.FloatTpe => FloatType
    +    case t if t <:< definitions.ShortTpe => ShortType
    +    case t if t <:< definitions.ByteTpe => ByteType
    +    case t if t <:< definitions.BooleanTpe => BooleanType
    +    case t if t <:< localTypeOf[Array[Byte]] => BinaryType
    +    case _ =>
    +      val className: String = tpe.erasure.typeSymbol.asClass.fullName
    +      className match {
    +        case "scala.Array" =>
    +          val TypeRef(_, _, Seq(arrayType)) = tpe
    +          val cls = arrayType match {
    +            case t if t <:< definitions.IntTpe => classOf[Array[Int]]
    +            case t if t <:< definitions.LongTpe => classOf[Array[Long]]
    +            case t if t <:< definitions.DoubleTpe => classOf[Array[Double]]
    +            case t if t <:< definitions.FloatTpe => classOf[Array[Float]]
    +            case t if t <:< definitions.ShortTpe => classOf[Array[Short]]
    +            case t if t <:< definitions.ByteTpe => classOf[Array[Byte]]
    +            case t if t <:< definitions.BooleanTpe => classOf[Array[Boolean]]
    +            case other =>
    +              // There is probably a better way to do this, but I couldn't find it...
    +              val elementType = dataTypeFor(other).asInstanceOf[ObjectType].cls
    +              java.lang.reflect.Array.newInstance(elementType, 1).getClass
    +
    +          }
    +          ObjectType(cls)
    +        case other => ObjectType(Utils.classForName(className))
    +      }
    +  }
    +
    +  /** Returns expressions for extracting all the fields from the given type. */
    +  def extractorsFor[T : TypeTag](inputObject: Expression): Seq[Expression] = {
    +    ScalaReflectionLock.synchronized {
    +      extractorFor(inputObject, typeTag[T].tpe).asInstanceOf[CreateStruct].children
    +    }
    +  }
    +
    +  /** Helper for extracting internal fields from a case class. */
    +  protected def extractorFor(
    +      inputObject: Expression,
    +      tpe: `Type`): Expression = ScalaReflectionLock.synchronized {
    --- End diff --
    
    Do we need to take care of Java types like `Integer` here?
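    A hypothetical sketch of what such handling might look like (these match arms are an assumption, not part of this PR):

    ```scala
    // Boxed Java types can be unboxed directly, mirroring the Option
    // handling above (hypothetical arms for extractorFor's match):
    case t if t <:< localTypeOf[java.lang.Integer] =>
      Invoke(inputObject, "intValue", IntegerType)
    case t if t <:< localTypeOf[java.lang.Long] =>
      Invoke(inputObject, "longValue", LongType)
    ```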


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41568645
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -0,0 +1,334 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import scala.language.existentials
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Invokes a static function, returning the result.  By default, any of the arguments being null
    + * will result in returning null instead of calling the function.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call
    + * @param functionName The name of the method to call.
    + * @param arguments An optional list of expressions to pass as arguments to the function.
    + * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    + *                      of calling the function.
    + */
    --- End diff --
    
    Maybe add examples to the expressions defined in this file to make how they will be used more obvious?
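    For instance, a hedged usage sketch for `StaticInvoke` (the target object, method, and `dateExpr` are assumptions for illustration):

    ```scala
    // Hypothetical: invoke a method on a Scala object; with propagateNull
    // left at true, a null dateExpr yields null without calling the method.
    StaticInvoke(
      DateTimeUtils,        // assumed Scala object exposing fromJavaDate
      DateType,
      "fromJavaDate",
      dateExpr :: Nil)
    ```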


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41583866
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -0,0 +1,334 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import scala.language.existentials
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Invokes a static function, returning the result.  By default, any of the arguments being null
    + * will result in returning null instead of calling the function.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call
    + * @param functionName The name of the method to call.
    + * @param arguments An optional list of expressions to pass as arguments to the function.
    + * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    + *                      of calling the function.
    + */
    +case class StaticInvoke(
    +    staticObject: Any,
    +    dataType: DataType,
    +    functionName: String,
    +    arguments: Seq[Expression] = Nil,
    +    propagateNull: Boolean = true) extends Expression {
    +
    +  val objectName = staticObject match {
    +    case c: Class[_] => c.getName
    +    case other => other.getClass.getName.stripSuffix("$")
    +  }
    +  override def nullable: Boolean = true
    +  override def children: Seq[Expression] = Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val argGen = arguments.map(_.gen(ctx))
    +    val argString = argGen.map(_.value).mkString(", ")
    +
    +    if (propagateNull) {
    +      val objNullCheck = if (ctx.defaultValue(dataType) == "null") {
    +        s"${ev.isNull} = ${ev.value} == null;"
    +      } else {
    +        ""
    +      }
    +
    +      val argsNonNull = s"!(${argGen.map(_.isNull).mkString(" || ")})"
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        boolean ${ev.isNull} = true;
    +        $javaType ${ev.value} = ${ctx.defaultValue(dataType)};
    +
    +        if ($argsNonNull) {
    +          ${ev.value} = $objectName.$functionName($argString);
    +          $objNullCheck
    +        }
    +       """
    +    } else {
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        final boolean ${ev.isNull} = ${ev.value} == null;
    --- End diff --
    
    And the null check should consider primitive types too?
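    One possible shape for that fix, sketched under the assumption that the codegen context exposes an `isPrimitiveType` helper:

    ```scala
    // A primitive return value can never be null (and `== null` would not
    // even compile for e.g. int), so gate the check on the Java type.
    val postNullCheck = if (ctx.isPrimitiveType(javaType)) {
      s"final boolean ${ev.isNull} = false;"
    } else {
      s"final boolean ${ev.isNull} = ${ev.value} == null;"
    }
    ```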


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41565489
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -0,0 +1,334 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import scala.language.existentials
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Invokes a static function, returning the result.  By default, any of the arguments being null
    + * will result in returning null instead of calling the function.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call
    + * @param functionName The name of the method to call.
    + * @param arguments An optional list of expressions to pass as arguments to the function.
    + * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    + *                      of calling the function.
    + */
    +case class StaticInvoke(
    +    staticObject: Any,
    +    dataType: DataType,
    +    functionName: String,
    +    arguments: Seq[Expression] = Nil,
    +    propagateNull: Boolean = true) extends Expression {
    +
    +  val objectName = staticObject match {
    +    case c: Class[_] => c.getName
    +    case other => other.getClass.getName.stripSuffix("$")
    +  }
    +  override def nullable: Boolean = true
    +  override def children: Seq[Expression] = Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val argGen = arguments.map(_.gen(ctx))
    +    val argString = argGen.map(_.value).mkString(", ")
    +
    +    if (propagateNull) {
    +      val objNullCheck = if (ctx.defaultValue(dataType) == "null") {
    +        s"${ev.isNull} = ${ev.value} == null;"
    +      } else {
    +        ""
    +      }
    +
    +      val argsNonNull = s"!(${argGen.map(_.isNull).mkString(" || ")})"
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        boolean ${ev.isNull} = true;
    +        $javaType ${ev.value} = ${ctx.defaultValue(dataType)};
    +
    +        if ($argsNonNull) {
    +          ${ev.value} = $objectName.$functionName($argString);
    +          $objNullCheck
    +        }
    +       """
    +    } else {
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        final boolean ${ev.isNull} = ${ev.value} == null;
    +        $javaType ${ev.value} = $objectName.$functionName($argString);
    +      """
    +    }
    +  }
    +}
    +
    +/**
    + * Calls the specified function on an object, optionally passing arguments.  If the `targetObject`
    + * expression evaluates to null then null will be returned.
    + *
    + * @param targetObject An expression that will return the object to call the method on.
    + * @param functionName The name of the method to call.
    + * @param dataType The expected return type of the function.
    + * @param arguments An optional list of expressions, whose evaluation will be passed to the function.
    + */
    +case class Invoke(
    +    targetObject: Expression,
    +    functionName: String,
    +    dataType: DataType,
    +    arguments: Seq[Expression] = Nil) extends Expression {
    +
    +  override def nullable: Boolean = true
    +  override def children: Seq[Expression] = targetObject :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val obj = targetObject.gen(ctx)
    +    val argGen = arguments.map(_.gen(ctx))
    +    val argString = argGen.map(_.value).mkString(", ")
    +
    +    // If the function can return null, we do an extra check to make sure our null bit is still set
    +    // correctly.
    +    val objNullCheck = if (ctx.defaultValue(dataType) == "null") {
    +      s"${ev.isNull} = ${ev.value} == null;"
    +    } else {
    +      ""
    +    }
    +
    +    s"""
    +      ${obj.code}
    +      ${argGen.map(_.code).mkString("\n")}
    +
    +      boolean ${ev.isNull} = ${obj.value} == null;
    +      $javaType ${ev.value} =
    +        ${ev.isNull} ?
    +        ${ctx.defaultValue(dataType)} : ($javaType) ${obj.value}.$functionName($argString);
    +      $objNullCheck
    +    """
    +  }
    +}
    +
    +/**
    + * Constructs a new instance of the given class, using the result of evaluating the specified
    + * expressions as arguments.
    + *
    + * @param cls The class to construct.
    + * @param arguments A list of expression to use as arguments to the constructor.
    + * @param propagateNull When true, if any of the arguments is null, then null will be returned
    + *                      instead of trying to construct the object.
    + * @param dataType The type of object being constructed, as a Spark SQL datatype.  This allows you
    + *                 to manually specify the type when the object in question is a valid internal
    + *                 representation (i.e. ArrayData) instead of an object.
    + */
    +case class NewInstance(
    +    cls: Class[_],
    +    arguments: Seq[Expression],
    +    propagateNull: Boolean = true,
    +    dataType: DataType) extends Expression {
    +  private val className = cls.getName
    +
    +  override def nullable: Boolean = propagateNull
    +
    +  override def children: Seq[Expression] = arguments
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val argGen = arguments.map(_.gen(ctx))
    +    val argString = argGen.map(_.value).mkString(", ")
    +
    +    if (propagateNull) {
    +      val objNullCheck = if (ctx.defaultValue(dataType) == "null") {
    +        s"${ev.isNull} = ${ev.value} == null;"
    +      } else {
    +        ""
    +      }
    +
    +      val argsNonNull = s"!(${argGen.map(_.isNull).mkString(" || ")})"
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        boolean ${ev.isNull} = true;
    +        $javaType ${ev.value} = ${ctx.defaultValue(dataType)};
    +
    +        if ($argsNonNull) {
    +          ${ev.value} = new $className($argString);
    +          ${ev.isNull} = false;
    +        }
    +       """
    +    } else {
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        final boolean ${ev.isNull} = ${ev.value} == null;
    +        $javaType ${ev.value} = new $className($argString);
    +      """
    +    }
    +  }
    +}
    +
    +/**
    + * Given an expression that returns an object of type `Option[_]`, this expression unwraps the
    + * option into the specified Spark SQL datatype.  In the case of `None`, the null bit is set instead.
    + *
    + * @param dataType The expected unwrapped option type.
    + * @param child An expression that returns an `Option`
    + */
    +case class UnwrapOption(
    +    dataType: DataType,
    +    child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def nullable: Boolean = true
    +
    +  override def children: Seq[Expression] = Nil
    +
    +  override def inputTypes: Seq[AbstractDataType] = ObjectType :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val inputObject = child.gen(ctx)
    +
    +    s"""
    +      ${inputObject.code}
    +
    +      boolean ${ev.isNull} = ${inputObject.value} == null || ${inputObject.value}.isEmpty();
    +      $javaType ${ev.value} =
    +        ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($javaType)${inputObject.value}.get();
    +    """
    +  }
    +}
    +
    +case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends Expression {
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String =
    +    throw new UnsupportedOperationException("Only calling gen() is supported.")
    +
    +  override def children: Seq[Expression] = Nil
    +  override def gen(ctx: CodeGenContext): GeneratedExpressionCode =
    +    GeneratedExpressionCode(code = "", value = value, isNull = isNull)
    +
    +  override def nullable: Boolean = false
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +}
    +
    +/**
    + * Applies the given expression to every element of a collection of items, returning the result
    + * as an ArrayType.  This is similar to a typical map operation, but where the lambda function
    + * is expressed using catalyst expressions.
    + *
    + * The following collection ObjectTypes are currently supported: Seq, Array
    + *
    + * @param function A function that returns an expression, given an attribute that can be used
    + *                 to access the current value.  This is done as a lambda function so that
    + *                 a unique attribute reference can be provided for each expression (thus allowing
    + *                 us to nest multiple MapObject calls).
    + * @param inputData An expression that when evaluated returns a collection object.
    + * @param elementType The type of element in the collection, expressed as a DataType.
    + */
    +case class MapObjects(
    +    function: AttributeReference => Expression,
    +    inputData: Expression,
    +    elementType: DataType) extends Expression {
    +
    +  private val loopAttribute = AttributeReference("loopVar", elementType)()
    +  private val completeFunction = function(loopAttribute)
    +
    +  private val (lengthFunction, itemAccessor) = inputData.dataType match {
    +    case ObjectType(cls) if cls.isAssignableFrom(classOf[Seq[_]]) =>
    +      (".size()", (i: String) => s".apply($i)")
    +    case ObjectType(cls) if cls.isArray =>
    +      (".length", (i: String) => s"[$i]")
    +  }
    +
    +  override def nullable: Boolean = true
    +
    +  override def children: Seq[Expression] = completeFunction :: inputData :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported")
    +
    +  override def dataType: DataType = ArrayType(completeFunction.dataType)
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val elementJavaType = ctx.javaType(elementType)
    +    val genInputData = inputData.gen(ctx)
    +
    +    // Variables to hold the element that is currently being processed.
    +    val loopValue = ctx.freshName("loopValue")
    +    val loopIsNull = ctx.freshName("loopIsNull")
    +
    +    val loopVariable = LambdaVariable(loopValue, loopIsNull, elementType)
    +    val boundFunction = completeFunction transform {
    +      case a: AttributeReference if a == loopAttribute => loopVariable
    +    }
    --- End diff --
    
    So, we use a `LambdaVariable` to let `genFunction` have a way to access every element of the input data?
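    (A hedged usage sketch with assumed names, showing how the lambda is wired to the loop variable:)

    ```scala
    // Hypothetical: fieldAccessor is an expression returning the Seq. The
    // attribute passed to the lambda is the placeholder that MapObjects
    // later rewrites into a LambdaVariable named after the fresh loop vars.
    MapObjects(
      loopVar => extractorFor(loopVar, elementType),
      inputData = fieldAccessor,
      elementType = dataTypeFor(elementType))
    ```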


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146397119
  
      [Test build #43376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43376/consoleFull) for   PR 9019 at commit [`768055d`](https://github.com/apache/spark/commit/768055df45d9a80ff39230352c3349d948444422).


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146421173
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43378/
    Test PASSed.


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41469930
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -91,7 +328,6 @@ trait ScalaReflection {
           case t if t <:< localTypeOf[Option[_]] =>
             val TypeRef(_, _, Seq(optType)) = t
             Schema(schemaFor(optType).dataType, nullable = true)
    -      // Need to decide if we actually need a special type here.
    --- End diff --
    
    Haha, probably too late to change this now :)


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146405461
  
      [Test build #43378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43378/consoleFull) for   PR 9019 at commit [`89d35cb`](https://github.com/apache/spark/commit/89d35cb340bbbafc8b948a3406c356cc9c5e5d57).


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9019#discussion_r41925769
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -0,0 +1,334 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import scala.language.existentials
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Invokes a static function, returning the result.  By default, any of the arguments being null
    + * will result in returning null instead of calling the function.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call
    + * @param functionName The name of the method to call.
    + * @param arguments An optional list of expressions to pass as arguments to the function.
    + * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    + *                      of calling the function.
    + */
    +case class StaticInvoke(
    +    staticObject: Any,
    +    dataType: DataType,
    +    functionName: String,
    +    arguments: Seq[Expression] = Nil,
    +    propagateNull: Boolean = true) extends Expression {
    +
    +  val objectName = staticObject match {
    +    case c: Class[_] => c.getName
    +    case other => other.getClass.getName.stripSuffix("$")
    +  }
    +  override def nullable: Boolean = true
    +  override def children: Seq[Expression] = Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    val javaType = ctx.javaType(dataType)
    +    val argGen = arguments.map(_.gen(ctx))
    +    val argString = argGen.map(_.value).mkString(", ")
    +
    +    if (propagateNull) {
    +      val objNullCheck = if (ctx.defaultValue(dataType) == "null") {
    +        s"${ev.isNull} = ${ev.value} == null;"
    +      } else {
    +        ""
    +      }
    +
    +      val argsNonNull = s"!(${argGen.map(_.isNull).mkString(" || ")})"
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        boolean ${ev.isNull} = true;
    +        $javaType ${ev.value} = ${ctx.defaultValue(dataType)};
    +
    +        if ($argsNonNull) {
    +          ${ev.value} = $objectName.$functionName($argString);
    +          $objNullCheck
    +        }
    +       """
    +    } else {
    +      s"""
    +        ${argGen.map(_.code).mkString("\n")}
    +
    +        final boolean ${ev.isNull} = ${ev.value} == null;
    --- End diff --
    
    Yeah, good call.  This is also fixed in my follow-up.


[GitHub] spark pull request: [SPARK-10993] [SQL] Inital code generated enco...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9019#issuecomment-146421172
  
    Merged build finished. Test PASSed.

