You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by bdrillard <gi...@git.apache.org> on 2017/12/26 19:43:38 UTC

[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

GitHub user bdrillard opened a pull request:

    https://github.com/apache/spark/pull/20085

    [SPARK-22739][Catalyst][WIP] Additional Expression Support for Objects

    ## What changes were proposed in this pull request?
    
    This PR is a work-in-progress adding additional `Expression` support for object types. It intends to provide necessary expressions to support custom encoders (see discussion in [Spark-Avro](https://github.com/databricks/spark-avro/pull/217#issuecomment-342856719)). 
    
    This is an initial review, looking for feedback concerning a few questions and guidance concerning best unit-testing practices for new `Expression` classes in Catalyst.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bdrillard/spark spark_expressions

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20085.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20085
    
----
commit b1973842c7e3fd3e5b8fb0190368c86446b29003
Author: ALeksander Eskilson <al...@...>
Date:   2017-12-11T19:14:36Z

    adding new expressions

commit 74fdb9b1079f2cf60616855278ffc27c0a380b8e
Author: ALeksander Eskilson <al...@...>
Date:   2017-12-26T16:52:04Z

    adding test case for initialize object generalization

commit 135712f9072b56cbe857c6da64a342481bf00318
Author: ALeksander Eskilson <al...@...>
Date:   2017-12-26T17:28:05Z

    fixup

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r158759244
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -106,27 +106,27 @@ trait InvokeLike extends Expression with NonSQLExpression {
     }
     
     /**
    - * Invokes a static function, returning the result.  By default, any of the arguments being null
    - * will result in returning null instead of calling the function.
    - *
    - * @param staticObject The target of the static call.  This can either be the object itself
    - *                     (methods defined on scala objects), or the class object
    - *                     (static methods defined in java).
    - * @param dataType The expected return type of the function call
    - * @param functionName The name of the method to call.
    - * @param arguments An optional list of expressions to pass as arguments to the function.
    - * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    - *                      of calling the function.
    - * @param returnNullable When false, indicating the invoked method will always return
    - *                       non-null value.
    - */
    +  * Invokes a static function, returning the result.  By default, any of the arguments being null
    --- End diff --
    
    Why we change the comment style? Looks not consistent with others.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    Seems to me this doesn't need to be urgent to be in 2.3.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r161607649
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B
     }
     
     /**
    - * Initialize a Java Bean instance by setting its field values via setters.
    + * Initialize an object by invoking the given sequence of method names and method arguments.
    + *
    + * @param objectInstance An expression evaluating to a new instance of the object to initialize
    + * @param setters A sequence of method names and their sequence of argument expressions to apply in
    + *                series to the object instance
      */
    -case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Expression])
    +case class InitializeObject(
    +  objectInstance: Expression,
    +  setters: Seq[(String, Seq[Expression])])
    --- End diff --
    
    We can make use of `NewInstance` which just creates an object of a class, but it's not clear how we can make use of a sequence of `Invoke`, since all these setter methods would have `void` return types, we can't chain them in a fluent manner.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    @bdrillard @viirya @cloud-fan are we still targeting this for 2.3?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r159519915
  
    --- Diff: sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/GenericBean.java ---
    @@ -0,0 +1,56 @@
    +package org.apache.spark.sql.catalyst.expressions;
    +
    +/**
    + *
    + */
    +public class GenericBean {
    +  private int field1;
    +  private String field2;
    +
    +  public GenericBean() {}
    +
    +  public GenericBean(int field1, String field2) {
    +    this.field1 = field1;
    +    this.field2 = field2;
    +  }
    +
    +  public int getField1() {
    +    return field1;
    +  }
    +
    +  public void setField1(int field1) {
    +    this.field1 = field1;
    +  }
    +
    +  public String getField2() {
    +    return field2;
    +  }
    +
    +  public void setField2(String field2) {
    +    this.field2 = field2;
    +  }
    +
    +  @Override
    +  public boolean equals(Object o) {
    +    if (this == o) {
    +      return true;
    +    }
    +    if (o == null || getClass() != o.getClass()) {
    +      return false;
    +    }
    +
    +    GenericBean that = (GenericBean) o;
    +
    +    if (field1 != that.field1) {
    +      return false;
    +    }
    +    return field2 != null ? field2.equals(that.field2) : that.field2 == null;
    +  }
    +
    +  @Override
    +  public int hashCode() {
    +    int result = field1;
    +    result = 31 * result + (field2 != null ? field2.hashCode() : 0);
    +    return result;
    +  }
    +}
    --- End diff --
    
    This object here exists just as an easy unit test for the `InitializeObject` problem I describe above, it doesn't necessarily need to stay as a test resource.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r158761511
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
    @@ -390,8 +391,8 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
     
       test("SPARK-22696: InitializeJavaBean should not use global variables") {
    --- End diff --
    
    Fixed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r158761155
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -106,27 +106,27 @@ trait InvokeLike extends Expression with NonSQLExpression {
     }
     
     /**
    - * Invokes a static function, returning the result.  By default, any of the arguments being null
    - * will result in returning null instead of calling the function.
    - *
    - * @param staticObject The target of the static call.  This can either be the object itself
    - *                     (methods defined on scala objects), or the class object
    - *                     (static methods defined in java).
    - * @param dataType The expected return type of the function call
    - * @param functionName The name of the method to call.
    - * @param arguments An optional list of expressions to pass as arguments to the function.
    - * @param propagateNull When true, and any of the arguments is null, null will be returned instead
    - *                      of calling the function.
    - * @param returnNullable When false, indicating the invoked method will always return
    - *                       non-null value.
    - */
    +  * Invokes a static function, returning the result.  By default, any of the arguments being null
    --- End diff --
    
    Those additional spaces shouldn't be there, I've fixed them.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard closed the pull request at:

    https://github.com/apache/spark/pull/20085


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    jenkins add to whitelist


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r158760292
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -182,6 +182,114 @@ case class StaticInvoke(
       }
     }
     
    +/**
    + * Invokes a call to reference to a static field.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call.
    + * @param fieldName The field to reference.
    + */
    +case class StaticField(
    +  staticObject: Class[_],
    +  dataType: DataType,
    +  fieldName: String) extends Expression with NonSQLExpression {
    +
    +  val objectName = staticObject.getName.stripSuffix("$")
    +
    +  override def nullable: Boolean = false
    +  override def children: Seq[Expression] = Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val javaType = ctx.javaType(dataType)
    +
    +    val code = s"""
    +      final $javaType ${ev.value} = $objectName.$fieldName;
    +    """
    +
    +    ev.copy(code = code, isNull = "false")
    +  }
    +}
    +
    +/**
    + * Wraps an expression in a try-catch block, which can be used if the body expression may throw a
    + * exception.
    + *
    + * @param body The expression body to wrap in a try-catch block.
    + * @param dataType The return type of the try block.
    + * @param returnNullable When false, indicating the invoked method will always return
    + *                       non-null value.
    +  */
    +case class WrapException(
    +    body: Expression,
    +    dataType: DataType,
    +    returnNullable: Boolean = true) extends Expression with NonSQLExpression {
    +
    +  override def nullable: Boolean = returnNullable
    +  override def children: Seq[Expression] = Seq(body)
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val javaType = ctx.javaType(dataType)
    +    val returnName = ctx.freshName("returnName")
    +
    +    val bodyExpr = body.genCode(ctx)
    +
    +    val code =
    +      s"""
    +         |final $javaType $returnName;
    +         |try {
    +         |  ${bodyExpr.code}
    +         |  $returnName = ${bodyExpr.value};
    +         |} catch (Exception e) {
    +         |  org.apache.spark.unsafe.Platform.throwException(e);
    +         |}
    +       """.stripMargin
    +
    +    ev.copy(code = code, isNull = bodyExpr.isNull, value = returnName)
    +  }
    +}
    +
    +/**
    + * Returns the value if it is of the specified type, or null otherwise
    + *
    + * @param value       The value to returned
    + * @param checkedType The type to check against the value via instanceOf
    + * @param dataType    The type returned by the expression
    +  */
    +case class ValueIfType(
    +  value: Expression,
    +  checkedType: Class[_],
    +  dataType: DataType) extends Expression with NonSQLExpression {
    --- End diff --
    
    Will we have different data type other than `value.dataType`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r160349007
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B
     }
     
     /**
    - * Initialize a Java Bean instance by setting its field values via setters.
    + * Initialize an object by invoking the given sequence of method names and method arguments.
    + *
    + * @param objectInstance An expression evaluating to a new instance of the object to initialize
    + * @param setters A sequence of method names and their sequence of argument expressions to apply in
    + *                series to the object instance
      */
    -case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Expression])
    +case class InitializeObject(
    +  objectInstance: Expression,
    +  setters: Seq[(String, Seq[Expression])])
    --- End diff --
    
    To generalize, I think we can just have a `NewObject` expression, which just do `new SomeClass`, the setters are just a bunch of `Invoke`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r158760302
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -182,6 +182,114 @@ case class StaticInvoke(
       }
     }
     
    +/**
    + * Invokes a call to reference to a static field.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call.
    + * @param fieldName The field to reference.
    + */
    +case class StaticField(
    +  staticObject: Class[_],
    +  dataType: DataType,
    +  fieldName: String) extends Expression with NonSQLExpression {
    +
    +  val objectName = staticObject.getName.stripSuffix("$")
    +
    +  override def nullable: Boolean = false
    +  override def children: Seq[Expression] = Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val javaType = ctx.javaType(dataType)
    +
    +    val code = s"""
    +      final $javaType ${ev.value} = $objectName.$fieldName;
    +    """
    +
    +    ev.copy(code = code, isNull = "false")
    +  }
    +}
    +
    +/**
    + * Wraps an expression in a try-catch block, which can be used if the body expression may throw a
    + * exception.
    + *
    + * @param body The expression body to wrap in a try-catch block.
    + * @param dataType The return type of the try block.
    + * @param returnNullable When false, indicating the invoked method will always return
    + *                       non-null value.
    +  */
    +case class WrapException(
    +    body: Expression,
    +    dataType: DataType,
    +    returnNullable: Boolean = true) extends Expression with NonSQLExpression {
    +
    +  override def nullable: Boolean = returnNullable
    +  override def children: Seq[Expression] = Seq(body)
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val javaType = ctx.javaType(dataType)
    +    val returnName = ctx.freshName("returnName")
    +
    +    val bodyExpr = body.genCode(ctx)
    +
    +    val code =
    +      s"""
    +         |final $javaType $returnName;
    +         |try {
    +         |  ${bodyExpr.code}
    +         |  $returnName = ${bodyExpr.value};
    +         |} catch (Exception e) {
    +         |  org.apache.spark.unsafe.Platform.throwException(e);
    +         |}
    +       """.stripMargin
    +
    +    ev.copy(code = code, isNull = bodyExpr.isNull, value = returnName)
    +  }
    +}
    +
    +/**
    + * Returns the value if it is of the specified type, or null otherwise
    + *
    + * @param value       The value to returned
    + * @param checkedType The type to check against the value via instanceOf
    + * @param dataType    The type returned by the expression
    +  */
    +case class ValueIfType(
    +  value: Expression,
    --- End diff --
    
    Should we limit the data type of `value` to `ObjectType`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r159519672
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B
     }
     
    --- End diff --
    
    In order to support initializations on more complicated objects, it makes sense to generalize `InitializeJavaBean` to an `InitializeObject` that can take a sequence of method names associated with a sequence of those methods' arguments. It seems thought that on plan analysis, Spark fails to resolve the column names against the Expression `children` when those child expressions are gathered from a `Seq[Expression]`, yielding errors like:
    
    ```
    Resolved attribute(s) 'field1,'field2 missing from field1#2,field2#3 in operator 'DeserializeToObject initializeobject(newInstance(class org.apache.spark.sql.catalyst.expressions.GenericBean), (setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))), obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with the same name appear in the operation: field1,field2. Please check if the right attribute(s) are used.;
    org.apache.spark.sql.AnalysisException: Resolved attribute(s) 'field1,'field2 missing from field1#2,field2#3 in operator 'DeserializeToObject initializeobject(newInstance(class org.apache.spark.sql.catalyst.expressions.GenericBean), (setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))), obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with the same name appear in the operation: field1,field2. Please check if the right attribute(s) are used.;
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
    ```
    
    Interestingly, if we change the `setters` signature from `Seq[(String, Seq[Expression])]` to `Seq[(String, (Expression, Expression)]`, (the use case for Spark-Avro, where objects are initialized by calling `put` with an integer index argument and then some object argument), the plan will resolve. But of course, such a function signature would in a sense be hard-coded for Avro.
    
    Any ideas why passing a sequence of child expressions would yield the analysis error above?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    /cc @cloud-fan @sameeragarwal 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    @viirya I've found the same intent of a `ValueIfType` function can be attained by adding a simpler `InstanceOf` [expressions](https://github.com/apache/spark/pull/20085/files#diff-e436c96ea839dfe446837ab2a3531f93R265) that can be used as the predicate to the existing `If` expression, and then using `ObjectCast` on the results. That approach handles your [first question](https://github.com/apache/spark/pull/20085#discussion_r158760292). To your [second question](https://github.com/apache/spark/pull/20085#discussion_r158760302), it makes sense the input value expression should always have a DataType of ObjectType. Is there a way you'd prefer to make that check? Or throw some kind of exception of `value.dataType != ObjectType`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    cc: @marmbrus 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    This blocks better support for encoders on spark-avro, and seems safe, so I'd really like to include it in possible.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    I've added some comments describing an issue I've had with generalizing `InitializeJavaBean`, which I thought I'd added to this PR earlier but seem to have not been submitted.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r160347559
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -182,6 +182,111 @@ case class StaticInvoke(
       }
     }
     
    +/**
    + * Invokes a call to reference to a static field.
    + *
    + * @param staticObject The target of the static call.  This can either be the object itself
    + *                     (methods defined on scala objects), or the class object
    + *                     (static methods defined in java).
    + * @param dataType The expected return type of the function call.
    + * @param fieldName The field to reference.
    + */
    +case class StaticField(
    +  staticObject: Class[_],
    +  dataType: DataType,
    +  fieldName: String) extends Expression with NonSQLExpression {
    +
    +  val objectName = staticObject.getName.stripSuffix("$")
    +
    +  override def nullable: Boolean = false
    +  override def children: Seq[Expression] = Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val javaType = ctx.javaType(dataType)
    +
    +    val code = s"""
    +      final $javaType ${ev.value} = $objectName.$fieldName;
    --- End diff --
    
    do we need this expression for such a simple function?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    **[Test build #85642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85642/testReport)** for PR 20085 at commit [`4b07b66`](https://github.com/apache/spark/commit/4b07b6639ef08f0ba7560c7027c1dbdae8e2f090).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85642/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r159520020
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
    @@ -436,4 +437,16 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
         ctx.addImmutableStateIfNotExists("String", mutableState2)
         assert(ctx.inlinedMutableStates.length == 2)
       }
    +
    +  test("InitializeObject") {
    +    val bean = new GenericBean(1, "a")
    +
    +    val encoder = Encoders.bean(classOf[GenericBean])
    +    val expressionEncoder = encoder.asInstanceOf[ExpressionEncoder[GenericBean]]
    +    val row = expressionEncoder.toRow(bean)
    +    val beanFromRow = expressionEncoder.resolveAndBind().fromRow(row)
    +
    +    assert(beanFromRow.getField1 == bean.getField1)
    +    assert(beanFromRow.getField2 == bean.getField2)
    +  }
    --- End diff --
    
    This test case above demonstrates the issue I encountered with using a sequence of initialization arguments on an object.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    Hi @bdrillard , sorry for the late reply, as I was thinking hard about this problem. I think we all agree that we should have more object-related expressions, so that it's more flexible for Spark and other projects to do many things with the codegen-able expressions.
    
    However we should think hard about what object-related expressions Spark should provide, and make sure they are orthogonal and composable. I'm OK with most of the expressions you added, but have some other thoughts about `InitializeObject`.
    
    I propose to improve the existing `NewObject`, and introduce a new phase: initialize phase. So `NewObject` should have a list of expression as its constructor parameters, and another list of expressions as post-hoc initializaion. To create a case class, the construct parameters expressions are not empty and initializing expressions are empty. To create a java bean, it's the opposite.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r158759848
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
    @@ -390,8 +391,8 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
     
       test("SPARK-22696: InitializeJavaBean should not use global variables") {
    --- End diff --
    
    `InitializeJavaBean` -> `InitializeObject`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    **[Test build #85642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85642/testReport)** for PR 20085 at commit [`4b07b66`](https://github.com/apache/spark/commit/4b07b6639ef08f0ba7560c7027c1dbdae8e2f090).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class InstanceOf(`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:

    https://github.com/apache/spark/pull/20085
  
    Closing this PR in favor of #21348.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org