You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by bdrillard <gi...@git.apache.org> on 2018/05/17 00:32:12 UTC

[GitHub] spark pull request #21348: [SPARK-22739][Catalyst Additional Expression Supp...

GitHub user bdrillard opened a pull request:

    https://github.com/apache/spark/pull/21348

    [SPARK-22739][Catalyst Additional Expression Support for Objects

    ## What changes were proposed in this pull request?
    
    This PR is a working followup to the expression work begun in #20085. It provides necessary `Expression` definitions to support custom encoders (see this discussion in the [Spark-Avro](https://github.com/databricks/spark-avro/pull/217#issuecomment-342856719) project).
    
    It adds the following expressions:
    
    * `ObjectCast` - performs explicit casting of an `Expression` result to a `DataType`
    * `StaticField` - retrieves a static field against a class that otherwise has no accessor method
    * `InstanceOf` - an `Expression` for the Java `instanceof` operation
    
    Modifies `NewInstance` to take a sequence of method-name and arguments initialization tuples, which are executed against the newly constructed object instance.
    
    Removes `InitializeJavaBean`, as the generalized `NewInstance` subsumes its use-case. 
    
    ## How was this patch tested?
    
    Adds unit test for `NewInstance` supporting post-constructor initializations. All previous "JavaBean" tests were refactored to use `NewInstance`.
    
    Additional examples of working encoders that would use these new expressions can be seen in the [Spark-Avro](https://github.com/bdrillard/spark-avro/blob/avro_encoder_2-4/src/main/scala/com/databricks/spark/avro/AvroEncoder.scala) and [Bunsen](https://github.com/bdrillard/bunsen/blob/issue-23/bunsen-core/src/main/scala/com/cerner/bunsen/EncoderBuilder.scala) projects.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bdrillard/spark SPARK-22739

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21348.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21348
    
----
commit 1e25a4ededa38796f739fe949b7795f753b5f2aa
Author: ALeksander Eskilson <al...@...>
Date:   2018-05-09T19:06:01Z

    [SPARK-22739] adding new expressions

commit f8643e3ea8dcae58e8af739801c4819f6ca40490
Author: ALeksander Eskilson <al...@...>
Date:   2018-05-17T00:18:57Z

    [SPARK-22739] adding new expressions

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21348: [SPARK-22739][Catalyst Additional Expression Supp...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21348#discussion_r188810286
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -408,16 +439,19 @@ object NewInstance {
           cls: Class[_],
           arguments: Seq[Expression],
           dataType: DataType,
    +      initializations: Seq[(String, Seq[Expression])] = Nil,
    --- End diff --
    
    @cloud-fan, see here an implementation of the modifications you described [previously](https://github.com/apache/spark/pull/20085#issuecomment-364043282).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    **[Test build #90701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90701/testReport)** for PR 21348 at commit [`ab51b9f`](https://github.com/apache/spark/commit/ab51b9fc4b41cdd1b59c95738ad90ed8dcb92919).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90699/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90701/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst] Additional Expression Support fo...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    @bdrillard Thanks for our reply, of cause I'll request your review and make sure I understand your implement correctly.
    ```
    Should we go ahead and re-open it and follow through with this PR, since it's required work for SPARK-25789?
    ```
    As avro has been included in Spark, maybe it's ok to just add these new-added expression in external?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21348: [SPARK-22739][Catalyst] Additional Expression Sup...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21348#discussion_r228563440
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -1799,3 +1805,65 @@ case class ValidateExternalType(child: Expression, expected: DataType)
         ev.copy(code = code, isNull = input.isNull)
       }
     }
    +
    +/**
    + * Determines if the given value is an instanceof a given class.
    + *
    + * @param value the value to check
    + * @param checkedType the class to check the value against
    + */
    +case class InstanceOf(
    +    value: Expression,
    +    checkedType: Class[_]) extends Expression with NonSQLExpression {
    +
    +  override def nullable: Boolean = false
    +  override def children: Seq[Expression] = value :: Nil
    +  override def dataType: DataType = BooleanType
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +
    +    val obj = value.genCode(ctx)
    +
    +    val code =
    +      s"""
    +        ${obj.code}
    +        final boolean ${ev.value} = ${obj.value} instanceof ${checkedType.getName};
    +      """
    +
    +    ev.copy(code = code, isNull = FalseLiteral)
    +  }
    +}
    +
    +/**
    + * Casts the result of an expression to another type.
    + *
    + * @param value The value to cast
    + * @param resultType The type to which the value should be cast
    + */
    +case class ObjectCast(value: Expression, resultType: DataType)
    +  extends Expression with NonSQLExpression {
    +
    +  override def nullable: Boolean = value.nullable
    +  override def dataType: DataType = resultType
    +  override def children: Seq[Expression] = value :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    --- End diff --
    
    Any problem on implementing none code-generated evaluation?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    **[Test build #90700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90700/testReport)** for PR 21348 at commit [`cd7e6f3`](https://github.com/apache/spark/commit/cd7e6f361935278dad3ad58b12ccd0ec8bccbed1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    **[Test build #90699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90699/testReport)** for PR 21348 at commit [`f8643e3`](https://github.com/apache/spark/commit/f8643e3ea8dcae58e8af739801c4819f6ca40490).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `            s\"in any enclosing class nor any supertype of $`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90700/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    **[Test build #90700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90700/testReport)** for PR 21348 at commit [`cd7e6f3`](https://github.com/apache/spark/commit/cd7e6f361935278dad3ad58b12ccd0ec8bccbed1).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21348: [SPARK-22739][Catalyst] Additional Expression Sup...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21348#discussion_r228577979
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -1799,3 +1805,65 @@ case class ValidateExternalType(child: Expression, expected: DataType)
         ev.copy(code = code, isNull = input.isNull)
       }
     }
    +
    +/**
    + * Determines if the given value is an instanceof a given class.
    + *
    + * @param value the value to check
    + * @param checkedType the class to check the value against
    + */
    +case class InstanceOf(
    +    value: Expression,
    +    checkedType: Class[_]) extends Expression with NonSQLExpression {
    +
    +  override def nullable: Boolean = false
    +  override def children: Seq[Expression] = value :: Nil
    +  override def dataType: DataType = BooleanType
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +
    +    val obj = value.genCode(ctx)
    +
    +    val code =
    +      s"""
    +        ${obj.code}
    +        final boolean ${ev.value} = ${obj.value} instanceof ${checkedType.getName};
    +      """
    +
    +    ev.copy(code = code, isNull = FalseLiteral)
    +  }
    +}
    +
    +/**
    + * Casts the result of an expression to another type.
    + *
    + * @param value The value to cast
    + * @param resultType The type to which the value should be cast
    + */
    +case class ObjectCast(value: Expression, resultType: DataType)
    +  extends Expression with NonSQLExpression {
    +
    +  override def nullable: Boolean = value.nullable
    +  override def dataType: DataType = resultType
    +  override def children: Seq[Expression] = value :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    --- End diff --
    
    That's not a pattern I'd seen for `eval` at the time of writing this PR. Is there another expression that has an example? I could refactor and find out.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst] Additional Expression Support fo...

Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    Hi @xuanyuanking, if you'd like to take on the work to fold my prior work in databricks/spark-avro#217 into Spark, that sounds good to me. Please do include me in pull-requests on this topic. Our work with an `Encoder` for Avro fulfills particularly heavy use-cases. I'm happy to review the work and I'd also be able to test the feature branch against our workflows as it's being discussed.
    
    The parent ticket was previously marked as `Resolved`. Should we go ahead and re-open it and follow through with this PR, since it's required work for [SPARK-25789](https://issues.apache.org/jira/browse/SPARK-25789)? cc: @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst] Additional Expression Support fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    **[Test build #90699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90699/testReport)** for PR 21348 at commit [`f8643e3`](https://github.com/apache/spark/commit/f8643e3ea8dcae58e8af739801c4819f6ca40490).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    **[Test build #90701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90701/testReport)** for PR 21348 at commit [`ab51b9f`](https://github.com/apache/spark/commit/ab51b9fc4b41cdd1b59c95738ad90ed8dcb92919).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst] Additional Expression Support fo...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    The jira SPARK-25789 guide me here, thanks for @bdrillard your great job, we also meet the requirement on supporting dataset of avro during Structure Streaming. I'm adapting your code in https://github.com/databricks/spark-avro/pull/217 to newer spark version, it seems work well. Could you give me credit to give a following up work of dataset of avro support? Look forward to your reply :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21348
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org