You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by bdrillard <gi...@git.apache.org> on 2018/05/17 00:32:12 UTC
[GitHub] spark pull request #21348: [SPARK-22739][Catalyst Additional Expression Supp...
GitHub user bdrillard opened a pull request:
https://github.com/apache/spark/pull/21348
[SPARK-22739][Catalyst Additional Expression Support for Objects
## What changes were proposed in this pull request?
This PR is a working followup to the expression work begun in #20085. It provides necessary `Expression` definitions to support custom encoders (see this discussion in the [Spark-Avro](https://github.com/databricks/spark-avro/pull/217#issuecomment-342856719) project).
It adds the following expressions:
* `ObjectCast` - performs explicit casting of an `Expression` result to a `DataType`
* `StaticField` - retrieves a static field against a class that otherwise has no accessor method
* `InstanceOf` - an `Expression` for the Java `instanceof` operation
Modifies `NewInstance` to take a sequence of method-name and arguments initialization tuples, which are executed against the newly constructed object instance.
Removes `InitializeJavaBean`, as the generalized `NewInstance` subsumes its use-case.
## How was this patch tested?
Adds unit test for `NewInstance` supporting post-constructor initializations. All previous "JavaBean" tests were refactored to use `NewInstance`.
Additional examples of working encoders that would use these new expressions can be seen in the [Spark-Avro](https://github.com/bdrillard/spark-avro/blob/avro_encoder_2-4/src/main/scala/com/databricks/spark/avro/AvroEncoder.scala) and [Bunsen](https://github.com/bdrillard/bunsen/blob/issue-23/bunsen-core/src/main/scala/com/cerner/bunsen/EncoderBuilder.scala) projects.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bdrillard/spark SPARK-22739
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21348.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21348
----
commit 1e25a4ededa38796f739fe949b7795f753b5f2aa
Author: ALeksander Eskilson <al...@...>
Date: 2018-05-09T19:06:01Z
[SPARK-22739] adding new expressions
commit f8643e3ea8dcae58e8af739801c4819f6ca40490
Author: ALeksander Eskilson <al...@...>
Date: 2018-05-17T00:18:57Z
[SPARK-22739] adding new expressions
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21348: [SPARK-22739][Catalyst Additional Expression Supp...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:
https://github.com/apache/spark/pull/21348#discussion_r188810286
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -408,16 +439,19 @@ object NewInstance {
cls: Class[_],
arguments: Seq[Expression],
dataType: DataType,
+ initializations: Seq[(String, Seq[Expression])] = Nil,
--- End diff --
@cloud-fan, see here an implementation of the modifications you described [previously](https://github.com/apache/spark/pull/20085#issuecomment-364043282).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21348
**[Test build #90701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90701/testReport)** for PR 21348 at commit [`ab51b9f`](https://github.com/apache/spark/commit/ab51b9fc4b41cdd1b59c95738ad90ed8dcb92919).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21348
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90699/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21348
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90701/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst] Additional Expression Support fo...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/21348
@bdrillard Thanks for our reply, of cause I'll request your review and make sure I understand your implement correctly.
```
Should we go ahead and re-open it and follow through with this PR, since it's required work for SPARK-25789?
```
As avro has been included in Spark, maybe it's ok to just add these new-added expression in external?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21348
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21348: [SPARK-22739][Catalyst] Additional Expression Sup...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/21348#discussion_r228563440
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1799,3 +1805,65 @@ case class ValidateExternalType(child: Expression, expected: DataType)
ev.copy(code = code, isNull = input.isNull)
}
}
+
+/**
+ * Determines if the given value is an instanceof a given class.
+ *
+ * @param value the value to check
+ * @param checkedType the class to check the value against
+ */
+case class InstanceOf(
+ value: Expression,
+ checkedType: Class[_]) extends Expression with NonSQLExpression {
+
+ override def nullable: Boolean = false
+ override def children: Seq[Expression] = value :: Nil
+ override def dataType: DataType = BooleanType
+
+ override def eval(input: InternalRow): Any =
+ throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+ val obj = value.genCode(ctx)
+
+ val code =
+ s"""
+ ${obj.code}
+ final boolean ${ev.value} = ${obj.value} instanceof ${checkedType.getName};
+ """
+
+ ev.copy(code = code, isNull = FalseLiteral)
+ }
+}
+
+/**
+ * Casts the result of an expression to another type.
+ *
+ * @param value The value to cast
+ * @param resultType The type to which the value should be cast
+ */
+case class ObjectCast(value: Expression, resultType: DataType)
+ extends Expression with NonSQLExpression {
+
+ override def nullable: Boolean = value.nullable
+ override def dataType: DataType = resultType
+ override def children: Seq[Expression] = value :: Nil
+
+ override def eval(input: InternalRow): Any =
+ throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
--- End diff --
Any problem on implementing none code-generated evaluation?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21348
**[Test build #90700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90700/testReport)** for PR 21348 at commit [`cd7e6f3`](https://github.com/apache/spark/commit/cd7e6f361935278dad3ad58b12ccd0ec8bccbed1).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21348
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21348
**[Test build #90699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90699/testReport)** for PR 21348 at commit [`f8643e3`](https://github.com/apache/spark/commit/f8643e3ea8dcae58e8af739801c4819f6ca40490).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* ` s\"in any enclosing class nor any supertype of $`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21348
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90700/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21348
**[Test build #90700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90700/testReport)** for PR 21348 at commit [`cd7e6f3`](https://github.com/apache/spark/commit/cd7e6f361935278dad3ad58b12ccd0ec8bccbed1).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21348: [SPARK-22739][Catalyst] Additional Expression Sup...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:
https://github.com/apache/spark/pull/21348#discussion_r228577979
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1799,3 +1805,65 @@ case class ValidateExternalType(child: Expression, expected: DataType)
ev.copy(code = code, isNull = input.isNull)
}
}
+
+/**
+ * Determines if the given value is an instanceof a given class.
+ *
+ * @param value the value to check
+ * @param checkedType the class to check the value against
+ */
+case class InstanceOf(
+ value: Expression,
+ checkedType: Class[_]) extends Expression with NonSQLExpression {
+
+ override def nullable: Boolean = false
+ override def children: Seq[Expression] = value :: Nil
+ override def dataType: DataType = BooleanType
+
+ override def eval(input: InternalRow): Any =
+ throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+ val obj = value.genCode(ctx)
+
+ val code =
+ s"""
+ ${obj.code}
+ final boolean ${ev.value} = ${obj.value} instanceof ${checkedType.getName};
+ """
+
+ ev.copy(code = code, isNull = FalseLiteral)
+ }
+}
+
+/**
+ * Casts the result of an expression to another type.
+ *
+ * @param value The value to cast
+ * @param resultType The type to which the value should be cast
+ */
+case class ObjectCast(value: Expression, resultType: DataType)
+ extends Expression with NonSQLExpression {
+
+ override def nullable: Boolean = value.nullable
+ override def dataType: DataType = resultType
+ override def children: Seq[Expression] = value :: Nil
+
+ override def eval(input: InternalRow): Any =
+ throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
--- End diff --
That's not a pattern I'd seen for `eval` at the time of writing this PR. Is there another expression that has an example? I could refactor and find out.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst] Additional Expression Support fo...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/21348
Hi @xuanyuanking, if you'd like to take on the work to fold my prior work in databricks/spark-avro#217 into Spark, that sounds good to me. Please do include me in pull-requests on this topic. Our work with an `Encoder` for Avro fulfills particularly heavy use-cases. I'm happy to review the work and I'd also be able to test the feature branch against our workflows as it's being discussed.
The parent ticket was previously marked as `Resolved`. Should we go ahead and re-open it and follow through with this PR, since it's required work for [SPARK-25789](https://issues.apache.org/jira/browse/SPARK-25789)? cc: @cloud-fan
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst] Additional Expression Support fo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21348
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21348
**[Test build #90699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90699/testReport)** for PR 21348 at commit [`f8643e3`](https://github.com/apache/spark/commit/f8643e3ea8dcae58e8af739801c4819f6ca40490).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21348
**[Test build #90701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90701/testReport)** for PR 21348 at commit [`ab51b9f`](https://github.com/apache/spark/commit/ab51b9fc4b41cdd1b59c95738ad90ed8dcb92919).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst] Additional Expression Support fo...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/21348
The jira SPARK-25789 guide me here, thanks for @bdrillard your great job, we also meet the requirement on supporting dataset of avro during Structure Streaming. I'm adapting your code in https://github.com/databricks/spark-avro/pull/217 to newer spark version, it seems work well. Could you give me credit to give a following up work of dataset of avro support? Look forward to your reply :)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21348
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21348: [SPARK-22739][Catalyst Additional Expression Support for...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21348
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org