You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by bdrillard <gi...@git.apache.org> on 2017/12/26 19:43:38 UTC
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
GitHub user bdrillard opened a pull request:
https://github.com/apache/spark/pull/20085
[SPARK-22739][Catalyst][WIP] Additional Expression Support for Objects
## What changes were proposed in this pull request?
This PR is a work-in-progress adding additional `Expression` support for object types. It intends to provide necessary expressions to support custom encoders (see discussion in [Spark-Avro](https://github.com/databricks/spark-avro/pull/217#issuecomment-342856719)).
This is an initial review, looking for feedback concerning a few questions and guidance concerning best unit-testing practices for new `Expression` classes in Catalyst.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bdrillard/spark spark_expressions
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20085.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20085
----
commit b1973842c7e3fd3e5b8fb0190368c86446b29003
Author: ALeksander Eskilson <al...@...>
Date: 2017-12-11T19:14:36Z
adding new expressions
commit 74fdb9b1079f2cf60616855278ffc27c0a380b8e
Author: ALeksander Eskilson <al...@...>
Date: 2017-12-26T16:52:04Z
adding test case for initialize object generalization
commit 135712f9072b56cbe857c6da64a342481bf00318
Author: ALeksander Eskilson <al...@...>
Date: 2017-12-26T17:28:05Z
fixup
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r158759244
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -106,27 +106,27 @@ trait InvokeLike extends Expression with NonSQLExpression {
}
/**
- * Invokes a static function, returning the result. By default, any of the arguments being null
- * will result in returning null instead of calling the function.
- *
- * @param staticObject The target of the static call. This can either be the object itself
- * (methods defined on scala objects), or the class object
- * (static methods defined in java).
- * @param dataType The expected return type of the function call
- * @param functionName The name of the method to call.
- * @param arguments An optional list of expressions to pass as arguments to the function.
- * @param propagateNull When true, and any of the arguments is null, null will be returned instead
- * of calling the function.
- * @param returnNullable When false, indicating the invoked method will always return
- * non-null value.
- */
+ * Invokes a static function, returning the result. By default, any of the arguments being null
--- End diff --
Why we change the comment style? Looks not consistent with others.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20085
Seems to me this doesn't need to be urgent to be in 2.3.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r161607649
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B
}
/**
- * Initialize a Java Bean instance by setting its field values via setters.
+ * Initialize an object by invoking the given sequence of method names and method arguments.
+ *
+ * @param objectInstance An expression evaluating to a new instance of the object to initialize
+ * @param setters A sequence of method names and their sequence of argument expressions to apply in
+ * series to the object instance
*/
-case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Expression])
+case class InitializeObject(
+ objectInstance: Expression,
+ setters: Seq[(String, Seq[Expression])])
--- End diff --
We can make use of `NewInstance` which just creates an object of a class, but it's not clear how we can make use of a sequence of `Invoke`, since all these setter methods would have `void` return types, we can't chain them in a fluent manner.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20085
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the issue:
https://github.com/apache/spark/pull/20085
@bdrillard @viirya @cloud-fan are we still targeting this for 2.3?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r159519915
--- Diff: sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/GenericBean.java ---
@@ -0,0 +1,56 @@
+package org.apache.spark.sql.catalyst.expressions;
+
+/**
+ *
+ */
+public class GenericBean {
+ private int field1;
+ private String field2;
+
+ public GenericBean() {}
+
+ public GenericBean(int field1, String field2) {
+ this.field1 = field1;
+ this.field2 = field2;
+ }
+
+ public int getField1() {
+ return field1;
+ }
+
+ public void setField1(int field1) {
+ this.field1 = field1;
+ }
+
+ public String getField2() {
+ return field2;
+ }
+
+ public void setField2(String field2) {
+ this.field2 = field2;
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (this == o) {
+ return true;
+ }
+ if (o == null || getClass() != o.getClass()) {
+ return false;
+ }
+
+ GenericBean that = (GenericBean) o;
+
+ if (field1 != that.field1) {
+ return false;
+ }
+ return field2 != null ? field2.equals(that.field2) : that.field2 == null;
+ }
+
+ @Override
+ public int hashCode() {
+ int result = field1;
+ result = 31 * result + (field2 != null ? field2.hashCode() : 0);
+ return result;
+ }
+}
--- End diff --
This object here exists just as an easy unit test for the `InitializeObject` problem I describe above, it doesn't necessarily need to stay as a test resource.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r158761511
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -390,8 +391,8 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
test("SPARK-22696: InitializeJavaBean should not use global variables") {
--- End diff --
Fixed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r158761155
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -106,27 +106,27 @@ trait InvokeLike extends Expression with NonSQLExpression {
}
/**
- * Invokes a static function, returning the result. By default, any of the arguments being null
- * will result in returning null instead of calling the function.
- *
- * @param staticObject The target of the static call. This can either be the object itself
- * (methods defined on scala objects), or the class object
- * (static methods defined in java).
- * @param dataType The expected return type of the function call
- * @param functionName The name of the method to call.
- * @param arguments An optional list of expressions to pass as arguments to the function.
- * @param propagateNull When true, and any of the arguments is null, null will be returned instead
- * of calling the function.
- * @param returnNullable When false, indicating the invoked method will always return
- * non-null value.
- */
+ * Invokes a static function, returning the result. By default, any of the arguments being null
--- End diff --
Those additional spaces shouldn't be there, I've fixed them.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20085
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard closed the pull request at:
https://github.com/apache/spark/pull/20085
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20085
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the issue:
https://github.com/apache/spark/pull/20085
jenkins add to whitelist
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20085
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r158760292
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -182,6 +182,114 @@ case class StaticInvoke(
}
}
+/**
+ * Invokes a call to reference to a static field.
+ *
+ * @param staticObject The target of the static call. This can either be the object itself
+ * (methods defined on scala objects), or the class object
+ * (static methods defined in java).
+ * @param dataType The expected return type of the function call.
+ * @param fieldName The field to reference.
+ */
+case class StaticField(
+ staticObject: Class[_],
+ dataType: DataType,
+ fieldName: String) extends Expression with NonSQLExpression {
+
+ val objectName = staticObject.getName.stripSuffix("$")
+
+ override def nullable: Boolean = false
+ override def children: Seq[Expression] = Nil
+
+ override def eval(input: InternalRow): Any =
+ throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val javaType = ctx.javaType(dataType)
+
+ val code = s"""
+ final $javaType ${ev.value} = $objectName.$fieldName;
+ """
+
+ ev.copy(code = code, isNull = "false")
+ }
+}
+
+/**
+ * Wraps an expression in a try-catch block, which can be used if the body expression may throw a
+ * exception.
+ *
+ * @param body The expression body to wrap in a try-catch block.
+ * @param dataType The return type of the try block.
+ * @param returnNullable When false, indicating the invoked method will always return
+ * non-null value.
+ */
+case class WrapException(
+ body: Expression,
+ dataType: DataType,
+ returnNullable: Boolean = true) extends Expression with NonSQLExpression {
+
+ override def nullable: Boolean = returnNullable
+ override def children: Seq[Expression] = Seq(body)
+
+ override def eval(input: InternalRow): Any =
+ throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val javaType = ctx.javaType(dataType)
+ val returnName = ctx.freshName("returnName")
+
+ val bodyExpr = body.genCode(ctx)
+
+ val code =
+ s"""
+ |final $javaType $returnName;
+ |try {
+ | ${bodyExpr.code}
+ | $returnName = ${bodyExpr.value};
+ |} catch (Exception e) {
+ | org.apache.spark.unsafe.Platform.throwException(e);
+ |}
+ """.stripMargin
+
+ ev.copy(code = code, isNull = bodyExpr.isNull, value = returnName)
+ }
+}
+
+/**
+ * Returns the value if it is of the specified type, or null otherwise
+ *
+ * @param value The value to returned
+ * @param checkedType The type to check against the value via instanceOf
+ * @param dataType The type returned by the expression
+ */
+case class ValueIfType(
+ value: Expression,
+ checkedType: Class[_],
+ dataType: DataType) extends Expression with NonSQLExpression {
--- End diff --
Will we have different data type other than `value.dataType`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r160349007
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B
}
/**
- * Initialize a Java Bean instance by setting its field values via setters.
+ * Initialize an object by invoking the given sequence of method names and method arguments.
+ *
+ * @param objectInstance An expression evaluating to a new instance of the object to initialize
+ * @param setters A sequence of method names and their sequence of argument expressions to apply in
+ * series to the object instance
*/
-case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Expression])
+case class InitializeObject(
+ objectInstance: Expression,
+ setters: Seq[(String, Seq[Expression])])
--- End diff --
To generalize, I think we can just have a `NewObject` expression, which just do `new SomeClass`, the setters are just a bunch of `Invoke`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r158760302
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -182,6 +182,114 @@ case class StaticInvoke(
}
}
+/**
+ * Invokes a call to reference to a static field.
+ *
+ * @param staticObject The target of the static call. This can either be the object itself
+ * (methods defined on scala objects), or the class object
+ * (static methods defined in java).
+ * @param dataType The expected return type of the function call.
+ * @param fieldName The field to reference.
+ */
+case class StaticField(
+ staticObject: Class[_],
+ dataType: DataType,
+ fieldName: String) extends Expression with NonSQLExpression {
+
+ val objectName = staticObject.getName.stripSuffix("$")
+
+ override def nullable: Boolean = false
+ override def children: Seq[Expression] = Nil
+
+ override def eval(input: InternalRow): Any =
+ throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val javaType = ctx.javaType(dataType)
+
+ val code = s"""
+ final $javaType ${ev.value} = $objectName.$fieldName;
+ """
+
+ ev.copy(code = code, isNull = "false")
+ }
+}
+
+/**
+ * Wraps an expression in a try-catch block, which can be used if the body expression may throw a
+ * exception.
+ *
+ * @param body The expression body to wrap in a try-catch block.
+ * @param dataType The return type of the try block.
+ * @param returnNullable When false, indicating the invoked method will always return
+ * non-null value.
+ */
+case class WrapException(
+ body: Expression,
+ dataType: DataType,
+ returnNullable: Boolean = true) extends Expression with NonSQLExpression {
+
+ override def nullable: Boolean = returnNullable
+ override def children: Seq[Expression] = Seq(body)
+
+ override def eval(input: InternalRow): Any =
+ throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val javaType = ctx.javaType(dataType)
+ val returnName = ctx.freshName("returnName")
+
+ val bodyExpr = body.genCode(ctx)
+
+ val code =
+ s"""
+ |final $javaType $returnName;
+ |try {
+ | ${bodyExpr.code}
+ | $returnName = ${bodyExpr.value};
+ |} catch (Exception e) {
+ | org.apache.spark.unsafe.Platform.throwException(e);
+ |}
+ """.stripMargin
+
+ ev.copy(code = code, isNull = bodyExpr.isNull, value = returnName)
+ }
+}
+
+/**
+ * Returns the value if it is of the specified type, or null otherwise
+ *
+ * @param value The value to returned
+ * @param checkedType The type to check against the value via instanceOf
+ * @param dataType The type returned by the expression
+ */
+case class ValueIfType(
+ value: Expression,
--- End diff --
Should we limit the data type of `value` to `ObjectType`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r159519672
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B
}
--- End diff --
In order to support initializations on more complicated objects, it makes sense to generalize `InitializeJavaBean` to an `InitializeObject` that can take a sequence of method names associated with a sequence of those methods' arguments. It seems thought that on plan analysis, Spark fails to resolve the column names against the Expression `children` when those child expressions are gathered from a `Seq[Expression]`, yielding errors like:
```
Resolved attribute(s) 'field1,'field2 missing from field1#2,field2#3 in operator 'DeserializeToObject initializeobject(newInstance(class org.apache.spark.sql.catalyst.expressions.GenericBean), (setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))), obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with the same name appear in the operation: field1,field2. Please check if the right attribute(s) are used.;
org.apache.spark.sql.AnalysisException: Resolved attribute(s) 'field1,'field2 missing from field1#2,field2#3 in operator 'DeserializeToObject initializeobject(newInstance(class org.apache.spark.sql.catalyst.expressions.GenericBean), (setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))), obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with the same name appear in the operation: field1,field2. Please check if the right attribute(s) are used.;
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
```
Interestingly, if we change the `setters` signature from `Seq[(String, Seq[Expression])]` to `Seq[(String, (Expression, Expression)]`, (the use case for Spark-Avro, where objects are initialized by calling `put` with an integer index argument and then some object argument), the plan will resolve. But of course, such a function signature would in a sense be hard-coded for Avro.
Any ideas why passing a sequence of child expressions would yield the analysis error above?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/20085
/cc @cloud-fan @sameeragarwal
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/20085
@viirya I've found the same intent of a `ValueIfType` function can be attained by adding a simpler `InstanceOf` [expressions](https://github.com/apache/spark/pull/20085/files#diff-e436c96ea839dfe446837ab2a3531f93R265) that can be used as the predicate to the existing `If` expression, and then using `ObjectCast` on the results. That approach handles your [first question](https://github.com/apache/spark/pull/20085#discussion_r158760292). To your [second question](https://github.com/apache/spark/pull/20085#discussion_r158760302), it makes sense the input value expression should always have a DataType of ObjectType. Is there a way you'd prefer to make that check? Or throw some kind of exception of `value.dataType != ObjectType`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/20085
cc: @marmbrus
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/20085
This blocks better support for encoders on spark-avro, and seems safe, so I'd really like to include it in possible.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/20085
I've added some comments describing an issue I've had with generalizing `InitializeJavaBean`, which I thought I'd added to this PR earlier but seem to have not been submitted.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r160347559
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -182,6 +182,111 @@ case class StaticInvoke(
}
}
+/**
+ * Invokes a call to reference to a static field.
+ *
+ * @param staticObject The target of the static call. This can either be the object itself
+ * (methods defined on scala objects), or the class object
+ * (static methods defined in java).
+ * @param dataType The expected return type of the function call.
+ * @param fieldName The field to reference.
+ */
+case class StaticField(
+ staticObject: Class[_],
+ dataType: DataType,
+ fieldName: String) extends Expression with NonSQLExpression {
+
+ val objectName = staticObject.getName.stripSuffix("$")
+
+ override def nullable: Boolean = false
+ override def children: Seq[Expression] = Nil
+
+ override def eval(input: InternalRow): Any =
+ throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val javaType = ctx.javaType(dataType)
+
+ val code = s"""
+ final $javaType ${ev.value} = $objectName.$fieldName;
--- End diff --
do we need this expression for such a simple function?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20085
**[Test build #85642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85642/testReport)** for PR 20085 at commit [`4b07b66`](https://github.com/apache/spark/commit/4b07b6639ef08f0ba7560c7027c1dbdae8e2f090).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20085
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85642/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r159520020
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -436,4 +437,16 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
ctx.addImmutableStateIfNotExists("String", mutableState2)
assert(ctx.inlinedMutableStates.length == 2)
}
+
+ test("InitializeObject") {
+ val bean = new GenericBean(1, "a")
+
+ val encoder = Encoders.bean(classOf[GenericBean])
+ val expressionEncoder = encoder.asInstanceOf[ExpressionEncoder[GenericBean]]
+ val row = expressionEncoder.toRow(bean)
+ val beanFromRow = expressionEncoder.resolveAndBind().fromRow(row)
+
+ assert(beanFromRow.getField1 == bean.getField1)
+ assert(beanFromRow.getField2 == bean.getField2)
+ }
--- End diff --
This test case above demonstrates the issue I encountered with using a sequence of initialization arguments on an object.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20085
Hi @bdrillard , sorry for the late reply, as I was thinking hard about this problem. I think we all agree that we should have more object-related expressions, so that it's more flexible for Spark and other projects to do many things with the codegen-able expressions.
However we should think hard about what object-related expressions Spark should provide, and make sure they are orthogonal and composable. I'm OK with most of the expressions you added, but have some other thoughts about `InitializeObject`.
I propose to improve the existing `NewObject`, and introduce a new phase: initialize phase. So `NewObject` should have a list of expression as its constructor parameters, and another list of expressions as post-hoc initializaion. To create a case class, the construct parameters expressions are not empty and initializing expressions are empty. To create a java bean, it's the opposite.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r158759848
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -390,8 +391,8 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
test("SPARK-22696: InitializeJavaBean should not use global variables") {
--- End diff --
`InitializeJavaBean` -> `InitializeObject`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20085
**[Test build #85642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85642/testReport)** for PR 20085 at commit [`4b07b66`](https://github.com/apache/spark/commit/4b07b6639ef08f0ba7560c7027c1dbdae8e2f090).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class InstanceOf(`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...
Posted by bdrillard <gi...@git.apache.org>.
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/20085
Closing this PR in favor of #21348.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org