Posted to reviews@spark.apache.org by bdrillard <gi...@git.apache.org> on 2018/01/03 20:29:34 UTC

[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...

Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r159519672
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B
     }
     
    --- End diff --
    
    In order to support initialization of more complicated objects, it makes sense to generalize `InitializeJavaBean` to an `InitializeObject` that takes a sequence of method names, each paired with a sequence of that method's arguments. It seems, though, that during plan analysis Spark fails to resolve the column names against the expression's `children` when those child expressions are gathered from a `Seq[Expression]`, yielding errors like:
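    A minimal sketch of the proposed generalization follows. It is written against a hypothetical toy expression tree, not Catalyst itself: `Expr`, `Leaf`, `UnresolvedAttr`, `ResolvedAttr`, `Lit`, `NewInstance`, and `transformUp` are illustrative stand-ins, and this `InitializeObject` is only an assumption about how the proposed expression might be shaped. Setters are method names paired with argument lists of any arity. They are flattened into a single `children` sequence and re-sliced by argument count when the children are replaced, so a rewrite such as attribute resolution can reach the nested arguments.

```scala
// Toy expression tree (hypothetical, NOT Catalyst): just enough structure to
// show a generalized InitializeObject exposing a flat `children` sequence.
sealed trait Expr {
  def children: Seq[Expr]
  def withNewChildren(newChildren: Seq[Expr]): Expr
}

// Leaves have no children, so rebuilding them is a no-op.
sealed trait Leaf extends Expr {
  def children: Seq[Expr] = Nil
  def withNewChildren(newChildren: Seq[Expr]): Expr = this
}

final case class UnresolvedAttr(name: String) extends Leaf
final case class ResolvedAttr(name: String, exprId: Int) extends Leaf
final case class Lit(value: Any) extends Leaf
final case class NewInstance(className: String) extends Leaf

// Assumed shape of the proposed expression: each setter is a method name plus
// an argument list of any arity (one for bean setters, two for Avro's put).
final case class InitializeObject(
    objectInstance: Expr,
    setters: Seq[(String, Seq[Expr])]) extends Expr {

  // Flatten: the instance first, then every argument in setter order.
  def children: Seq[Expr] = objectInstance +: setters.flatMap(_._2)

  // Rebuild the nested setters from a flat child list by re-slicing it
  // according to each setter's original argument count.
  def withNewChildren(newChildren: Seq[Expr]): Expr = {
    require(newChildren.length == children.length, "child count mismatch")
    val (newSetters, _) =
      setters.foldLeft((Vector.empty[(String, Seq[Expr])], newChildren.tail)) {
        case ((acc, remaining), (method, args)) =>
          val (taken, rest) = remaining.splitAt(args.length)
          (acc :+ (method -> taken), rest)
      }
    InitializeObject(newChildren.head, newSetters)
  }
}

// Bottom-up rewrite that only uses children/withNewChildren, standing in for
// an analyzer rule that resolves attributes.
def transformUp(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr = {
  val rebuilt =
    if (e.children.isEmpty) e
    else e.withNewChildren(e.children.map(c => transformUp(c)(rule)))
  rule.applyOrElse(rebuilt, identity[Expr])
}

// Hypothetical exprIds, chosen to echo field1#2 and field2#3 in the log.
val ids = Map("field1" -> 2, "field2" -> 3)
val resolve: PartialFunction[Expr, Expr] = {
  case UnresolvedAttr(n) if ids.contains(n) => ResolvedAttr(n, ids(n))
}

val beanPlan = InitializeObject(
  NewInstance("GenericBean"),
  Seq("setField1" -> Seq(UnresolvedAttr("field1")),
      "setField2" -> Seq(UnresolvedAttr("field2"))))

// "AvroRecord" is a made-up class name; put(index, value) takes two arguments.
val avroPlan = InitializeObject(
  NewInstance("AvroRecord"),
  Seq("put" -> Seq(Lit(0), UnresolvedAttr("field1")),
      "put" -> Seq(Lit(1), UnresolvedAttr("field2"))))

val resolvedBean = transformUp(beanPlan)(resolve)
val resolvedAvro = transformUp(avroPlan)(resolve)
println(resolvedBean)
println(resolvedAvro)
```

    In this toy, the explicit re-slicing in `withNewChildren` is what lets the rewrite reach arguments nested inside `Seq[(String, Seq[Expr])]`. It makes no claim about how Catalyst itself rebuilds such nested constructor arguments.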
    
    ```
    Resolved attribute(s) 'field1,'field2 missing from field1#2,field2#3 in operator 'DeserializeToObject initializeobject(newInstance(class org.apache.spark.sql.catalyst.expressions.GenericBean), (setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))), obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with the same name appear in the operation: field1,field2. Please check if the right attribute(s) are used.;
    org.apache.spark.sql.AnalysisException: Resolved attribute(s) 'field1,'field2 missing from field1#2,field2#3 in operator 'DeserializeToObject initializeobject(newInstance(class org.apache.spark.sql.catalyst.expressions.GenericBean), (setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))), obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with the same name appear in the operation: field1,field2. Please check if the right attribute(s) are used.;
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
    ```
    
    Interestingly, if we change the `setters` signature from `Seq[(String, Seq[Expression])]` to `Seq[(String, (Expression, Expression))]` (the Spark-Avro use case, where objects are initialized by calling `put` with an integer index argument followed by an object argument), the plan does resolve. But such a signature would effectively be hard-coded for Avro.
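    To make the "hard-coded for Avro" point concrete, here is a small sketch using a hypothetical `Arg` type in place of Catalyst's `Expression`. The aliases `PairSetters` and `GeneralSetters` and the helpers `generalize` and `toPairs` are illustrative names, not Spark APIs. Every pair-shaped setter list converts losslessly into the general shape, but a bean-style setter with a single argument has no pair-shaped encoding.

```scala
// Hypothetical stand-in for Catalyst's Expression; only the shape matters here.
final case class Arg(sql: String)

// Avro-style shape: every call takes exactly two arguments (index, value).
type PairSetters = Seq[(String, (Arg, Arg))]
// General shape: each call takes an argument list of any length.
type GeneralSetters = Seq[(String, Seq[Arg])]

// Every pair-shaped setter list embeds losslessly in the general shape...
def generalize(pairs: PairSetters): GeneralSetters =
  pairs.map { case (method, (index, value)) => (method, Seq(index, value)) }

// ...but going back only works when every call takes exactly two arguments.
def toPairs(general: GeneralSetters): Option[PairSetters] = {
  val converted = general.collect { case (method, Seq(a, b)) => (method, (a, b)) }
  if (converted.length == general.length) Some(converted) else None
}

val avroStyle: PairSetters = Seq(("put", (Arg("0"), Arg("field1"))))
val beanStyle: GeneralSetters = Seq(("setField1", Seq(Arg("field1"))))

println(generalize(avroStyle))
println(toPairs(beanStyle)) // None: a one-argument setter has no pair form
```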
    
    Any ideas why passing a sequence of child expressions would yield the analysis error above?

