Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2017/07/27 06:32:00 UTC

[jira] [Commented] (SPARK-21538) Attribute resolution inconsistency in Dataset API

    [ https://issues.apache.org/jira/browse/SPARK-21538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102782#comment-16102782 ] 

Xiao Li commented on SPARK-21538:
---------------------------------

https://github.com/apache/spark/pull/18740

> Attribute resolution inconsistency in Dataset API
> -------------------------------------------------
>
>                 Key: SPARK-21538
>                 URL: https://issues.apache.org/jira/browse/SPARK-21538
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Adrian Ionescu
>
> {code}
> spark.range(1).withColumnRenamed("id", "x").sort(col("id"))  // works
> spark.range(1).withColumnRenamed("id", "x").sort($"id")  // works
> spark.range(1).withColumnRenamed("id", "x").sort('id) // works
> spark.range(1).withColumnRenamed("id", "x").sort("id") // fails with:
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "id" among (x);
> ...
> {code}
> It looks like the Dataset API functions that take a {{String}} use the basic resolver, which only looks at the columns available at that level, whereas all the other means of expressing an attribute are resolved lazily by the analyzer.
> The reason why the first 3 calls work is explained in the docs for {{object ResolveMissingReferences}}:
> {code}
>   /**
>    * In many dialects of SQL it is valid to sort by attributes that are not present in the SELECT
>    * clause.  This rule detects such queries and adds the required attributes to the original
>    * projection, so that they will be available during sorting. Another projection is added to
>    * remove these attributes after sorting.
>    *
>    * The HAVING clause could also used a grouping columns that is not presented in the SELECT.
>    */
> {code}
> For consistency, it would be good to use the same attribute resolution mechanism everywhere.
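
To illustrate the difference described in the report, here is a toy model in plain Scala of eager versus deferred name resolution. It is only a sketch: {{Plan}}, {{RangeSource}}, {{Rename}}, {{resolveEagerly}} and {{resolveDeferred}} are hypothetical names invented for this illustration, not Spark classes or methods, and the real analyzer rule rewrites the projection rather than just returning a name.

```scala
// Toy model only: every type and method here is hypothetical and is NOT part of Spark.
sealed trait Plan { def output: Seq[String] }

// A leaf plan exposing a single column, loosely standing in for spark.range(1).
final case class RangeSource(column: String) extends Plan {
  def output: Seq[String] = Seq(column)
}

// A rename on top of a child plan, loosely standing in for withColumnRenamed.
final case class Rename(child: Plan, from: String, to: String) extends Plan {
  def output: Seq[String] = child.output.map(c => if (c == from) to else c)
}

object Resolution {
  // Eager resolution: only the columns exposed at the current level are considered.
  def resolveEagerly(plan: Plan, name: String): Either[String, String] =
    if (plan.output.contains(name)) Right(name)
    else Left(s"""Cannot resolve column name "$name" among (${plan.output.mkString(", ")})""")

  // Deferred resolution: try the current level first, then fall back to
  // columns that are still available somewhere below in the plan.
  def resolveDeferred(plan: Plan, name: String): Either[String, String] = {
    def visibleBelow(p: Plan): Boolean = p match {
      case Rename(child, _, _) => child.output.contains(name) || visibleBelow(child)
      case _                   => false
    }
    resolveEagerly(plan, name) match {
      case Right(found) => Right(found)
      case Left(err)    => if (visibleBelow(plan)) Right(name) else Left(err)
    }
  }
}
```

With {{Rename(RangeSource("id"), "id", "x")}} as the plan, {{resolveEagerly}} rejects "id" because only "x" is visible at that level, while {{resolveDeferred}} accepts it because the child still exposes "id". This mirrors, in a very simplified way, why {{sort("id")}} fails in the report while {{sort(col("id"))}} works.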


