Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2017/07/27 06:32:00 UTC
[jira] [Commented] (SPARK-21538) Attribute resolution inconsistency in Dataset API
[ https://issues.apache.org/jira/browse/SPARK-21538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102782#comment-16102782 ]
Xiao Li commented on SPARK-21538:
---------------------------------
https://github.com/apache/spark/pull/18740
> Attribute resolution inconsistency in Dataset API
> -------------------------------------------------
>
> Key: SPARK-21538
> URL: https://issues.apache.org/jira/browse/SPARK-21538
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Adrian Ionescu
>
> {code}
> spark.range(1).withColumnRenamed("id", "x").sort(col("id")) // works
> spark.range(1).withColumnRenamed("id", "x").sort($"id") // works
> spark.range(1).withColumnRenamed("id", "x").sort('id) // works
> spark.range(1).withColumnRenamed("id", "x").sort("id") // fails with:
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "id" among (x);
> ...
> {code}
> It looks like the Dataset API functions that take a {{String}} use the basic resolver, which only looks at the columns available at that level, whereas all the other ways of expressing an attribute are resolved lazily by the analyzer.
> The reason why the first 3 calls work is explained in the docs for {{object ResolveMissingReferences}}:
> {code}
> /**
> * In many dialects of SQL it is valid to sort by attributes that are not present in the SELECT
> * clause. This rule detects such queries and adds the required attributes to the original
> * projection, so that they will be available during sorting. Another projection is added to
> * remove these attributes after sorting.
> *
> * The HAVING clause could also use a grouping column that is not present in the SELECT.
> */
> {code}
> For consistency, it would be good to use the same attribute resolution mechanism everywhere.
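
To make the difference between the two resolution strategies concrete, here is a minimal toy model in plain Scala. It is only a sketch: Relation, Project, sortEager and sortDeferred are hypothetical names invented for illustration and are not Spark classes or Spark's actual implementation. The eager variant only looks at the projection's own output, while the deferred variant widens the projection with the missing child column, sorts, and then drops the extra column again, in the spirit of the rule described in the quoted doc comment.

```scala
// Toy model only: every type and method here is hypothetical, not part of Spark.
object ResolutionSketch {
  type Row = Map[String, Long]

  // A materialized relation: visible column names plus the rows.
  final case class Relation(output: Seq[String], rows: Seq[Row])

  // A projection exposing child columns under new names: (childName, outputName).
  final case class Project(child: Relation, columns: Seq[(String, String)]) {
    def output: Seq[String] = columns.map(_._2)
    def eval: Relation =
      Relation(output, child.rows.map(r => columns.map { case (from, to) => to -> r(from) }.toMap))
  }

  private def sortRows(r: Relation, name: String): Relation =
    r.copy(rows = r.rows.sortBy(_(name)))

  private def unresolved(p: Project, name: String): String =
    s"""Cannot resolve column name "$name" among (${p.output.mkString(", ")})"""

  // Eager strategy: the name must exist in the projection's own output.
  def sortEager(p: Project, name: String): Either[String, Relation] =
    if (p.output.contains(name)) Right(sortRows(p.eval, name))
    else Left(unresolved(p, name))

  // Deferred strategy: if the name is missing from the output but present in the
  // child, add it to the projection, sort, then project it away again.
  def sortDeferred(p: Project, name: String): Either[String, Relation] =
    if (p.output.contains(name)) Right(sortRows(p.eval, name))
    else if (p.child.output.contains(name)) {
      val widened = p.copy(columns = p.columns :+ (name -> name))
      val sorted = sortRows(widened.eval, name)
      Right(Relation(p.output, sorted.rows.map(_ - name)))
    } else Left(unresolved(p, name))

  def main(args: Array[String]): Unit = {
    val base = Relation(Seq("id"), Seq(Map("id" -> 2L), Map("id" -> 0L), Map("id" -> 1L)))
    val renamed = Project(base, Seq("id" -> "x")) // models withColumnRenamed("id", "x")
    println(sortEager(renamed, "id"))    // a Left carrying the "Cannot resolve" message
    println(sortDeferred(renamed, "id")) // a Right whose rows are ordered by the original id
  }
}
```

In this model the result of the deferred sort still exposes only the column x; the column id is visible only while sorting.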