You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2019/04/24 23:08:00 UTC
[jira] [Updated] (SPARK-27561) Support "lateral column alias references" to allow column aliases to be used within SELECT clauses

     [ https://issues.apache.org/jira/browse/SPARK-27561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-27561:
-------------------------------
    Description: 
Amazon Redshift has a feature called "lateral column alias references: [https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/]. Quoting from that blogpost:
{quote}The support for lateral column alias reference enables you to write queries without repeating the same expressions in the SELECT list. For example, you can define the alias 'probability' and use it within the same select statement:
{code:java}
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
{code}
{quote}
There's more information about this feature on [https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_list.html:]
{quote}The benefit of the lateral alias reference is you don't need to repeat the aliased expression when building more complex expressions in the same target list. When Amazon Redshift parses this type of reference, it just inlines the previously defined aliases. If there is a column with the same name defined in the FROM clause as the previously aliased expression, the column in the FROM clause takes priority. For example, in the above query if there is a column named 'probability' in table raw_data, the 'probability' in the second expression in the target list will refer to that column instead of the alias name 'probability'.
{quote}
It would be nice if Spark supported this syntax. I don't think that this is standard SQL, so it might be a good idea to research if other SQL databases support similar syntax (and to see if they implement the same column resolution strategy as Redshift).

We should also consider whether this needs to be feature-flagged as part of a specific SQL compatibility mode / dialect.

One possibly-related existing ticket: SPARK-9338, which discusses the use of SELECT aliases in GROUP BY expressions.

/cc [~hvanhovell]

  was:
Amazon Redshift has a feature called "lateral column alias references: [https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/]. Quoting from that blogpost:
{quote}The support for lateral column alias reference enables you to write queries without repeating the same expressions in the SELECT list. For example, you can define the alias 'probability' and use it within the same select statement:
{code:java}
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
{code}
{quote}
There's more information about this feature on [https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_list.html:]
{quote}The benefit of the lateral alias reference is you don't need to repeat the aliased expression when building more complex expressions in the same target list. When Amazon Redshift parses this type of reference, it just inlines the previously defined aliases. If there is a column with the same name defined in the FROM clause as the previously aliased expression, the column in the FROM clause takes priority. For example, in the above query if there is a column named 'probability' in table raw_data, the 'probability' in the second expression in the target list will refer to that column instead of the alias name 'probability'.
{quote}
It would be nice if Spark supported this syntax. I don't think that this is standard SQL, so it might be a good idea to research if other SQL databases support similar syntax (and to see if they implement the same column resolution strategy as Redshift).

One possibly-related existing ticket: SPARK-9338, which discusses the use of SELECT aliases in GROUP BY expressions.

/cc [~hvanhovell]


> Support "lateral column alias references" to allow column aliases to be used within SELECT clauses
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27561
>                 URL: https://issues.apache.org/jira/browse/SPARK-27561
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Josh Rosen
>            Priority: Major
>
> Amazon Redshift has a feature called "lateral column alias references: [https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/]. Quoting from that blogpost:
> {quote}The support for lateral column alias reference enables you to write queries without repeating the same expressions in the SELECT list. For example, you can define the alias 'probability' and use it within the same select statement:
> {code:java}
> select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
> {code}
> {quote}
> There's more information about this feature on [https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_list.html:]
> {quote}The benefit of the lateral alias reference is you don't need to repeat the aliased expression when building more complex expressions in the same target list. When Amazon Redshift parses this type of reference, it just inlines the previously defined aliases. If there is a column with the same name defined in the FROM clause as the previously aliased expression, the column in the FROM clause takes priority. For example, in the above query if there is a column named 'probability' in table raw_data, the 'probability' in the second expression in the target list will refer to that column instead of the alias name 'probability'.
> {quote}
> It would be nice if Spark supported this syntax. I don't think that this is standard SQL, so it might be a good idea to research if other SQL databases support similar syntax (and to see if they implement the same column resolution strategy as Redshift).
> We should also consider whether this needs to be feature-flagged as part of a specific SQL compatibility mode / dialect.
> One possibly-related existing ticket: SPARK-9338, which discusses the use of SELECT aliases in GROUP BY expressions.
> /cc [~hvanhovell]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org