Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/04/14 00:27:12 UTC

[jira] [Comment Edited] (SPARK-6865) Decide on semantics for string identifiers in DataFrame API

    [ https://issues.apache.org/jira/browse/SPARK-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493193#comment-14493193 ] 

Reynold Xin edited comment on SPARK-6865 at 4/13/15 10:26 PM:
--------------------------------------------------------------

As discussed offline, it makes more sense to go with option 1, i.e.:

- "str" is treated as a quoted identifier in SQL, equivalent to `str`.
- "*" is a special case in which it refers to all the columns in a data frame. (Note that this means we cannot have a column named "*", which I think is fine.)
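A minimal Python sketch of these option 1 semantics follows. It assumes the data frame is just a flat list of column names and that matching is exact and case-sensitive. `resolve_option1` is a hypothetical helper for illustration, not Spark's resolver. It shows that dots and spaces stay part of the name, and that "*" is the only special case.

```python
from typing import List


def resolve_option1(name: str, columns: List[str]) -> List[str]:
    """Resolve a user-supplied string under option 1 (quoted-identifier semantics).

    Hypothetical helper for illustration only; not Spark's actual resolver.
    Assumes exact, case-sensitive matching against a flat list of column names.
    """
    # "*" is the single special case: it expands to every column.
    if name == "*":
        return list(columns)
    # Otherwise the whole string is one identifier, as if it were written in
    # backticks in SQL: dots, spaces, etc. are part of the name, not syntax.
    if name in columns:
        return [name]
    raise KeyError(f"cannot resolve column {name!r} among {columns}")


columns = ["id", "a.b", "first name"]
print(resolve_option1("a.b", columns))         # ['a.b']
print(resolve_option1("first name", columns))  # ['first name']
print(resolve_option1("*", columns))           # ['id', 'a.b', 'first name']
```

Under these semantics "a.b" can only mean a column literally named "a.b", which is why the resolver's handling of dots needs attention. A column literally named "*" could never be addressed through a string, as noted above.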

The reason is that strings are already quoted, and programmers expect them to be quoted literals without extra escaping.

We will need to fix our resolver with respect to dots.




> Decide on semantics for string identifiers in DataFrame API
> -----------------------------------------------------------
>
>                 Key: SPARK-6865
>                 URL: https://issues.apache.org/jira/browse/SPARK-6865
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Priority: Blocker
>
> There are two options:
>  - Quoted identifiers: the strings are treated as though they were in backticks in SQL. Any weird characters (spaces, or, etc.) are considered part of the identifier. This is somewhat odd, given that `*` is already a special identifier explicitly allowed by the API.
>  - Unquoted parsed identifiers: would allow users to specify things like tableAlias.*; however, this would also require explicit use of `backticks` for identifiers with weird characters in them.
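The second option means each string must be parsed into name parts before resolution. Below is a minimal Python sketch of that parsing step. It assumes dots separate parts unless they appear inside backticks, and that a doubled backtick inside a quoted part stands for one literal backtick (an assumption borrowed from common SQL practice, not stated in this issue). `parse_option2` is a hypothetical helper, not Spark's parser.

```python
from typing import List


def parse_option2(name: str) -> List[str]:
    """Split a string into name parts under option 2 (unquoted, parsed identifiers).

    Hypothetical helper for illustration only; not Spark's actual parser.
    Dots separate parts unless they appear inside backticks; a doubled
    backtick inside a quoted part stands for one literal backtick (assumption).
    """
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(name):
        ch = name[i]
        if in_quotes:
            if ch == "`":
                if i + 1 < len(name) and name[i + 1] == "`":
                    current.append("`")  # escaped backtick inside quotes
                    i += 1               # skip the second backtick
                else:
                    in_quotes = False    # closing backtick
            else:
                current.append(ch)       # dots and spaces are literal here
        elif ch == "`":
            in_quotes = True             # opening backtick
        elif ch == ".":
            parts.append("".join(current))  # unquoted dot ends a part
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError(f"unclosed backtick in {name!r}")
    parts.append("".join(current))
    return parts


print(parse_option2("tableAlias.*"))  # ['tableAlias', '*']
print(parse_option2("a.b"))           # ['a', 'b']
print(parse_option2("`a.b`.c"))       # ['a.b', 'c']
```

This makes the trade-off concrete: "tableAlias.*" becomes expressible, but a column literally named "a.b" or "first name" must now be written with backticks.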



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
