Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2015/04/26 12:47:38 UTC

[jira] [Comment Edited] (SPARK-7133) Implement struct, array, and map field accessor using apply in Scala and __getitem__ in Python

    [ https://issues.apache.org/jira/browse/SPARK-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512968#comment-14512968 ] 

Wenchen Fan edited comment on SPARK-7133 at 4/26/15 10:47 AM:
--------------------------------------------------------------

Hi [~rxin], we didn't generalize UnresolvedGetField to support map, struct, and array because we didn't need to. In SQL, we can distinguish GetField from GetItem at parse time because they use different syntax ("." and "[]" respectively), so we only need to consider ArrayGetField and StructGetField.
Another reason: GetItem supports expressions as the key, such as `a[2+3]`, while GetField only supports a String.
Now, if we want to define a unified API for map, struct, and array in DataFrame, I think we should consider the API design first.
cc [~marmbrus], what do you think?
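
As a rough illustration of the design question above, the sketch below picks the accessor from the child's data type instead of from the syntax. All names here (`StructType`, `ArrayType`, `MapType`, `resolve_extract`) are hypothetical stand-ins written for this example, not Spark's actual classes or resolution logic.

```python
# Hypothetical stand-ins for column data types; not Spark's real classes.
class StructType:
    def __init__(self, field_names):
        self.field_names = tuple(field_names)


class ArrayType:
    pass


class MapType:
    pass


def resolve_extract(child_type, key):
    """Choose an accessor kind from the child's type (illustrative only)."""
    if isinstance(child_type, StructType):
        # Struct access only accepts a literal string field name.
        if not isinstance(key, str):
            raise TypeError("struct field name must be a string")
        if key not in child_type.field_names:
            raise KeyError(key)
        return ("GetField", key)
    if isinstance(child_type, (ArrayType, MapType)):
        # Item access accepts any already-evaluated key, e.g. the value of 2 + 3.
        return ("GetItem", key)
    raise TypeError("cannot extract a value from this type")


# The same string key resolves differently depending on the child's type.
struct_access = resolve_extract(StructType(["field"]), "field")
map_access = resolve_extract(MapType(), "field")
array_access = resolve_extract(ArrayType(), 2 + 3)
```

With a unified accessor, the key's type alone is not enough to choose between field and item access: a string key is valid for both a struct and a string-keyed map, so the choice has to wait until the child's type is known.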


was (Author: cloud_fan):
Hi [~rxin], we didn't generalize UnresolvedGetField to support map, struct, and array because we didn't need to. In SQL, we can distinguish GetField from GetItem at parse time because they use different syntax ("." and "[]" respectively), so we only need to consider ArrayGetField and StructGetField.
Now, if we want to define a unified API for map, struct, and array in DataFrame, I think we should add MapGetField.
cc [~marmbrus], what do you think?

> Implement struct, array, and map field accessor using apply in Scala and __getitem__ in Python
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-7133
>                 URL: https://issues.apache.org/jira/browse/SPARK-7133
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>              Labels: starter
>
> Typing 
> {code}
> df.col[1]
> {code}
> and
> {code}
> df.col['field']
> {code}
> is so much easier than
> {code}
> df.col.getField('field')
> df.col.getItem(1)
> {code}
> This would require us to define (in Column) an apply function in Scala, and a __getitem__ function in Python.
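
A minimal sketch of what the Python side of this request could look like, assuming a toy `Column` class that only records expression text. The class and its string-based output are hypothetical and written for this example; only the method names `getField` and `getItem` come from the description above.

```python
class Column:
    """Toy stand-in for a DataFrame column; it only records the expression text."""

    def __init__(self, expr):
        self.expr = expr

    def getField(self, name):
        # Struct field access by name.
        return Column(f"{self.expr}.{name}")

    def getItem(self, key):
        # Array index or map key access.
        return Column(f"{self.expr}[{key!r}]")

    def __getitem__(self, key):
        # Assumption for this sketch: a string key means a struct field,
        # anything else means an item lookup.
        if isinstance(key, str):
            return self.getField(key)
        return self.getItem(key)


col = Column("col")
by_name = col["field"]   # same as col.getField("field")
by_index = col[1]        # same as col.getItem(1)
```

Dispatching on the key's Python type is the simplest rule, but it would route a string-keyed map lookup to `getField`, which is one reason the accessor design needs discussion before it is unified.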


