Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:38:06 UTC

[jira] [Resolved] (SPARK-16204) Row() interface

     [ https://issues.apache.org/jira/browse/SPARK-16204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-16204.
----------------------------------
    Resolution: Incomplete

> Row() interface
> ---------------
>
>                 Key: SPARK-16204
>                 URL: https://issues.apache.org/jira/browse/SPARK-16204
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Max Moroz
>            Priority: Trivial
>              Labels: bulk-closed
>
> Row('a', 'b') creates a Row-like class, which is slightly unexpected. To create an actual Row, one needs Row(field1 = 'a', field2 = 'b').
> Of course, Row('a', 'b')('a', 'b') does create a row.
> I understand the logic; it's similar to namedtuple. But there's a difference: namedtuple *only* creates classes, while Row creates both Row-like classes and record-like instances.
> Wouldn't it be possible to do something slightly safer? For example, expose the class-creation interface through something else, like a global function, a Row class method, or a brand new class like RowFactory? Overloading __init__ to create both records and classes seems unnecessarily dangerous.
> In addition, Row('a', 'b') allows creation of invalid classes (where the field names are not strings); it would be better to catch that early rather than let it happen silently and then fail later (like when someone tries print(Row('a', 42))).
> And finally, the expression "key in Row(field1 = 'a', field2 = 'b')" seems to search through the values instead of the keys as promised in the documentation, at least in 1.6.1 (admittedly the docs only mention it in 2.0.0, but I thought this did not change between the versions?).
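
For illustration only, the namedtuple comparison and the "in" behaviour described above can be sketched with the standard library alone. This does not use PySpark; Person and KeyedRecord are hypothetical names, and the only assumption carried over is that a Row is a tuple subclass, so it inherits value-based membership unless __contains__ is overridden.

```python
from collections import namedtuple

# namedtuple only ever builds a class; instances come from calling that class.
Person = namedtuple("Person", ["name", "age"])
alice = Person("Alice", 11)

# A plain tuple subclass inherits tuple.__contains__, so `in` tests the values.
value_hit = "Alice" in alice
name_hit = "name" in alice


class KeyedRecord(Person):
    """Hypothetical subclass whose `in` tests field names instead of values."""

    __slots__ = ()

    def __contains__(self, item):
        # _fields is the tuple of field names that namedtuple stores on the class.
        return item in self._fields


bob = KeyedRecord("Bob", 12)
key_hit = "name" in bob
```

Here value_hit and key_hit are true while name_hit is false, which is the difference between searching values and searching keys that the report describes.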

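One possible shape for the separate class-creation interface suggested in the description, written as a minimal pure-Python sketch. The name row_factory and the Record class are hypothetical and are not part of PySpark; the point is only that field names can be validated when the class is built rather than when a record is later printed.

```python
def row_factory(*field_names):
    """Hypothetical helper: build a record class, rejecting bad field names early."""
    for name in field_names:
        if not isinstance(name, str):
            raise TypeError("field names must be strings, got %r" % (name,))

    class Record(tuple):
        __slots__ = ()
        fields = field_names

        def __new__(cls, *values):
            # Require exactly one value per declared field.
            if len(values) != len(field_names):
                raise ValueError("expected %d values, got %d"
                                 % (len(field_names), len(values)))
            return super().__new__(cls, values)

        def __repr__(self):
            pairs = ", ".join("%s=%r" % pair for pair in zip(self.fields, self))
            return "Record(%s)" % pairs

    return Record


Pair = row_factory("a", "b")   # class creation is an explicit, separate step
record = Pair("x", "y")        # instance creation
```

With this split, row_factory('a', 42) raises TypeError immediately instead of producing a class that fails later.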


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
