Posted to issues@spark.apache.org by "Max Moroz (JIRA)" <ji...@apache.org> on 2016/06/25 06:31:16 UTC

[jira] [Created] (SPARK-16204) Row() interface

Max Moroz created SPARK-16204:
---------------------------------

             Summary: Row() interface
                 Key: SPARK-16204
                 URL: https://issues.apache.org/jira/browse/SPARK-16204
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 2.0.0
            Reporter: Max Moroz
            Priority: Trivial


Row('a', 'b') creates a Row-like class, which is slightly unexpected. To create an actual Row, one needs Row(field1 = 'a', field2 = 'b'). Of course, Row('a', 'b')('a', 'b') does create a row.

I understand the logic; it's similar to namedtuple. But there's a difference: namedtuple *only* creates classes, while Row creates both Row-like classes and record-like instances.
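
For comparison, here is a minimal standard-library sketch of the namedtuple side of that contrast. The names Point and p are just illustrations; nothing here touches PySpark.

```python
from collections import namedtuple

# namedtuple() always returns a class, never a record.
Point = namedtuple("Point", ["x", "y"])
assert isinstance(Point, type)

# A record only ever comes from calling that class.
p = Point(1, 2)
assert isinstance(p, Point)
assert (p.x, p.y) == (1, 2)
```

With namedtuple the two steps cannot be confused, because the first call can never produce a record.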

Wouldn't it be possible to do something slightly safer? For example, expose the class-creation interface through something else, such as a global function, a Row class method, or a brand new class like RowFactory. Overloading __init__ to create both records and classes seems unnecessarily dangerous.
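
To make the suggestion concrete, here is a rough sketch of what a separated interface might look like. Record and record_class are hypothetical names invented for this example; this is a toy model built on tuple, not PySpark's Row and not a proposed patch.

```python
class Record(tuple):
    """Toy stand-in for a row (hypothetical, not PySpark's Row)."""

    def __new__(cls, **fields):
        # Keyword arguments only: Record('a', 'b') raises TypeError
        # instead of quietly building a class-like object.
        self = super().__new__(cls, fields.values())
        self._names = tuple(fields)
        return self

    def as_dict(self):
        return dict(zip(self._names, self))


def record_class(*names):
    """Hypothetical, separate entry point for the class-creation use case."""
    def build(*values):
        if len(values) != len(names):
            raise ValueError("expected %d values, got %d" % (len(names), len(values)))
        return Record(**dict(zip(names, values)))
    return build


Person = record_class("name", "age")
alice = Person("Alice", 11)
assert alice.as_dict() == {"name": "Alice", "age": 11}
```

The point of the split is that each call has exactly one meaning: Record(...) always yields a record, and record_class(...) always yields a constructor.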

In addition, Row('a', 'b') allows creation of invalid classes (where the field names are not strings); it would be better to catch that early rather than let it happen silently and fail later (for example, when someone tries to print(Row('a', 42))).
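
The early check I have in mind could be as small as the sketch below. check_field_names is a hypothetical helper written for this example, not an existing PySpark function.

```python
def check_field_names(names):
    """Hypothetical helper: reject non-string field names up front."""
    for name in names:
        if not isinstance(name, str):
            raise TypeError("field names must be strings, got %r" % (name,))
    return tuple(names)


# Valid names pass through unchanged.
assert check_field_names(["a", "b"]) == ("a", "b")

# An invalid name fails at creation time rather than later, at print time.
try:
    check_field_names(["a", 42])
except TypeError as exc:
    message = str(exc)
```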

And finally, the expression key in Row(field1 = 'a', field2 = 'b') seems to search through the values rather than the keys, contrary to the documentation, at least in 1.6.1 (admittedly the docs only mention this in 2.0.0, but I thought the behaviour had not changed between the versions?).
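
A toy model of the two behaviours, under the assumption that the value search comes from a tuple subclass that does not override __contains__. ValueSearchRow and KeySearchRow are hypothetical classes for illustration only, not PySpark code.

```python
class ValueSearchRow(tuple):
    """No __contains__ override, so `in` falls back to tuple's value search."""

    def __new__(cls, **fields):
        self = super().__new__(cls, fields.values())
        self._names = tuple(fields)
        return self


class KeySearchRow(ValueSearchRow):
    """Overrides __contains__ so that `in` checks field names instead."""

    def __contains__(self, item):
        return item in self._names


by_value = ValueSearchRow(field1="a", field2="b")
assert "a" in by_value           # matches a value
assert "field1" not in by_value  # field names are not searched

by_key = KeySearchRow(field1="a", field2="b")
assert "field1" in by_key        # matches a field name
assert "a" not in by_key         # values are not searched
```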



