Posted to issues@spark.apache.org by "Max Moroz (JIRA)" <ji...@apache.org> on 2016/06/25 06:31:16 UTC
[jira] [Created] (SPARK-16204) Row() interface
Max Moroz created SPARK-16204:
---------------------------------
Summary: Row() interface
Key: SPARK-16204
URL: https://issues.apache.org/jira/browse/SPARK-16204
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 2.0.0
Reporter: Max Moroz
Priority: Trivial
Row('a', 'b') creates a Row-like class, which is slightly unexpected. To create an actual Row, one needs Row(field1 = 'a', field2 = 'b').
Of course, Row('a', 'b')('a', 'b') does create a row.
I understand the logic; it's similar to namedtuple. But there's a difference in that namedtuple *only* creates classes, while Row creates both Row-like classes and record-like instances.
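For comparison, here is a minimal sketch of the namedtuple side of that difference, using only the standard library. The name Pair is made up for illustration and has nothing to do with PySpark.

```python
from collections import namedtuple

# namedtuple() always returns a class, never a record.
Pair = namedtuple("Pair", ["field1", "field2"])
print(isinstance(Pair, type))  # True

# A record only exists once that class is called with values.
record = Pair("a", "b")
print(record)  # Pair(field1='a', field2='b')
```

There is no call to namedtuple() itself that yields a record, so the two roles cannot be confused.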
Wouldn't it be possible to do something slightly safer? For example, expose the class-creation interface through something else, such as a global function, a Row class method, or a brand new class like RowFactory? Overloading the __init__ to create both records and classes seems unnecessarily dangerous.
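As a rough sketch of what such a separation could look like: the RowFactory below is hypothetical, is not part of PySpark, and leans on namedtuple purely as a stand-in for the real Row internals.

```python
from collections import namedtuple


class RowFactory:
    """Hypothetical helper whose only job is to hold a set of field names."""

    def __init__(self, *field_names):
        # Stand-in for the real Row machinery; namedtuple is an assumption here.
        self._cls = namedtuple("Row", field_names)

    def __call__(self, *values):
        # Calling the factory always yields a record, never another class.
        return self._cls(*values)


make_row = RowFactory("field1", "field2")
row = make_row("a", "b")
print(row)  # Row(field1='a', field2='b')
```

With this split, constructing RowFactory(...) can only ever describe a schema, and only calling the result produces a record, so neither step is overloaded.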
In addition, class creation via Row('a', 'b') allows invalid classes (where the field names are not strings); it would be better to catch that early rather than let it happen silently and then fail later (like when someone tries to print(Row('a', 42))).
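A minimal sketch of the kind of early check meant here. The function make_row_class is hypothetical and uses namedtuple as a stand-in; it is not how PySpark builds Row classes.

```python
from collections import namedtuple


def make_row_class(*field_names):
    """Hypothetical class builder that rejects non-string field names up front."""
    for name in field_names:
        if not isinstance(name, str):
            raise TypeError(
                "field names must be strings, got %r (%s)"
                % (name, type(name).__name__)
            )
    return namedtuple("Row", field_names)


try:
    make_row_class("a", 42)
    failed_early = False
except TypeError as exc:
    failed_early = True
    print("rejected at creation time:", exc)
```

The error then points at the call that supplied the bad field name rather than at a later, unrelated print.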
And finally, the expression key in Row(field1 = 'a', field2 = 'b') seems to search through the values instead of the keys, contrary to what the documentation promises, at least in 1.6.1 (admittedly the docs only mention it in 2.0.0, but I thought this was not a change between the versions?).
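To show the two possible meanings of "in" side by side, here is a small standard-library sketch. Base and KeyedRow are made-up names and only imitate a tuple-based record; this is not the PySpark implementation.

```python
from collections import namedtuple

Base = namedtuple("Base", ["field1", "field2"])

plain = Base(field1="a", field2="b")
# Default tuple membership looks at the values, not the field names.
print("a" in plain)       # True
print("field1" in plain)  # False


class KeyedRow(Base):
    """Hypothetical subclass whose `in` checks field names instead."""

    __slots__ = ()

    def __contains__(self, item):
        return item in self._fields


keyed = KeyedRow(field1="a", field2="b")
print("field1" in keyed)  # True
print("a" in keyed)       # False
```

Any tuple subclass that does not override __contains__ gets the value-searching behavior, which would match what is described above.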