You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Bryan Cutler (Jira)" <ji...@apache.org> on 2020/01/10 22:40:00 UTC

[jira] [Assigned] (SPARK-29748) Remove sorting of fields in PySpark SQL Row creation

     [ https://issues.apache.org/jira/browse/SPARK-29748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Cutler reassigned SPARK-29748:
------------------------------------

    Assignee: Bryan Cutler

> Remove sorting of fields in PySpark SQL Row creation
> ----------------------------------------------------
>
>                 Key: SPARK-29748
>                 URL: https://issues.apache.org/jira/browse/SPARK-29748
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>
> Currently, when a PySpark Row is created with keyword arguments, the fields are sorted alphabetically. This has created a lot of confusion with users because it is not obvious (although it is stated in the pydocs) that they will be sorted alphabetically, and then an error can occur later when applying a schema and the field order does not match.
> The original reason for sorting fields is because kwargs in python < 3.6 are not guaranteed to be in the same order that they were entered. Sorting alphabetically would ensure a consistent order.  Matters are further complicated with the flag {{__from_dict__}} that allows the {{Row}} fields to to be referenced by name when made by kwargs, but this flag is not serialized with the Row and leads to inconsistent behavior.
> This JIRA proposes that any sorting of the Fields is removed. Users with Python 3.6+ creating Rows with kwargs can continue to do so since Python will ensure the order is the same as entered. Users with Python < 3.6 will have to create Rows with an OrderedDict or by using the Row class as a factory (explained in the pydoc).  If kwargs are used, an error will be raised or based on a conf setting it can fall back to a LegacyRow that will sort the fields as before. This LegacyRow will be immediately deprecated and removed once support for Python < 3.6 is dropped.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org