Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/14 00:28:28 UTC

[GitHub] [spark] zsxwing commented on a change in pull request #26496: [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+

zsxwing commented on a change in pull request #26496: [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+
URL: https://github.com/apache/spark/pull/26496#discussion_r379194091
 
 

 ##########
 File path: docs/pyspark-migration-guide.md
 ##########
 @@ -87,6 +87,8 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
   - Since Spark 3.0, `Column.getItem` is fixed such that it does not call `Column.apply`. Consequently, if `Column` is used as an argument to `getItem`, the indexing operator should be used.
     For example, `map_col.getItem(col('id'))` should be replaced with `map_col[col('id')]`.
 
+  - As of Spark 3.0 `Row` field names are no longer sorted alphabetically when constructing with named arguments for Python versions 3.6 and above, and the order of fields will match that as entered. To enable sorted fields by default, as in Spark 2.4, set the environment variable `PYSPARK_ROW_FIELD_SORTING_ENABLED` to "true". For Python versions less than 3.6, the field names will be sorted alphabetically as the only option.
 
 Review comment:
   nit: Could we mention that this must be set for all processes? For example: set the environment variable `PYSPARK_ROW_FIELD_SORTING_ENABLED` to "true" for **executors and driver**. This environment variable must be consistent across all executors and the driver; any inconsistency may cause failures or incorrect answers.
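
   To illustrate the point about consistency, here is a minimal sketch of one way to give the driver and the executors the same value. The helper names `row_sorting_conf` and `build_session` are hypothetical and not part of PySpark. `spark.executorEnv.[EnvironmentVariableName]` is Spark's documented setting for executor environment variables. Setting the driver value before PySpark is imported is an assumption about when the variable is read, not something this thread confirms.

```python
import os

# Name taken from the migration-guide text under review.
ENV_NAME = "PYSPARK_ROW_FIELD_SORTING_ENABLED"


def row_sorting_conf(enabled):
    """Return (driver_env, spark_conf) carrying the same value for every process.

    Hypothetical helper for illustration; not part of PySpark.
    """
    value = "true" if enabled else "false"
    driver_env = {ENV_NAME: value}
    # spark.executorEnv.<NAME> adds an environment variable to executor processes.
    spark_conf = {"spark.executorEnv." + ENV_NAME: value}
    return driver_env, spark_conf


def build_session(enabled=True):
    """Start a session with the flag set on the driver and on the executors."""
    driver_env, spark_conf = row_sorting_conf(enabled)
    # Assumption: set the driver side before importing PySpark, in case the
    # variable is read early in the driver's Python process.
    os.environ.update(driver_env)
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder
    for key, value in spark_conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

   Deriving both settings from one value keeps the driver and executors from drifting apart. When the driver itself runs inside the cluster (for example YARN cluster mode), the driver environment probably needs to be set through the cluster manager instead, such as `spark.yarn.appMasterEnv.[EnvironmentVariableName]`.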
