Posted to commits@spark.apache.org by do...@apache.org on 2020/03/09 18:08:34 UTC
[spark] branch master updated: [SPARK-30941][PYSPARK] Add a note to
asDict to document its behavior when there are duplicate fields
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new d21aab4 [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields
d21aab4 is described below
commit d21aab403a0a32e8b705b38874c0b335e703bd5d
Author: Liang-Chi Hsieh <vi...@gmail.com>
AuthorDate: Mon Mar 9 11:06:45 2020 -0700
[SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields
### What changes were proposed in this pull request?
Adding a note to document `Row.asDict` behavior when there are duplicate fields.
### Why are the changes needed?
When a row contains duplicate fields, `asDict` and `__getitem__` behave differently. We should document this so that users are explicitly aware of the difference.
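As a rough illustration of how such a mismatch can arise, here is a minimal pure-Python sketch. `MiniRow` and its `as_dict` method are hypothetical stand-ins, not PySpark's `Row`. The sketch assumes name lookup returns the first matching field while dict conversion keeps the last one, which is only one plausible cause of the difference.

```python
# Hypothetical stand-in for illustration only; this is not PySpark's Row class.
class MiniRow(tuple):
    """Tuple whose field names may contain duplicates."""

    def __new__(cls, fields, values):
        row = super().__new__(cls, values)
        row._fields = list(fields)
        return row

    def as_dict(self):
        # Building a dict from (name, value) pairs: a later duplicate
        # name overwrites the value stored for an earlier one.
        return dict(zip(self._fields, self))

    def __getitem__(self, item):
        if isinstance(item, (int, slice)):
            return super().__getitem__(item)
        # list.index returns the position of the first matching name.
        return super().__getitem__(self._fields.index(item))


# Shape of a joined row where both sides contributed a field named "id".
row = MiniRow(["id", "name", "id"], [1, "Alice", 2])
by_name = row["id"]            # value of the first "id" field
by_dict = row.as_dict()["id"]  # value of the last "id" field
```

In this sketch the two access paths disagree on `"id"`. Renaming or dropping one of the clashing columns before converting rows would sidestep the ambiguity.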
### Does this PR introduce any user-facing change?
No. This is a documentation-only change.
### How was this patch tested?
Existing test.
Closes #27853 from viirya/SPARK-30941.
Authored-by: Liang-Chi Hsieh <vi...@gmail.com>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
python/pyspark/sql/types.py | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index a5302e7..320a68d 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -1528,6 +1528,12 @@ class Row(tuple):
:param recursive: turns the nested Rows to dict (default: False).
+ .. note:: If a row contains duplicate field names, e.g., the rows of a join
+ between two :class:`DataFrame` that both have the fields of same names,
+ one of the duplicate fields will be selected by ``asDict``. ``__getitem__``
+ will also return one of the duplicate fields, however returned value might
+ be different to ``asDict``.
+
>>> Row(name="Alice", age=11).asDict() == {'name': 'Alice', 'age': 11}
True
>>> row = Row(key=1, value=Row(name='a', age=2))