Posted to commits@spark.apache.org by gu...@apache.org on 2023/03/08 10:37:57 UTC
[spark] branch master updated: [SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new bacab6a7576 [SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow
bacab6a7576 is described below
commit bacab6a7576967ec0871d55ebfc0ef81673321b9
Author: Xinrong Meng <xi...@apache.org>
AuthorDate: Wed Mar 8 19:37:41 2023 +0900
[SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow
### What changes were proposed in this pull request?
Improve docstring of mapInPandas and mapInArrow
### Why are the changes needed?
For readability. The docstrings now call out that these functions are not scalar: the input and output of the function may differ in size.
### Does this PR introduce _any_ user-facing change?
No. Doc change only.
### How was this patch tested?
Existing tests.
Closes #40330 from xinrong-meng/doc.
Lead-authored-by: Xinrong Meng <xi...@apache.org>
Co-authored-by: Hyukjin Kwon <gu...@gmail.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
python/pyspark/sql/pandas/map_ops.py | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/python/pyspark/sql/pandas/map_ops.py b/python/pyspark/sql/pandas/map_ops.py
index 2184fdce52d..1370cad33b1 100644
--- a/python/pyspark/sql/pandas/map_ops.py
+++ b/python/pyspark/sql/pandas/map_ops.py
@@ -44,7 +44,8 @@ class PandasMapOpsMixin:
together as an iterator of `pandas.DataFrame`\\s to the function and the
returned iterator of `pandas.DataFrame`\\s are combined as a :class:`DataFrame`.
Each `pandas.DataFrame` size can be controlled by
- `spark.sql.execution.arrow.maxRecordsPerBatch`.
+ `spark.sql.execution.arrow.maxRecordsPerBatch`. The size of the function's input and
+ output can be different.
.. versionadded:: 3.0.0
@@ -108,7 +109,8 @@ class PandasMapOpsMixin:
together as an iterator of `pyarrow.RecordBatch`\\s to the function and the
returned iterator of `pyarrow.RecordBatch`\\s are combined as a :class:`DataFrame`.
Each `pyarrow.RecordBatch` size can be controlled by
- `spark.sql.execution.arrow.maxRecordsPerBatch`.
+ `spark.sql.execution.arrow.maxRecordsPerBatch`. The size of the function's input and
+ output can be different.
.. versionadded:: 3.3.0
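
For illustration only (not part of the commit): a minimal sketch of what "the size of the function's input and output can be different" means for `mapInPandas`. The function name `keep_adults`, the helper `run_with_spark`, and the sample `id`/`age` data are assumptions made for this example. The function receives an iterator of `pandas.DataFrame` batches and may yield batches with fewer (or more) rows than it was given.

```python
from typing import Iterator

import pandas as pd


def keep_adults(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Yield only the rows with age >= 18.

    Each yielded batch may have fewer rows than the batch it came from,
    so the function is not a row-for-row (scalar) transformation.
    """
    for pdf in batches:
        yield pdf[pdf["age"] >= 18]


def run_with_spark() -> None:
    """Apply keep_adults through mapInPandas (hypothetical helper).

    Assumes a local PySpark installation with pandas and pyarrow available.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 21), (2, 30), (3, 15)], ("id", "age"))
    # The output schema matches the input schema; only the row count changes.
    df.mapInPandas(keep_adults, df.schema).show()
```

Because `keep_adults` is a plain generator over pandas objects, it can be checked locally by passing it an iterator of `pandas.DataFrame`s before running it on a cluster.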