Posted to commits@spark.apache.org by gu...@apache.org on 2023/03/08 10:37:57 UTC
[spark] branch master updated: [SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new bacab6a7576 [SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow
bacab6a7576 is described below

commit bacab6a7576967ec0871d55ebfc0ef81673321b9
Author: Xinrong Meng <xi...@apache.org>
AuthorDate: Wed Mar 8 19:37:41 2023 +0900

    [SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow
    
    ### What changes were proposed in this pull request?
    Improve docstring of mapInPandas and mapInArrow
    
    ### Why are the changes needed?
    For readability. We call out that they are not scalar: the input and output of the function may differ in size.
    
    ### Does this PR introduce _any_ user-facing change?
    No. Doc change only.
    
    ### How was this patch tested?
    Existing tests.
    
    Closes #40330 from xinrong-meng/doc.
    
    Lead-authored-by: Xinrong Meng <xi...@apache.org>
    Co-authored-by: Hyukjin Kwon <gu...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 python/pyspark/sql/pandas/map_ops.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
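
A minimal sketch of what "the size of the function's input and output can be different" means for a function passed to mapInPandas. It is not part of the commit. The function name `keep_adults`, the column names, and the sample data are hypothetical, and a plain Python list stands in for the batches Spark would supply, so the sketch runs with pandas alone.

```python
from typing import Iterator

import pandas as pd


def keep_adults(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Yield, for each input batch, only the rows whose age is at least 18."""
    for pdf in batches:
        yield pdf[pdf["age"] >= 18]


# Hypothetical stand-in for the iterator of batches Spark would pass in.
input_batches = [
    pd.DataFrame({"id": [1, 2, 3], "age": [15, 21, 40]}),
    pd.DataFrame({"id": [4, 5], "age": [12, 17]}),
]
output_batches = list(keep_adults(iter(input_batches)))

# The function is not scalar: it may return fewer (or more) rows than it got.
rows_in = sum(len(batch) for batch in input_batches)
rows_out = sum(len(batch) for batch in output_batches)
print(rows_in, rows_out)

# With a Spark DataFrame `df` that has these columns, the call would be:
#     df.mapInPandas(keep_adults, df.schema)
```

Here the filter drops rows, so the output is smaller than the input. A function that yields extra rows per batch would be equally valid, provided the output matches the declared schema.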

diff --git a/python/pyspark/sql/pandas/map_ops.py b/python/pyspark/sql/pandas/map_ops.py
index 2184fdce52d..1370cad33b1 100644
--- a/python/pyspark/sql/pandas/map_ops.py
+++ b/python/pyspark/sql/pandas/map_ops.py
@@ -44,7 +44,8 @@ class PandasMapOpsMixin:
         together as an iterator of `pandas.DataFrame`\\s to the function and the
         returned iterator of `pandas.DataFrame`\\s are combined as a :class:`DataFrame`.
         Each `pandas.DataFrame` size can be controlled by
-        `spark.sql.execution.arrow.maxRecordsPerBatch`.
+        `spark.sql.execution.arrow.maxRecordsPerBatch`. The size of the function's input and
+        output can be different.
 
         .. versionadded:: 3.0.0
 
@@ -108,7 +109,8 @@ class PandasMapOpsMixin:
         together as an iterator of `pyarrow.RecordBatch`\\s to the function and the
         returned iterator of `pyarrow.RecordBatch`\\s are combined as a :class:`DataFrame`.
         Each `pyarrow.RecordBatch` size can be controlled by
-        `spark.sql.execution.arrow.maxRecordsPerBatch`.
+        `spark.sql.execution.arrow.maxRecordsPerBatch`. The size of the function's input and
+        output can be different.
 
         .. versionadded:: 3.3.0
 

