You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by do...@apache.org on 2023/06/26 01:53:44 UTC

[spark] branch branch-3.4 updated: [SPARK-44184][PYTHON][DOCS] Remove a wrong doc about `ARROW_PRE_0_15_IPC_FORMAT`

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new e3abc1bf7b6 [SPARK-44184][PYTHON][DOCS] Remove a wrong doc about `ARROW_PRE_0_15_IPC_FORMAT`
e3abc1bf7b6 is described below

commit e3abc1bf7b680eba202c733b1c2217ada43d2e77
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Sun Jun 25 18:53:25 2023 -0700

    [SPARK-44184][PYTHON][DOCS] Remove a wrong doc about `ARROW_PRE_0_15_IPC_FORMAT`
    
    ### What changes were proposed in this pull request?
    
    This PR aims to remove a wrong documentation about `ARROW_PRE_0_15_IPC_FORMAT`.
    
    ### Why are the changes needed?
    
    Since Apache Spark 3.0.0, Spark doesn't allow `ARROW_PRE_0_15_IPC_FORMAT` environment variable at all.
    
    https://github.com/apache/spark/blob/2407183cb8637b6ac2d1b76320cae9cbde3411da/python/pyspark/sql/pandas/utils.py#L69-L73
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. This is a removal of outdated wrong documentation.
    
    ### How was this patch tested?
    
    Manual review.
    
    Closes #41730 from dongjoon-hyun/SPARK-44184.
    
    Authored-by: Dongjoon Hyun <do...@apache.org>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
    (cherry picked from commit 00e7c08606d0b6de22604d2a7350ea0711355300)
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 python/docs/source/user_guide/sql/arrow_pandas.rst | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/python/docs/source/user_guide/sql/arrow_pandas.rst b/python/docs/source/user_guide/sql/arrow_pandas.rst
index 0901be73a02..e6627d18f92 100644
--- a/python/docs/source/user_guide/sql/arrow_pandas.rst
+++ b/python/docs/source/user_guide/sql/arrow_pandas.rst
@@ -392,25 +392,6 @@ For usage with pyspark.sql, the minimum supported versions of Pandas is 1.0.5 an
 Higher versions may be used, however, compatibility and data correctness can not be guaranteed and should
 be verified by the user.
 
-Compatibility Setting for PyArrow >= 0.15.0 and Spark 2.3.x, 2.4.x
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Since Arrow 0.15.0, a change in the binary IPC format requires an environment variable to be
-compatible with previous versions of Arrow <= 0.14.1. This is only necessary to do for PySpark
-users with versions 2.3.x and 2.4.x that have manually upgraded PyArrow to 0.15.0. The following
-can be added to ``conf/spark-env.sh`` to use the legacy Arrow IPC format:
-
-.. code-block:: bash
-
-    ARROW_PRE_0_15_IPC_FORMAT=1
-
-
-This will instruct PyArrow >= 0.15.0 to use the legacy IPC format with the older Arrow Java that
-is in Spark 2.3.x and 2.4.x. Not setting this environment variable will lead to a similar error as
-described in `SPARK-29367 <https://issues.apache.org/jira/browse/SPARK-29367>`_ when running
-``pandas_udf``\s or :meth:`DataFrame.toPandas` with Arrow enabled. More information about the Arrow IPC change can
-be read on the Arrow 0.15.0 release `blog <https://arrow.apache.org/blog/2019/10/06/0.15.0-release/#columnar-streaming-protocol-change-since-0140>`_.
-
 Setting Arrow ``self_destruct`` for memory savings
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org