Posted to commits@spark.apache.org by gu...@apache.org on 2023/03/17 00:43:25 UTC
[spark] branch master updated: [SPARK-42826][PS][DOCS] Add migration notes for update to supported pandas version

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 1b40565f99b [SPARK-42826][PS][DOCS] Add migration notes for update to supported pandas version
1b40565f99b is described below

commit 1b40565f99b1a3888be46cfdf673b60198bf47a2
Author: itholic <ha...@databricks.com>
AuthorDate: Fri Mar 17 09:43:10 2023 +0900

    [SPARK-42826][PS][DOCS] Add migration notes for update to supported pandas version
    
    ### What changes were proposed in this pull request?
    
    This PR adds a migration note about the update to the supported pandas version.
    
    ### Why are the changes needed?
    
    Some APIs were deprecated or removed in SPARK-42593 to follow pandas 2.0.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Manual review is required.
    
    Closes #40459 from itholic/SPARK-42826.
    
    Lead-authored-by: itholic <ha...@databricks.com>
    Co-authored-by: Haejoon Lee <44...@users.noreply.github.com>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 python/docs/source/migration_guide/pyspark_upgrade.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/python/docs/source/migration_guide/pyspark_upgrade.rst b/python/docs/source/migration_guide/pyspark_upgrade.rst
index 0a590548684..d06475f9b36 100644
--- a/python/docs/source/migration_guide/pyspark_upgrade.rst
+++ b/python/docs/source/migration_guide/pyspark_upgrade.rst
@@ -33,6 +33,7 @@ Upgrading from PySpark 3.3 to 3.4
 * In Spark 3.4, the ``Series.concat`` sort parameter will be respected to follow pandas 1.4 behaviors.
 * In Spark 3.4, the ``DataFrame.__setitem__`` will make a copy and replace pre-existing arrays, which will NOT be over-written to follow pandas 1.4 behaviors.
 * In Spark 3.4, the ``SparkSession.sql`` and the Pandas on Spark API ``sql`` have got new parameter ``args`` which provides binding of named parameters to their SQL literals.
+* In Spark 3.4, Pandas API on Spark follows for the pandas 2.0, and some APIs were deprecated or removed in Spark 3.4 according to the changes made in pandas 2.0. Please refer to the [release notes of pandas](https://pandas.pydata.org/docs/dev/whatsnew/) for more details.
 
 
 Upgrading from PySpark 3.2 to 3.3
@@ -108,4 +109,4 @@ Upgrading from PySpark 1.4 to 1.5
 Upgrading from PySpark 1.0-1.2 to 1.3
 -------------------------------------
 
-* When using DataTypes in Python you will need to construct them (i.e. ``StringType()``) instead of referencing a singleton.
\ No newline at end of file
+* When using DataTypes in Python you will need to construct them (i.e. ``StringType()``) instead of referencing a singleton.

