
Posted to reviews@spark.apache.org by "bjornjorgensen (via GitHub)" <gi...@apache.org> on 2023/09/16 19:04:03 UTC

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

bjornjorgensen commented on code in PR #42793:
URL: https://github.com/apache/spark/pull/42793#discussion_r1327998224


##########
python/docs/source/migration_guide/pyspark_upgrade.rst:
##########
@@ -42,6 +42,8 @@ Upgrading from PySpark 3.5 to 4.0
 * In Spark 4.0, ``squeeze`` parameter from ``ps.read_csv`` and ``ps.read_excel`` has been removed from pandas API on Spark.
 * In Spark 4.0, ``null_counts`` parameter from ``DataFrame.info`` has been removed from pandas API on Spark, use ``show_counts`` instead.
 * In Spark 4.0, the result of ``MultiIndex.append`` does not keep the index names from pandas API on Spark.

Review Comment:
   Can we add a line here telling users to have pandas version 2.1.0 installed for Spark 4.0?
   Right now the only way to find out which pandas version to install is to check the Dockerfile in dev/infra.
   
   https://github.com/jupyter/docker-stacks/blob/52a999a554fe42951e017f7be132d808695a1261/images/pyspark-notebook/Dockerfile#L69
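
Until the guide states this, a user could check their environment by hand. A minimal sketch, assuming 2.1.0 (the version suggested in this comment, not a confirmed requirement) is the minimum; the names `SUGGESTED_MIN_PANDAS`, `release_tuple`, `meets_minimum`, and `installed_pandas_version` are hypothetical helpers, not part of PySpark or pandas:

```python
from importlib import metadata

# Assumption: 2.1.0 is the version suggested in this review comment,
# not a requirement confirmed by the PySpark documentation.
SUGGESTED_MIN_PANDAS = "2.1.0"


def release_tuple(version):
    """Parse up to three leading numeric components, e.g. '2.1.0rc1' -> (2, 1, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2.1' compares equal to '2.1.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=SUGGESTED_MIN_PANDAS):
    """Return True if the installed release is at least the minimum release."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_pandas_version():
    """Return the installed pandas version string, or None if pandas is absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pandas_version()
    if found is None:
        print("pandas is not installed")
    elif meets_minimum(found):
        print(f"pandas {found} meets the suggested minimum {SUGGESTED_MIN_PANDAS}")
    else:
        print(f"pandas {found} is older than the suggested minimum {SUGGESTED_MIN_PANDAS}")
```

The comparison ignores pre-release suffixes such as `rc1`, so this is only a rough check; a stated minimum version in the migration guide would make it unnecessary.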



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

