Posted to reviews@spark.apache.org by "dzhigimont (via GitHub)" <gi...@apache.org> on 2023/04/04 16:22:52 UTC

[GitHub] [spark] dzhigimont opened a new pull request, #40664: [SPARK-43024][PS][INFRA] Upgrade pandas to 2.0.0

dzhigimont opened a new pull request, #40664:
URL: https://github.com/apache/spark/pull/40664

   ### What changes were proposed in this pull request?
   This PR proposes upgrading pandas to 2.0.0.
   
   ### Why are the changes needed?
   
   To support the latest pandas in the pandas API on Spark.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   Unit tests. The tests might fail without the other changes from the epic https://issues.apache.org/jira/browse/SPARK-42618
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dzhigimont commented on a diff in pull request #40664: [SPARK-43024][PS][INFRA] Upgrade pandas to 2.0.0

Posted by "dzhigimont (via GitHub)" <gi...@apache.org>.
dzhigimont commented on code in PR #40664:
URL: https://github.com/apache/spark/pull/40664#discussion_r1158103795


##########
dev/infra/Dockerfile:
##########
@@ -64,8 +64,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy pyarrow 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas=>2.0.0' scipy coverage matplotlib

Review Comment:
   OK, closed it.




[GitHub] [spark] dzhigimont closed pull request #40664: [SPARK-43024][PS][INFRA] Upgrade pandas to 2.0.0

Posted by "dzhigimont (via GitHub)" <gi...@apache.org>.
dzhigimont closed pull request #40664: [SPARK-43024][PS][INFRA] Upgrade pandas to 2.0.0
URL: https://github.com/apache/spark/pull/40664



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40664: [SPARK-43024][PS][INFRA] Upgrade pandas to 2.0.0

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40664:
URL: https://github.com/apache/spark/pull/40664#discussion_r1157883873


##########
dev/infra/Dockerfile:
##########
@@ -64,8 +64,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy pyarrow 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas=>2.0.0' scipy coverage matplotlib

Review Comment:
   With `=>`, pandas will be upgraded automatically whenever a new version is released, and that will break CI if the release contains a breaking change. Let's go with https://github.com/apache/spark/pull/40658 instead.
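
The concern in this comment can be illustrated with a small, standard-library-only sketch. It is not pip's actual resolver: the helper names (`parse`, `satisfies`, `pick_latest`) and the release list are hypothetical, and `3.0.0` stands in for some future release with breaking changes. The sketch treats the lower bound as `>=`, on the assumption that this is what the `=>` in the diff was meant to express; as far as I know `=>` is not one of the standard version-specifier operators.

```python
# A simplified model of version selection, for illustration only.
# It handles plain numeric versions such as "2.0.0" and nothing else.

def parse(version: str) -> tuple[int, ...]:
    """Turn '2.0.0' into (2, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, op: str, bound: str) -> bool:
    """Check one version against a single '<op><bound>' constraint."""
    v, b = parse(version), parse(bound)
    if op == ">=":
        return v >= b
    if op == "<=":
        return v <= b
    if op == "==":
        return v == b
    raise ValueError(f"unsupported operator: {op!r}")


def pick_latest(available: list[str], op: str, bound: str) -> str:
    """Return the newest available version that meets the constraint."""
    matching = [v for v in available if satisfies(v, op, bound)]
    return max(matching, key=parse)


# Hypothetical release list; "3.0.0" represents a future breaking release.
releases = ["1.5.3", "2.0.0", "2.0.1", "3.0.0"]

# A lower bound follows every new release, including the breaking one.
print(pick_latest(releases, ">=", "2.0.0"))

# An exact pin stays on the tested version until someone changes it.
print(pick_latest(releases, "==", "2.0.0"))
```

With a lower bound the selected version moves as soon as a newer release appears, which is why an image rebuilt in CI can change behavior without any change to the Dockerfile. An exact pin or an upper bound keeps the build reproducible at the cost of manual upgrades.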


