Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/11/15 17:28:15 UTC
[GitHub] [spark] nchammas opened a new pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
nchammas opened a new pull request #34606:
URL: https://github.com/apache/spark/pull/34606
### What changes were proposed in this pull request?
https://github.com/apache/spark/blob/2fe9af8b2b91d0a46782dd6fff57eca8609be105/python/pyspark/ml/common.py#L99
`_java2py()` calls the deprecated `SQLContext.getOrCreate()` when it wraps a Java `Dataset` in a `DataFrame`. This triggers a warning:
https://github.com/apache/spark/blob/007afbbde9b128fc22c133035ad52239943a7290/python/pyspark/sql/context.py#L162-L165
This PR updates `_java2py()` so it creates a SparkSession using the builder.
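A minimal sketch of the pattern this PR addresses, using only the standard library. `LegacyContext`, `Session`, `convert_old`, `convert_new`, and `count_warnings` are hypothetical stand-ins invented for illustration; they are not the PySpark classes or the actual `_java2py()` code. The sketch only shows how an internal helper that goes through a deprecated factory leaks a warning to the caller, while the same helper routed through the supported entry point does not.

```python
import warnings


class LegacyContext:
    """Hypothetical stand-in for a deprecated entry point (not a PySpark class)."""

    @classmethod
    def get_or_create(cls, sc):
        # The deprecated factory warns on every call, even internal ones.
        warnings.warn(
            "Deprecated. Use Session.builder_get_or_create() instead.",
            FutureWarning,
        )
        return cls()


class Session:
    """Hypothetical stand-in for the supported entry point."""

    _instance = None

    @classmethod
    def builder_get_or_create(cls):
        # Reuse a single session, creating it on first use.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


def convert_old(sc, result):
    # Internal helper that still goes through the deprecated factory.
    return (result, LegacyContext.get_or_create(sc))


def convert_new(sc, result):
    # Same helper, migrated to the supported entry point.
    return (result, Session.builder_get_or_create())


def count_warnings(func, *args):
    # Record every warning raised while func runs and return how many there were.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args)
    return len(caught)
```

Calling `count_warnings(convert_old, None, "rows")` records one warning in this sketch, while `count_warnings(convert_new, None, "rows")` records none, which is the user-visible difference the PR is after.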
### Why are the changes needed?
Non-deprecated internal methods should not invoke deprecated methods that trigger user-visible warnings.
### Does this PR introduce _any_ user-facing change?
Yes, this PR eliminates a user-facing deprecation warning, e.g. when invoking ML methods.
### How was this patch tested?
Manual testing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] nchammas commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
nchammas commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973533379
OK, I cheated a bit by calling `SparkSession._wrapped` so that the correct type is passed without triggering the user warning inside `SQLContext.getOrCreate()`.
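A rough sketch of the idea behind that workaround, again with standard-library stand-ins only. `Context`, `Session`, `Frame`, `frame_via_factory`, and `frame_via_wrapped` are hypothetical names; the real `SparkSession._wrapped` attribute is private and its behaviour may differ between Spark versions. The assumption illustrated here is that the session already holds a legacy context object built without the deprecated factory, so handing that object to the constructor gives it the type it expects and raises no warning.

```python
import warnings


class Context:
    """Hypothetical stand-in for the legacy context type."""

    def __init__(self, session):
        self.session = session

    @classmethod
    def get_or_create(cls):
        # Deprecated public factory: this is where the warning comes from.
        warnings.warn("Deprecated. Build a Session instead.", FutureWarning)
        return cls(Session())


class Session:
    """Hypothetical stand-in for the supported entry point."""

    def __init__(self):
        # The session constructs its own legacy context directly,
        # bypassing the deprecated factory and its warning.
        self._wrapped = Context(self)


class Frame:
    """Hypothetical stand-in for a frame that stores a legacy context."""

    def __init__(self, data, ctx):
        self.data = data
        self.ctx = ctx


def frame_via_factory(data):
    # Old path: correct type, but emits the deprecation warning.
    return Frame(data, Context.get_or_create())


def frame_via_wrapped(data):
    # Workaround path: correct type, no warning.
    return Frame(data, Session()._wrapped)
```

Both helpers hand `Frame` a `Context`, so downstream code that relies on the legacy type keeps working; only the factory path warns. The trade-off is a dependency on a private attribute.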
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973648325
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969192664
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145246/
[GitHub] [spark] HyukjinKwon commented on pull request #34606: [SPARK-37336][ML][PYTHON] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973650338
Merged to master.
[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969165375
**[Test build #145246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145246/testReport)** for PR 34606 at commit [`da6ea20`](https://github.com/apache/spark/commit/da6ea2046f5b3dbfdfb09487b7aea67277a4c62b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973586116
**[Test build #145419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145419/testReport)** for PR 34606 at commit [`1727101`](https://github.com/apache/spark/commit/1727101550d1bf18fcafb0c326f86792030c43a3).
[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969234902
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49716/
[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973647428
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49890/
[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969186655
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49716/
[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969143037
**[Test build #145246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145246/testReport)** for PR 34606 at commit [`da6ea20`](https://github.com/apache/spark/commit/da6ea2046f5b3dbfdfb09487b7aea67277a4c62b).
[GitHub] [spark] SparkQA removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969143037
**[Test build #145246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145246/testReport)** for PR 34606 at commit [`da6ea20`](https://github.com/apache/spark/commit/da6ea2046f5b3dbfdfb09487b7aea67277a4c62b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969241781
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49716/
[GitHub] [spark] AmplabJenkins commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973648325
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34606:
URL: https://github.com/apache/spark/pull/34606#discussion_r750812025
##########
File path: python/pyspark/ml/common.py
##########
@@ -96,15 +96,15 @@ def _java2py(sc, r, encoding="bytes"):
             return RDD(jrdd, sc)
         if clsName == "Dataset":
-            return DataFrame(r, SQLContext.getOrCreate(sc))
+            return DataFrame(r, SparkSession(sc))
Review comment:
I think we can update both together. I suspect that would actually be easier.
[GitHub] [spark] nchammas commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
nchammas commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969144498
MLlib has the same problem, btw:
https://github.com/apache/spark/blob/2fe9af8b2b91d0a46782dd6fff57eca8609be105/python/pyspark/mllib/common.py#L101
But I don't know if I should fix it too. Isn't MLlib as a whole deprecated? It seems not.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34606:
URL: https://github.com/apache/spark/pull/34606#discussion_r749783512
##########
File path: python/pyspark/ml/common.py
##########
@@ -96,15 +96,15 @@ def _java2py(sc, r, encoding="bytes"):
             return RDD(jrdd, sc)
         if clsName == "Dataset":
-            return DataFrame(r, SQLContext.getOrCreate(sc))
+            return DataFrame(r, SparkSession(sc))
Review comment:
Hm .. I think it might need a bigger scope of change.
<img width="759" alt="Screen Shot 2021-11-16 at 9 30 24 AM" src="https://user-images.githubusercontent.com/6477701/141874067-2273d1c5-a212-4f69-9e0c-dc55258b95c3.png">
I think the constructor currently expects SQLContext ..
[GitHub] [spark] nchammas commented on a change in pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
nchammas commented on a change in pull request #34606:
URL: https://github.com/apache/spark/pull/34606#discussion_r750368365
##########
File path: python/pyspark/ml/common.py
##########
@@ -96,15 +96,15 @@ def _java2py(sc, r, encoding="bytes"):
             return RDD(jrdd, sc)
         if clsName == "Dataset":
-            return DataFrame(r, SQLContext.getOrCreate(sc))
+            return DataFrame(r, SparkSession(sc))
Review comment:
Hmm, I didn't look too deeply since my change worked locally and passed tests. But I'll look into this.
Shall I also [update MLlib](https://github.com/apache/spark/pull/34606#issuecomment-969144498)?
[GitHub] [spark] AmplabJenkins commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969192664
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145246/
[GitHub] [spark] AmplabJenkins commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969241781
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49716/
[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973586116
**[Test build #145419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145419/testReport)** for PR 34606 at commit [`1727101`](https://github.com/apache/spark/commit/1727101550d1bf18fcafb0c326f86792030c43a3).
[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973601416
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49890/
[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973636103
**[Test build #145419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145419/testReport)** for PR 34606 at commit [`1727101`](https://github.com/apache/spark/commit/1727101550d1bf18fcafb0c326f86792030c43a3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon closed pull request #34606: [SPARK-37336][ML][PYTHON] Migrate _java2py from SQLContext to SparkSession
Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34606:
URL: https://github.com/apache/spark/pull/34606