Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/11/15 17:28:15 UTC

[GitHub] [spark] nchammas opened a new pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

nchammas opened a new pull request #34606:
URL: https://github.com/apache/spark/pull/34606


   ### What changes were proposed in this pull request?
   
   https://github.com/apache/spark/blob/2fe9af8b2b91d0a46782dd6fff57eca8609be105/python/pyspark/ml/common.py#L99
   
   `_java2py()` uses a deprecated method to create a SparkSession. This triggers a warning:
   
   https://github.com/apache/spark/blob/007afbbde9b128fc22c133035ad52239943a7290/python/pyspark/sql/context.py#L162-L165
   
   This PR updates `_java2py()` so it constructs a `SparkSession` directly instead of calling `SQLContext.getOrCreate()`.
   
   ### Why are the changes needed?
   
   Non-deprecated internal methods should not invoke deprecated methods that trigger user-visible warnings.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this PR eliminates a user-facing deprecation warning, e.g. when invoking ML methods.
   
   ### How was this patch tested?
   
   Manual testing.
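
A manual check of this kind could be scripted roughly as follows. This is only a sketch built on the standard `warnings` module: `legacy_get_or_create`, `convert_old`, `convert_new`, and `emitted_warnings` are hypothetical stand-ins rather than PySpark functions, and the `FutureWarning` category is chosen for illustration.

```python
import warnings


def legacy_get_or_create():
    """Hypothetical stand-in for a deprecated factory method."""
    warnings.warn("Deprecated. Use the newer entry point instead.", FutureWarning)
    return object()


def convert_old():
    # Internal helper that still goes through the deprecated path.
    return legacy_get_or_create()


def convert_new():
    # Internal helper after migration: no deprecated call, so no warning.
    return object()


def emitted_warnings(func):
    """Run func and return the categories of all warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones Python would normally show only once.
        warnings.simplefilter("always")
        func()
    return [w.category for w in caught]


if __name__ == "__main__":
    print("old path:", emitted_warnings(convert_old))
    print("new path:", emitted_warnings(convert_new))
```

The same helper could wrap a real ML call to confirm that no deprecation warning reaches the user after the change.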


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] nchammas commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
nchammas commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973533379


   OK, I cheated a bit by calling `SparkSession._wrapped` so that the correct type is passed without triggering the user warning inside `SQLContext.getOrCreate()`.
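
The idea can be sketched with stand-in classes. Everything below is a hypothetical illustration (`LegacyContext`, `Session`, `Frame`, and `build_frame` are not PySpark names): a session keeps an already-constructed legacy context in a private `_wrapped` attribute, so internal code can hand a constructor the type it expects without going through the warning-emitting `get_or_create()`.

```python
import warnings


class LegacyContext:
    """Hypothetical stand-in for a deprecated context class."""

    def __init__(self, session):
        self.session = session

    @classmethod
    def get_or_create(cls, session):
        # Public, deprecated entry point: warns whenever it is called.
        warnings.warn("get_or_create is deprecated", FutureWarning)
        return cls(session)


class Session:
    """Hypothetical stand-in for the newer entry point."""

    def __init__(self):
        # The session builds the legacy object directly, so no warning fires.
        self._wrapped = LegacyContext(self)


class Frame:
    """Hypothetical consumer whose constructor expects a LegacyContext."""

    def __init__(self, data, ctx):
        if not isinstance(ctx, LegacyContext):
            raise TypeError("ctx must be a LegacyContext")
        self.data = data
        self.ctx = ctx


def build_frame(data):
    # Pass the wrapped context: correct type, no user-visible warning.
    return Frame(data, Session()._wrapped)
```

Relying on a private attribute couples the caller to the session's internals, which is why this reads as a stopgap until the consuming constructor accepts the session type itself.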




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973648325








[GitHub] [spark] AmplabJenkins removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969192664


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145246/
   




[GitHub] [spark] HyukjinKwon commented on pull request #34606: [SPARK-37336][ML][PYTHON] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973650338


   Merged to master.




[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969165375


   **[Test build #145246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145246/testReport)** for PR 34606 at commit [`da6ea20`](https://github.com/apache/spark/commit/da6ea2046f5b3dbfdfb09487b7aea67277a4c62b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973586116


   **[Test build #145419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145419/testReport)** for PR 34606 at commit [`1727101`](https://github.com/apache/spark/commit/1727101550d1bf18fcafb0c326f86792030c43a3).




[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969234902


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49716/
   




[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973647428


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49890/
   




[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969186655


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49716/
   




[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969143037


   **[Test build #145246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145246/testReport)** for PR 34606 at commit [`da6ea20`](https://github.com/apache/spark/commit/da6ea2046f5b3dbfdfb09487b7aea67277a4c62b).




[GitHub] [spark] SparkQA removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969143037


   **[Test build #145246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145246/testReport)** for PR 34606 at commit [`da6ea20`](https://github.com/apache/spark/commit/da6ea2046f5b3dbfdfb09487b7aea67277a4c62b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969241781


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49716/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973648325








[GitHub] [spark] HyukjinKwon commented on a change in pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34606:
URL: https://github.com/apache/spark/pull/34606#discussion_r750812025



##########
File path: python/pyspark/ml/common.py
##########
@@ -96,15 +96,15 @@ def _java2py(sc, r, encoding="bytes"):
             return RDD(jrdd, sc)
 
         if clsName == "Dataset":
-            return DataFrame(r, SQLContext.getOrCreate(sc))
+            return DataFrame(r, SparkSession(sc))

Review comment:
       I think we can update together .. I suspect that would be easier actually.






[GitHub] [spark] nchammas commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
nchammas commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969144498


   MLlib has the same problem, btw:
   
   https://github.com/apache/spark/blob/2fe9af8b2b91d0a46782dd6fff57eca8609be105/python/pyspark/mllib/common.py#L101
   
   But I don't know if I should fix it too. Isn't MLlib as a whole deprecated? It seems not.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34606:
URL: https://github.com/apache/spark/pull/34606#discussion_r749783512



##########
File path: python/pyspark/ml/common.py
##########
@@ -96,15 +96,15 @@ def _java2py(sc, r, encoding="bytes"):
             return RDD(jrdd, sc)
 
         if clsName == "Dataset":
-            return DataFrame(r, SQLContext.getOrCreate(sc))
+            return DataFrame(r, SparkSession(sc))

Review comment:
       Hm .. I think it might need a bigger scope of change.
   <img width="759" alt="Screen Shot 2021-11-16 at 9 30 24 AM" src="https://user-images.githubusercontent.com/6477701/141874067-2273d1c5-a212-4f69-9e0c-dc55258b95c3.png">
   
   I think the constructor currently expects SQLContext .. 






[GitHub] [spark] nchammas commented on a change in pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
nchammas commented on a change in pull request #34606:
URL: https://github.com/apache/spark/pull/34606#discussion_r750368365



##########
File path: python/pyspark/ml/common.py
##########
@@ -96,15 +96,15 @@ def _java2py(sc, r, encoding="bytes"):
             return RDD(jrdd, sc)
 
         if clsName == "Dataset":
-            return DataFrame(r, SQLContext.getOrCreate(sc))
+            return DataFrame(r, SparkSession(sc))

Review comment:
       Hmm, I didn't look too deeply since my change worked locally and passed tests. But I'll look into this.
   
   Shall I also [update MLlib](https://github.com/apache/spark/pull/34606#issuecomment-969144498)?






[GitHub] [spark] AmplabJenkins commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969192664


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145246/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-969241781


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49716/
   




[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973586116


   **[Test build #145419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145419/testReport)** for PR 34606 at commit [`1727101`](https://github.com/apache/spark/commit/1727101550d1bf18fcafb0c326f86792030c43a3).




[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973601416


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49890/
   




[GitHub] [spark] SparkQA commented on pull request #34606: [SPARK-37336][ML][PySpark] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34606:
URL: https://github.com/apache/spark/pull/34606#issuecomment-973636103


   **[Test build #145419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145419/testReport)** for PR 34606 at commit [`1727101`](https://github.com/apache/spark/commit/1727101550d1bf18fcafb0c326f86792030c43a3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon closed pull request #34606: [SPARK-37336][ML][PYTHON] Migrate _java2py from SQLContext to SparkSession

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34606:
URL: https://github.com/apache/spark/pull/34606


   

