Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2016/05/27 07:46:55 UTC

[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/13352

    [SPARK-15603][MLLIB] Replace SQLContext with SparkSession in ML/MLLib

    ## What changes were proposed in this pull request?
    
    This PR replaces all deprecated `SQLContext` occurrences with `SparkSession` in the `ML/MLLib` module, except in the following two files, whose functions take `SQLContext` as an argument.
    - ReadWrite.scala
    - TreeModels.scala
    
    ## How was this patch tested?
    
    Pass the existing Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15603

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13352.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13352
    
----
commit ae8acae6173a83d65d9695a464c6af9d58153d47
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-05-27T07:35:48Z

    [SPARK-15603][MLLIB] Replace SQLContext with SparkSession in ML/MLLib

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13352#issuecomment-222082414
  
    **[Test build #59475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59475/consoleFull)** for PR 13352 at commit [`ae8acae`](https://github.com/apache/spark/commit/ae8acae6173a83d65d9695a464c6af9d58153d47).




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/13352#issuecomment-222335566
  
    Sorry, I did not notice this PR. I submitted another PR (https://github.com/apache/spark/pull/13380) that removes `SQLContext` from `MLlib`. Any reason why you still keep `SQLContext` in `ReadWrite.scala` and `TreeModels.scala`? 




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13352#discussion_r64942596
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
    @@ -1178,8 +1176,9 @@ private[python] class PythonMLLibAPI extends Serializable {
       def getIndexedRows(indexedRowMatrix: IndexedRowMatrix): DataFrame = {
         // We use DataFrames for serialization of IndexedRows to Python,
         // so return a DataFrame.
    -    val sqlContext = SQLContext.getOrCreate(indexedRowMatrix.rows.sparkContext)
    -    sqlContext.createDataFrame(indexedRowMatrix.rows)
    +    val sc = indexedRowMatrix.rows.sparkContext
    +    val spark = SparkSession.builder().config(sc.getConf).getOrCreate()
    --- End diff --
    
    there are a number of similar places




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/13352#issuecomment-222115643
  
    This removes many deprecation warnings like the following one.
    ```
    /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala:126: method getOrCreate in object SQLContext is deprecated: Use SparkSession.builder instead
    [warn]       val sqlContext = SQLContext.getOrCreate(sc)
    ```
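    
    For illustration, the change this warning asks for can be sketched roughly as below. This is not code from the PR: the object name `SessionMigrationSketch`, the helper `toDataFrame`, and the sample data are hypothetical, and the sketch assumes Spark 2.x with `spark-sql` on the classpath. It follows the `SparkSession.builder().config(sc.getConf).getOrCreate()` pattern that appears in the diff for this PR.
    
    ```scala
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Hypothetical example object; not part of the Spark code base.
    object SessionMigrationSketch {

      // Hypothetical helper: builds a DataFrame from local data given an existing SparkContext.
      def toDataFrame(sc: SparkContext, data: Seq[(Int, String)]): DataFrame = {
        // Deprecated form that triggers the warning:
        //   val sqlContext = SQLContext.getOrCreate(sc)
        //   sqlContext.createDataFrame(data)
        // Replacement: obtain (or reuse) a SparkSession configured from the context's conf.
        val spark = SparkSession.builder().config(sc.getConf).getOrCreate()
        spark.createDataFrame(data).toDF("id", "label")
      }

      def main(args: Array[String]): Unit = {
        // Local session used only to drive the sketch.
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("session-migration-sketch")
          .getOrCreate()
        val df = toDataFrame(spark.sparkContext, Seq((1, "a"), (2, "b")))
        df.show()
        spark.stop()
      }
    }
    ```
    
    Because `getOrCreate()` reuses an already running session or context where one exists, a call site migrated this way should keep working against the caller's existing `SparkContext`.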




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/13352#issuecomment-222206324
  
    Hi, @andrewor14 .
    This PR addresses the deprecation warnings for `SQLContext`.




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13352#issuecomment-222089523
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59475/




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/13352#issuecomment-222216523
  
    Oh, thank you, @andrewor14 . I see.
    I will open another PR to use the `builder.sparkContext(sc)` pattern.




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/13352#issuecomment-222215279
  
    @dongjoon-hyun actually the `builder.sparkContext` method was added recently so there are other existing places where we could use that. Would you mind submitting a patch to fix those? This patch itself LGTM other than that issue, but we can fix it separately.




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13352




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13352#discussion_r64942547
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
    @@ -1178,8 +1176,9 @@ private[python] class PythonMLLibAPI extends Serializable {
       def getIndexedRows(indexedRowMatrix: IndexedRowMatrix): DataFrame = {
         // We use DataFrames for serialization of IndexedRows to Python,
         // so return a DataFrame.
    -    val sqlContext = SQLContext.getOrCreate(indexedRowMatrix.rows.sparkContext)
    -    sqlContext.createDataFrame(indexedRowMatrix.rows)
    +    val sc = indexedRowMatrix.rows.sparkContext
    +    val spark = SparkSession.builder().config(sc.getConf).getOrCreate()
    --- End diff --
    
    I think there's a `sparkContext` method in the builder now, so you can be more explicit instead of just specifying the conf.




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13352#issuecomment-222089522
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15603][MLLIB] Replace SQLContext with S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13352#issuecomment-222089414
  
    **[Test build #59475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59475/consoleFull)** for PR 13352 at commit [`ae8acae`](https://github.com/apache/spark/commit/ae8acae6173a83d65d9695a464c6af9d58153d47).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

