Posted to reviews@spark.apache.org by vectorijk <gi...@git.apache.org> on 2016/02/23 13:12:38 UTC

[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

GitHub user vectorijk opened a pull request:

    https://github.com/apache/spark/pull/11321

    [SPARK-7106][MLlib][PySpark] Support model save/load in Python's FPGrowth

    ## What changes were proposed in this pull request?
    
    Python API supports model save/load in FPGrowth
    JIRA: [https://issues.apache.org/jira/browse/SPARK-7106](https://issues.apache.org/jira/browse/SPARK-7106)
    ## How was this patch tested?
    
    The patch is tested with Python doctest.
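
For orientation, the save/load round trip this pull request describes could be exercised roughly as follows. This is a minimal sketch, not the doctest from the patch: the helper name `save_and_reload_fpgrowth`, the transaction data, and the `model_path` argument are illustrative assumptions, and the calls assume a PySpark release that includes this change.

```python
def save_and_reload_fpgrowth(sc, model_path):
    """Train a small FPGrowth model, save it, and load it back.

    `sc` is an existing SparkContext. `model_path` is a hypothetical
    output location and should not exist yet.
    """
    # Imported lazily so the function can be defined without Spark installed.
    from pyspark.mllib.fpm import FPGrowth, FPGrowthModel

    # Illustrative transactions (an assumption, not taken from the patch).
    transactions = [
        ["a", "b", "c"],
        ["a", "b", "d", "e"],
        ["a", "c", "e"],
        ["a", "c", "f"],
    ]
    rdd = sc.parallelize(transactions, 2)
    model = FPGrowth.train(rdd, minSupport=0.6, numPartitions=2)

    # save() comes from JavaSaveable, load() from JavaLoader.
    model.save(sc, model_path)
    same_model = FPGrowthModel.load(sc, model_path)

    # The reloaded model should report the same frequent itemsets.
    original = sorted(model.freqItemsets().collect())
    reloaded = sorted(same_model.freqItemsets().collect())
    return original == reloaded
```

Calling it needs a running SparkContext, for example `save_and_reload_fpgrowth(sc, "/tmp/fpm-demo")`, where the path is only a placeholder.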

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vectorijk/spark spark-7106

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11321
    
----
commit b7145ac7e09d4dcc9b5b5874c59d48cf9e0f0860
Author: Kai Jiang <ji...@gmail.com>
Date:   2016-02-21T05:06:59Z

    [SPARK-7106][MLlib][PySpark] Support model save/load in Python's FPGrowth

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by vectorijk <gi...@git.apache.org>.
Github user vectorijk commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188006126
  
    @mengxr Thanks, I didn't notice that. Addressed comment.
    Also, I took a look at 9ca79c1, and it only moved the temp-file cleanup code for doctests under the ml directory. It seems we should do the same thing under mllib. Should we create a JIRA issue for this?
    
    _off-topic_
    
    I was wondering whether the Spark community would be interested in mentoring students for Google Summer of Code (GSoC) under the Apache Software Foundation this year. Last time, I was impressed by MechCoder's project, mentored by @mengxr. I look forward to having a chance to do something interesting and to keep contributing to the codebase this summer.
    
    @mengxr, are you still interested in mentoring an MLlib PySpark-related project this summer? If so, I am very willing to brainstorm with you and others on JIRA about what could be worked on this summer, and I might start writing the GSoC proposal. If there are pre-GSoC issues related to the project, I would love to work on those.
    
    P.S. Here is the [post](http://apache-spark-developers-list.1001551.n3.nabble.com/GSoC-Interested-in-GSoC-2016-ideas-td16224.html) I published on the dev mailing list.


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188651306
  
    LGTM. Merged into master. Thanks!
    
    For GSoC, I created https://issues.apache.org/jira/browse/SPARK-13489 to collect some project ideas. Let's move our discussion there. If I don't have time to mentor a GSoC project, other committers might be interested. Could you prepare a draft proposal and post it on the JIRA?


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188002550
  
    **[Test build #51830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51830/consoleFull)** for PR 11321 at commit [`10580d2`](https://github.com/apache/spark/commit/10580d21c8e0023ed9e54e73c955ffcd6d1b6e92).


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188216513
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-187681236
  
    **[Test build #51768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51768/consoleFull)** for PR 11321 at commit [`b7145ac`](https://github.com/apache/spark/commit/b7145ac7e09d4dcc9b5b5874c59d48cf9e0f0860).


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188008304
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51830/
    Test PASSed.


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188209489
  
    **[Test build #51874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51874/consoleFull)** for PR 11321 at commit [`f4b5357`](https://github.com/apache/spark/commit/f4b5357cfe606be3a4e9fb3774f33e44ff87ecfe).


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188008085
  
    **[Test build #51830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51830/consoleFull)** for PR 11321 at commit [`10580d2`](https://github.com/apache/spark/commit/10580d21c8e0023ed9e54e73c955ffcd6d1b6e92).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-187726854
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51768/
    Test PASSed.


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188216220
  
    **[Test build #51874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51874/consoleFull)** for PR 11321 at commit [`f4b5357`](https://github.com/apache/spark/commit/f4b5357cfe606be3a4e9fb3774f33e44ff87ecfe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-187726849
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11321#discussion_r53908665
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -40,6 +41,11 @@ class FPGrowthModel(JavaModelWrapper):
         >>> model = FPGrowth.train(rdd, 0.6, 2)
         >>> sorted(model.freqItemsets().collect())
         [FreqItemset(items=[u'a'], freq=4), FreqItemset(items=[u'c'], freq=3), ...
    +    >>> model_path = temp_path + "/fpg_model"
    --- End diff --
    
    ```/fpm``` is enough as the path suffix, because we only support model save/load under the old MLlib API.
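
The doctest line under review builds its save location from a `temp_path` name that the doctest itself never defines, so the test runner has to inject it and clean it up afterwards. The sketch below shows one way to do that with only the standard library. It is not the code from this PR or from commit 9ca79c1: `write_marker` and `run_doctest_with_temp_path` are hypothetical names, and a plain file write stands in for `model.save`.

```python
import doctest
import os
import shutil
import tempfile


def write_marker(path):
    """Create an empty marker file at ``path``.

    >>> marker = temp_path + "/fpm"
    >>> write_marker(marker)
    >>> os.path.exists(marker)
    True
    """
    with open(path, "w"):
        pass


def run_doctest_with_temp_path(func):
    """Run the doctests of ``func`` with a throwaway ``temp_path`` global."""
    temp_path = tempfile.mkdtemp()
    # Names the doctest body is allowed to see.
    globs = {"os": os, "temp_path": temp_path, func.__name__: func}
    runner = doctest.DocTestRunner(optionflags=doctest.ELLIPSIS)
    try:
        for test in doctest.DocTestFinder().find(func, globs=globs):
            runner.run(test)
    finally:
        # Delete the directory even if a doctest raised.
        shutil.rmtree(temp_path, ignore_errors=True)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted, temp_path
```

Running `run_doctest_with_temp_path(write_marker)` returns the failure count, the number of examples attempted, and the path of the directory, which has already been removed by the time the call returns.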


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11321


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-187892025
  
    @vectorijk Please check this commit https://github.com/apache/spark/commit/9ca79c1ece5ad139719e4eea9f7d1b59aed01b20 and update your PR. Thanks!


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188216522
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51874/
    Test PASSed.


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-187726521
  
    **[Test build #51768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51768/consoleFull)** for PR 11321 at commit [`b7145ac`](https://github.com/apache/spark/commit/b7145ac7e09d4dcc9b5b5874c59d48cf9e0f0860).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class FPGrowthModel(JavaModelWrapper, JavaSaveable, JavaLoader):`


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by vectorijk <gi...@git.apache.org>.
Github user vectorijk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11321#discussion_r53924161
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -40,6 +41,11 @@ class FPGrowthModel(JavaModelWrapper):
         >>> model = FPGrowth.train(rdd, 0.6, 2)
         >>> sorted(model.freqItemsets().collect())
         [FreqItemset(items=[u'a'], freq=4), FreqItemset(items=[u'c'], freq=3), ...
    +    >>> model_path = temp_path + "/fpg_model"
    --- End diff --
    
    ok, I have done this.


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by vectorijk <gi...@git.apache.org>.
Github user vectorijk commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-187676196
  
    cc @mengxr @yanboliang  Could you take a look at this?


[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by vectorijk <gi...@git.apache.org>.
Github user vectorijk commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188658913
  
    @mengxr, Thanks for replying!
    
    Definitely, I will post a rough draft proposal on JIRA later.
    
    On Wed, Feb 24, 2016 at 11:31 PM, Xiangrui Meng <no...@github.com>
    wrote:
    
    > LGTM. Merged into master. Thanks!
    >
    > For GSoC, I created https://issues.apache.org/jira/browse/SPARK-13489 to
    > collect some project ideas. Let's move our discussion there. If I don't
    > have time to mentor a GSoC project, other committers might be interested.
    > Could you prepare a draft proposal and post it on the JIRA?
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/11321#issuecomment-188651306>.
    >



[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11321#issuecomment-188008303
  
    Merged build finished. Test PASSed.

