Posted to reviews@spark.apache.org by MechCoder <gi...@git.apache.org> on 2015/02/28 20:16:07 UTC

[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

GitHub user MechCoder opened a pull request:

    https://github.com/apache/spark/pull/4834

    [SPARK-6083] [MLLib] [DOC] Make Python API example consistent in NaiveBayes

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MechCoder/spark spark-6083

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4834.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4834
    
----
commit 0c5fe03dc516f65bf5ded301cefaebd6c34c03c9
Author: MechCoder <ma...@gmail.com>
Date:   2015-02-28T19:14:41Z

    [SPARK-6083] Make Python API example consistent in NaiveBayes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76623656
  
    LGTM.




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76627473
  
      [Test build #28152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28152/consoleFull) for   PR 4834 at commit [`1cdd7b5`](https://github.com/apache/spark/commit/1cdd7b5a03e99810fe8ecc340fd933d7294ff15e).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76586315
  
      [Test build #28139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28139/consoleFull) for   PR 4834 at commit [`65bbbe9`](https://github.com/apache/spark/commit/65bbbe9fdc5c5095d223c2f15bacdee7ca973f11).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by MechCoder <gi...@git.apache.org>.
Github user MechCoder commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76541036
  
    cc: @mengxr Would you be able to verify this?




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4834#discussion_r25569774
  
    --- Diff: docs/mllib-naive-bayes.md ---
    @@ -115,22 +115,31 @@ used for evaluation and prediction.
     
     Note that the Python API does not yet support model save/load but will in the future.
     
    -<!-- TODO: Make Python's example consistent with Scala's and Java's. -->
     {% highlight python %}
    -from pyspark.mllib.regression import LabeledPoint
     from pyspark.mllib.classification import NaiveBayes
    +from pyspark.mllib.linalg import Vectors
    +from pyspark.mllib.regression import LabeledPoint
    +
    +data = sc.textFile("data/mllib/sample_naive_bayes_data.txt")
    +
    +# Preprocessing
    +splitData = data.map(lambda line: line.split(','))
    +parsedData = splitData.map(
    +  lambda parts: LabeledPoint(
    +    float(parts[0]),
    +    Vectors.dense(map(lambda x: float(x), parts[1].split(' ')))
    +    )
    +  )
     
    -# an RDD of LabeledPoint
    -data = sc.parallelize([
    -  LabeledPoint(0.0, [0.0, 0.0])
    -  ... # more labeled points
    -])
    +# Split data into training (60%) and test (40%)
    --- End diff --
    
    `data into` -> `data approximately into`.
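
The reason for "approximately" can be sketched without Spark. The snippet below is a minimal simulation, assuming each row is assigned to a split independently at random with probability proportional to its weight (my understanding of how `randomSplit` behaves, not a copy of its implementation). The helper name `random_split` is hypothetical.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row independently to one split, with probability
    proportional to that split's weight."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative boundaries in [0, 1], e.g. [0.6, 1.0] for weights [0.6, 0.4].
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for i, bound in enumerate(bounds):
            # The last split also catches any floating-point rounding gap.
            if draw < bound or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits


rows = list(range(1000))
for seed in range(5):
    training, test = random_split(rows, [0.6, 0.4], seed)
    # Sizes hover around 600/400 but are rarely exactly that.
    print(seed, len(training), len(test))
```

Because every row is a separate random draw, the weights set the expected proportions only; the realized sizes vary from seed to seed.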




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76627476
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28152/




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76622580
  
    @MechCoder Thanks for the update! We only have 6 lines in `sample_naive_bayes_data.txt`. That's why some random seeds give bad splits.
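
A rough calculation shows how often a 60/40 split of six rows goes wrong. It assumes the six rows are spread evenly over three labels (an assumption; the thread only gives the row count) and that each row independently lands in the training split with probability 0.6.

```python
# Hypothetical layout: 6 rows, 3 classes, 2 rows per class.
ROWS_PER_CLASS = 2
NUM_CLASSES = 3
P_TRAIN = 0.6  # chance that any one row lands in the training split

num_rows = ROWS_PER_CLASS * NUM_CLASSES

# All rows go to training, leaving nothing to evaluate on.
p_empty_test = P_TRAIN ** num_rows

# Both rows of one particular class go to the test split.
p_class_missing = (1 - P_TRAIN) ** ROWS_PER_CLASS

# Classes occupy disjoint rows, so their "missing" events are independent.
p_any_class_missing = 1 - (1 - p_class_missing) ** NUM_CLASSES

print("P(test split is empty)           = %.4f" % p_empty_test)
print("P(training split misses a class) = %.4f" % p_any_class_missing)
```

Under these assumptions the test split is empty about 4.7% of the time, and the training split lacks at least one class about 41% of the time. A model that never saw a class cannot predict it, which is one way an unlucky seed can produce a very low accuracy.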




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76586319
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28139/




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by MechCoder <gi...@git.apache.org>.
Github user MechCoder commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76623459
  
    @mengxr fixed!




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by MechCoder <gi...@git.apache.org>.
Github user MechCoder commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76578092
  
    I changed the randomSplit seed and it works better. It should look good now.




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76578360
  
      [Test build #28139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28139/consoleFull) for   PR 4834 at commit [`65bbbe9`](https://github.com/apache/spark/commit/65bbbe9fdc5c5095d223c2f15bacdee7ca973f11).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76541017
  
      [Test build #28130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28130/consoleFull) for   PR 4834 at commit [`0c5fe03`](https://github.com/apache/spark/commit/0c5fe03dc516f65bf5ded301cefaebd6c34c03c9).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76544712
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28130/




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4834#discussion_r25569772
  
    --- Diff: docs/mllib-naive-bayes.md ---
    @@ -115,22 +115,31 @@ used for evaluation and prediction.
     
     Note that the Python API does not yet support model save/load but will in the future.
     
    -<!-- TODO: Make Python's example consistent with Scala's and Java's. -->
     {% highlight python %}
    -from pyspark.mllib.regression import LabeledPoint
     from pyspark.mllib.classification import NaiveBayes
    +from pyspark.mllib.linalg import Vectors
    +from pyspark.mllib.regression import LabeledPoint
    +
    +data = sc.textFile("data/mllib/sample_naive_bayes_data.txt")
    +
    +# Preprocessing
    +splitData = data.map(lambda line: line.split(','))
    +parsedData = splitData.map(
    --- End diff --
    
    We can define a parse function to make the code more readable. Btw, we use 4-space indentation in Python, following PEP8.
    
    ~~~python
    def parseLine(line):
        parts = line.split(',')
        label = float(parts[0])
        features = Vectors.dense([float(x) for x in parts[1].split(' ')])
        return LabeledPoint(label, features)
    
    data = sc.textFile('data/mllib/sample_naive_bayes_data.txt').map(parseLine)
    ~~~
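
The parsing logic can be checked without a Spark installation. The sketch below is a Spark-free stand-in: it returns a plain `(label, features)` tuple instead of a `LabeledPoint`, and the sample rows are made up to match the `label,f1 f2 f3` layout implied by the `split(',')` and `split(' ')` calls in the diff. The name `parse_line` is hypothetical.

```python
def parse_line(line):
    """Parse 'label,f1 f2 f3' into a (label, features) tuple."""
    parts = line.split(',')
    label = float(parts[0])
    # A list comprehension yields a real list on both Python 2 and 3,
    # unlike map(), which returns a lazy iterator on Python 3.
    features = [float(x) for x in parts[1].split(' ')]
    return label, features


# Made-up rows in the assumed file format.
sample = ["0,1 0 0", "1,0 1 0", "2,0 0 1"]
for label, features in map(parse_line, sample):
    print(label, features)
```

Keeping the parser as a named function also makes it easy to unit test on a few strings before handing it to `map` on an RDD.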




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76623530
  
      [Test build #28152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28152/consoleFull) for   PR 4834 at commit [`1cdd7b5`](https://github.com/apache/spark/commit/1cdd7b5a03e99810fe8ecc340fd933d7294ff15e).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4834




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by MechCoder <gi...@git.apache.org>.
Github user MechCoder commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76541233
  
    Hmm. I get an accuracy of zero for the given example. Not sure where I'm going wrong though :(




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76544707
  
      [Test build #28130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28130/consoleFull) for   PR 4834 at commit [`0c5fe03`](https://github.com/apache/spark/commit/0c5fe03dc516f65bf5ded301cefaebd6c34c03c9).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by MechCoder <gi...@git.apache.org>.
Github user MechCoder commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76622887
  
    Great. Do you have any more comments?




[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4834#issuecomment-76641038
  
    Merged into master and branch-1.3. Thanks!

