Posted to reviews@spark.apache.org by BryanCutler <gi...@git.apache.org> on 2015/10/16 02:04:25 UTC

[GitHub] spark pull request: [SPARK-10560] [PySpark] [MLlib] [Docs] Make St...

GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/9141

    [SPARK-10560] [PySpark] [MLlib] [Docs] Make StreamingLogisticRegressionWithSGD Python API equal to Scala one

    This is to bring the API documentation of StreamingLogisticRegressionWithSGD and StreamingLinearRegressionWithSGD in line with the Scala versions.
    
    - Fixed the algorithm descriptions
    - Added default values to parameter descriptions
    - Changed the default of StreamingLogisticRegressionWithSGD's regParam to 0, as in the Scala version
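The corrected algorithm description and the new regParam default can be illustrated with a small pure-Python sketch. Under the corrected description, SGD updates the model on each new batch, the weights left by one batch seed the next, the number of features must stay constant, and an initial weight vector is required. This is a toy built on assumptions, not the PySpark or Scala implementation. `ToyStreamingLogReg` and its methods are hypothetical. Batches are plain lists of `(label, features)` pairs rather than RDDs of LabeledPoints. Each step uses the whole batch, which is comparable to miniBatchFraction=1.0. The L2 penalty is assumed to add `reg_param * w` to the gradient, so with `reg_param=0.0` (the default this PR adopts to match Scala) it contributes nothing.

```python
import math


class ToyStreamingLogReg:
    """Hypothetical sketch of streaming logistic regression trained with SGD.

    Not the PySpark implementation: each batch runs plain gradient descent on
    the full batch (like miniBatchFraction=1.0) with an optional L2 penalty.
    """

    def __init__(self, step_size=0.1, num_iterations=50, reg_param=0.0):
        self.step_size = step_size
        self.num_iterations = num_iterations
        self.reg_param = reg_param  # 0.0 disables the L2 penalty entirely
        self.weights = None

    def set_initial_weights(self, weights):
        # An initial weight vector must be provided before any training.
        self.weights = list(weights)
        return self

    def gradient(self, batch):
        """Gradient of mean logistic loss plus (reg_param / 2) * ||w||^2."""
        n = len(batch)
        grad = [0.0] * len(self.weights)
        for label, features in batch:  # label is 0 or 1
            margin = sum(w * x for w, x in zip(self.weights, features))
            err = 1.0 / (1.0 + math.exp(-margin)) - label
            for i, x in enumerate(features):
                grad[i] += err * x / n
        return [g + self.reg_param * w for g, w in zip(grad, self.weights)]

    def train_on_batch(self, batch):
        """Update the weights using one batch of (label, features) pairs."""
        if self.weights is None:
            raise ValueError("initial weights must be set before training")
        if any(len(f) != len(self.weights) for _, f in batch):
            raise ValueError("the number of features must stay constant")
        if not batch:
            return self  # an empty batch leaves the weights unchanged
        for _ in range(self.num_iterations):
            grad = self.gradient(batch)
            self.weights = [w - self.step_size * g
                            for w, g in zip(self.weights, grad)]
        return self

    def predict(self, features):
        margin = sum(w * x for w, x in zip(self.weights, features))
        return 1 if margin > 0 else 0


# Usage: the weights left by one batch are the starting point for the next.
model = ToyStreamingLogReg(step_size=0.5, num_iterations=100)
model.set_initial_weights([0.0, 0.0])
batches = [
    [(1, [1.0, 2.0]), (0, [1.0, -2.0])],
    [(1, [1.0, 1.5]), (0, [1.0, -1.0])],
]
for batch in batches:
    model.train_on_batch(batch)
print(model.weights, model.predict([1.0, 3.0]), model.predict([1.0, -3.0]))
```

Keeping the weights on the object between `train_on_batch` calls is what makes the training "streaming" in this sketch. Nothing is retrained from scratch, and each batch only nudges the current model.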

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark StreamingLogisticRegressionWithSGD-python-api-sync

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9141
    
----
commit 3f0ab59da33de6ded4c70956425fbd5ad96b182e
Author: Bryan Cutler <bj...@us.ibm.com>
Date:   2015-10-14T17:43:43Z

    [SPARK-10560] Align PySpark Streaming Logistic,Linear RegressionWithSGD documentation with Scala versions and include default values

commit 7343f4c38aa78582f88bc76f27f31ee1896df55f
Author: Bryan Cutler <bj...@us.ibm.com>
Date:   2015-10-14T17:45:27Z

    [SPARK-10560] Change default value of StreamingLogisticRegressionWithSGD regularization param to align with Scala version

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-149127556
  
    Merged build finished. Test PASSed.


Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-148560703
  
    Merged build finished. Test PASSed.


Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-149127559
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43918/


Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-149124316
  
    Merged build started.


Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9141#discussion_r42292808
  
    --- Diff: python/pyspark/mllib/classification.py ---
    @@ -594,19 +594,27 @@ def train(cls, data, lambda_=1.0):
     @inherit_doc
     class StreamingLogisticRegressionWithSGD(StreamingLinearAlgorithm):
         """
    -    Run LogisticRegression with SGD on a batch of data.
    -
    -    The weights obtained at the end of training a stream are used as initial
    -    weights for the next batch.
    -
    -    :param stepSize: Step size for each iteration of gradient descent.
    -    :param numIterations: Number of iterations run for each batch of data.
    -    :param miniBatchFraction: Fraction of data on which SGD is run for each
    -                              iteration.
    -    :param regParam: L2 Regularization parameter.
    -    :param convergenceTol: A condition which decides iteration termination.
    +    Train or predict a logistic regression model on streaming data. Training uses
    +    Stochastic Gradient Descent to update the model based on each new batch of
    +    incoming data from a DStream.
    +
    +    Each batch of data is assumed to be an RDD of LabeledPoints.
    +    The number of data points per batch can vary, but the number
    +    of features must be constant. An initial weight
    +    vector must be provided.
    +
    +    :param stepSize:          Step size for each iteration of gradient descent.
    --- End diff --
    
    We shouldn't do vertical alignment. If we later add a parameter with a longer name, we would have to re-align every other line. There are two options:
    
    ~~~
    :param stepSize: Step size for each iteration of gradient descent.
    ~~~
    
    or 
    
    ~~~
    :param stepSize:
      Step size for each iteration of gradient descent.
    ~~~
    
    I think the latter is better because it isn't affected by the length of the parameter name.
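The second layout is sketched below, assuming the standard Sphinx `:param name:` field syntax. Each parameter name sits on its own line and its description is indented beneath it. Adding a parameter with a longer name later therefore requires no edits to the other lines. `example_train` and its default values are hypothetical placeholders chosen for illustration, not an actual pyspark.mllib signature.

```python
def example_train(stepSize=0.1, numIterations=50, regParam=0.0):
    """
    Hypothetical function showing the suggested parameter layout.

    :param stepSize:
      Step size for each iteration of gradient descent.
      (default: 0.1)
    :param numIterations:
      Number of iterations run for each batch of data.
      (default: 50)
    :param regParam:
      L2 regularization parameter.
      (default: 0.0)
    """
    # The body is irrelevant here; only the docstring layout matters.
    return stepSize, numIterations, regParam
```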


Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-149124303
  
    Merged build triggered.


Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-148557974
  
    Merged build started.


Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-148560705
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43816/


Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-148558027
  
    This came out of [SPARK-10022](https://issues.apache.org/jira/browse/SPARK-10022), the Scala/Python method and parameter inconsistency check for ML during 1.5 QA.
    
    cc @yanboliang 


Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-149122916
  
    Good point about vertical alignment. I second the latter format; it improves readability a little. I'll implement that here, and I wouldn't mind doing the same for the other parts of pyspark.mllib so that params are formatted consistently. I can open a related JIRA for that.


Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-148558523
  
    [Test build #43816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43816/consoleFull) for PR 9141 at commit [`7343f4c`](https://github.com/apache/spark/commit/7343f4c38aa78582f88bc76f27f31ee1896df55f).


Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-149124897
  
    **[Test build #43918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43918/consoleFull)** for PR 9141 at commit [`407493d`](https://github.com/apache/spark/commit/407493df92559de5586e018ec7912950d74d628c).


Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-148560625
  
    [Test build #43816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43816/console) for PR 9141 at commit [`7343f4c`](https://github.com/apache/spark/commit/7343f4c38aa78582f88bc76f27f31ee1896df55f).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-148557957
  
    Merged build triggered.


Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9141#issuecomment-149127452
  
    **[Test build #43918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43918/consoleFull)** for PR 9141 at commit [`407493d`](https://github.com/apache/spark/commit/407493df92559de5586e018ec7912950d74d628c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

