Posted to reviews@spark.apache.org by goodsoldiersvejk <gi...@git.apache.org> on 2016/06/05 06:56:18 UTC

[GitHub] spark pull request #13516: [MLLIB][DOC] Edit logistic regression docs to pro...

GitHub user goodsoldiersvejk opened a pull request:

    https://github.com/apache/spark/pull/13516

    [MLLIB][DOC] Edit logistic regression docs to provide context for distinction with linear svms

    ## What changes were proposed in this pull request?
    The documentation for logistic regression ('mllib-linear-methods.md') alludes only very briefly to the distinction from linear support vector machines. This change rewrites that passage to give some clarity and context, in particular that logistic regression is formally a Bayesian model.
    
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    manual tests
    
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/goodsoldiersvejk/spark logisticregressiondocs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13516
    
----
commit 2ce6cbc07862bcc1cd6556106439532039410de4
Author: goodsoldiersvejk <sv...@gmail.com>
Date:   2016-06-05T06:37:08Z

    Edit logistic regression documentation in
    'mllib-linear-methods.md' to provide some context
    to the distinction with linear SVMs already alluded
    to.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13516: [MLLIB][DOC] Edit logistic regression docs to pro...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13516#discussion_r65818408
  
    --- Diff: docs/mllib-linear-methods.md ---
    @@ -257,10 +257,10 @@ applying the logistic function
     \mathrm{f}(z) = \frac{1}{1 + e^{-z}}
     \]`
     where $z = \wv^T \x$.
    -By default, if $\mathrm{f}(\wv^T x) > 0.5$, the outcome is positive, or
    -negative otherwise, though unlike linear SVMs, the raw output of the logistic regression
    -model, $\mathrm{f}(z)$, has a probabilistic interpretation (i.e., the probability
    -that $\x$ is positive).
    +By default, if $\mathrm{f}(\wv^T x) > 0.5$, the outcome is positive, else it is negative.
    +Logistic regression is distinct from say linear SVMs in its formally being a Bayesian model, albeit trivial one: rather than producing directly an 'input-output machine', the conditional distribution of the output given the input is modeled explicitly through the function $\mathrm{f}$ above; this model can be and is then used to provide definite outputs for definite inputs.
    --- End diff --
    
    Even if this is accurate, I don't think it's an improvement. This drops the key point: that the output of logistic regression may be interpreted as a probability. I don't believe this description clarifies anything for Spark users.
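
For readers following the diff above, the decision rule it describes can be sketched in a few lines of plain Python. This is only an illustration, not MLlib's implementation: the function names `logistic` and `predict` and the weight and feature values are hypothetical.

```python
import math

def logistic(z):
    """Logistic function f(z) = 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, features, threshold=0.5):
    """Return (probability, label) for one example (hypothetical helper)."""
    # z = w^T x, the linear score fed into the logistic function.
    z = sum(w * x for w, x in zip(weights, features))
    probability = logistic(z)
    # Positive if f(w^T x) > threshold, negative otherwise.
    label = 1 if probability > threshold else 0
    return probability, label

# Hypothetical weights and feature vector, chosen only for illustration.
probability, label = predict([0.8, -0.4], [1.0, 0.5])
print(probability, label)
```

The first returned value is the quantity the original wording calls the raw output with a probabilistic interpretation; the label is what remains after thresholding at 0.5.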




[GitHub] spark issue #13516: [MLLIB][DOC] Edit logistic regression docs to provide co...

Posted by goodsoldiersvejk <gi...@git.apache.org>.
Github user goodsoldiersvejk commented on the issue:

    https://github.com/apache/spark/pull/13516
  
    Can one of the admins verify this patch?




[GitHub] spark pull request #13516: [MLLIB][DOC] Edit logistic regression docs to pro...

Posted by goodsoldiersvejk <gi...@git.apache.org>.
Github user goodsoldiersvejk closed the pull request at:

    https://github.com/apache/spark/pull/13516




[GitHub] spark issue #13516: [MLLIB][DOC] Edit logistic regression docs to provide co...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/13516
  
    I think the Spark documentation should focus on how to use SVMs and LR in Spark, with the occasional practical note. This is quite theoretical in comparison, and the kind of thing someone should look for in complete references on these methods. That is, these notes aren't specific to Spark or particularly helpful in understanding how to use these methods. I don't think it will mean much to the large majority of readers of these docs.




[GitHub] spark issue #13516: [MLLIB][DOC] Edit logistic regression docs to provide co...

Posted by goodsoldiersvejk <gi...@git.apache.org>.
Github user goodsoldiersvejk commented on the issue:

    https://github.com/apache/spark/pull/13516
  
    Thanks for your comment. The exact wording can be made more explicit, but that key point is implicit in the conditional distribution of y given x being modeled in logistic regression. The Spark MLlib documentation attempts to provide, and I think should provide, a balance between operational use and context. A Spark user may wonder why one would choose logistic regression over linear SVMs if operationally they can be the same but "the raw output of the logistic regression .... has a probabilistic interpretation". This hints at, but skirts, the difference in methodology from which the probabilistic interpretation becomes obvious. I would still rewrite this to provide context for Bayesian methods (perhaps with a reference link).
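
The point that the two models "operationally can be the same" can be made concrete with a small sketch. It assumes, purely for illustration, that both models share the same weight vector, that the SVM-style rule thresholds the linear score at 0, and that the logistic rule thresholds the probability at 0.5. All names and numbers below are hypothetical and are not Spark APIs.

```python
import math

def raw_score(weights, features):
    """Linear score w^T x shared by both decision rules."""
    return sum(w * x for w, x in zip(weights, features))

def svm_style_label(weights, features):
    """Label from the sign of the linear score (assumed threshold 0)."""
    return 1 if raw_score(weights, features) > 0.0 else 0

def logistic_probability(weights, features):
    """Probability of the positive class under the logistic model."""
    return 1.0 / (1.0 + math.exp(-raw_score(weights, features)))

def logistic_label(weights, features):
    """Label from thresholding the probability at 0.5."""
    return 1 if logistic_probability(weights, features) > 0.5 else 0

weights = [0.8, -0.4]  # hypothetical weights
for features in ([1.0, 0.5], [0.1, 0.1], [-1.0, 0.5]):
    print(
        svm_style_label(weights, features),
        logistic_label(weights, features),
        round(logistic_probability(weights, features), 3),
    )
```

Under these assumptions the two labels always agree, because f(z) > 0.5 exactly when z > 0. What the logistic model adds is the third column: the second example is labeled positive by both rules, yet its probability is only slightly above 0.5.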

