Posted to reviews@spark.apache.org by BigCrunsh <gi...@git.apache.org> on 2014/08/27 15:08:04 UTC

[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...

Github user BigCrunsh commented on the pull request:

    https://github.com/apache/spark/pull/2137#issuecomment-53569214
  
    Currently, MLlib contains linear models (GLMs) that produce scores based on an inner product, classification models that may derive a class label from those scores, and probabilistic models that provide a confidence score (or a probability under some model assumption) in addition to the predicted class. At the moment, the score of a classification model is only available by removing the threshold:
    ```scala
    val classes = model.predict(testset)
    val scores = model.clearThreshold().predict(testset)
    ```
    After the last step the threshold is lost, and for LogReg it is not possible to access the raw (uncalibrated) score at all. However, depending on the model, I would expect to have direct and consistent access to all of these values:
    ```scala
    val classes = model.predictClass(testset)
    val scores = model.predictScore(testset)
    val probs = model.predictProbability(testset)
    ```
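    To make the proposal concrete, here is a minimal, Spark-free sketch of how the three accessors could coexist on one model without mutating it. The trait and class names (`ScoringClassifier`, `LogisticModelSketch`) are hypothetical and not part of MLlib, the methods take a single feature array rather than a data set, and the logistic link is assumed only for illustration:
    ```scala
    // Hypothetical sketch: none of these names exist in MLlib.
    trait ScoringClassifier {
      /** Raw, uncalibrated score (here: inner product plus intercept). */
      def predictScore(features: Array[Double]): Double

      /** Probability of the positive class under the model assumption. */
      def predictProbability(features: Array[Double]): Double

      /** Hard class label; the threshold is only read, never cleared. */
      def predictClass(features: Array[Double]): Double
    }

    final class LogisticModelSketch(
        val weights: Array[Double],
        val intercept: Double,
        val threshold: Double = 0.5) extends ScoringClassifier {

      def predictScore(features: Array[Double]): Double = {
        require(features.length == weights.length, "dimension mismatch")
        weights.zip(features).map { case (w, x) => w * x }.sum + intercept
      }

      // Assumed logistic link: maps the raw score into (0, 1).
      def predictProbability(features: Array[Double]): Double =
        1.0 / (1.0 + math.exp(-predictScore(features)))

      def predictClass(features: Array[Double]): Double =
        if (predictProbability(features) > threshold) 1.0 else 0.0
    }
    ```
    Because the threshold is an immutable field in this sketch, asking for a score or a probability does not change what a later `predictClass` call returns.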
    @mengxr: I think that, in general, a probability is some measure of the likelihood that an event will occur. It is often based on more or less realistic model assumptions (e.g., the normality assumption in regression, t-tests, etc.), isn't it? The exponential family, which is the assumption made for the class-conditional distributions ``p(features|class)``, comprises commonly used distributions such as the multinomial, Poisson, and Gaussian distributions. The learning algorithm (with tuned hyper-parameters) is then "responsible" for calibrating these probabilities. Do you have a more appropriate name to distinguish between scores and "probabilities"?

