Posted to reviews@spark.apache.org by WeichenXu123 <gi...@git.apache.org> on 2017/03/21 08:54:40 UTC

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

GitHub user WeichenXu123 opened a pull request:

    https://github.com/apache/spark/pull/17373

    [SPARK-12664] Expose probability in mlp model

    ## What changes were proposed in this pull request?
    
    Modify the MLP model to inherit from `ProbabilisticClassificationModel` so that it can expose the probability column when transforming data.
    
    ## How was this patch tested?
    
    Test added.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark expose_probability_in_mlp_model

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17373
    
----
commit 1f0da4e8ebff8509ac3bc6f06004cbecff6356e9
Author: WeichenXu <we...@outlook.com>
Date:   2017-03-20T14:31:14Z

    expose probability in mlp model

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r129697649
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -463,7 +479,7 @@ private[ml] class FeedForwardModel private(
       private var outputs: Array[BDM[Double]] = null
       private var deltas: Array[BDM[Double]] = null
     
    -  override def forward(data: BDM[Double]): Array[BDM[Double]] = {
    +  override def forward(data: BDM[Double], containsLastLayer: Boolean): Array[BDM[Double]] = {
         // Initialize output arrays for all layers. Special treatment for InPlace
    --- End diff --
    
    The last layer is always `softmax`. I added the `containsLastLayer` parameter: when it is `true`, the forward computation includes the last layer; otherwise the last layer is skipped. The parameter is used when we need `rawPrediction`, for which the final `softmax` layer should be discarded.
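
A minimal sketch of the idea, written in plain Scala rather than with Spark's Breeze-based `FeedForwardModel`. Layers are modeled as hypothetical functions that map one activation vector to the next, with softmax as the last layer. When `containsLastLayer` is `false`, the final layer is skipped, so the last output is the pre-softmax activation, i.e. the raw prediction. Unlike the code in the diff, this sketch returns only the outputs of the layers it actually evaluates.

```scala
// Hypothetical sketch, not Spark's FeedForwardModel: each layer is a plain
// function from one activation vector to the next, and the last layer is softmax.
object ForwardSketch {
  type Layer = Array[Double] => Array[Double]

  // Softmax, shifted by the max entry for numerical stability.
  val softmax: Layer = xs => {
    val m = xs.max
    val exps = xs.map(x => math.exp(x - m))
    val total = exps.sum
    exps.map(_ / total)
  }

  // Returns the output of every evaluated layer, in order. With
  // containsLastLayer = false the final (softmax) layer is skipped, so the
  // last element is the pre-softmax activation (the raw prediction).
  def forward(
      layers: Seq[Layer],
      input: Array[Double],
      containsLastLayer: Boolean): Seq[Array[Double]] = {
    val toEval = if (containsLastLayer) layers else layers.dropRight(1)
    toEval.scanLeft(input)((activation, layer) => layer(activation)).tail
  }
}
```

For example, with a toy layer that doubles its input followed by softmax, the input `(1.0, 2.0)` yields the raw output `(2.0, 4.0)` when the last layer is skipped. It yields a probability vector summing to 1 when the last layer is included.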


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80411/
    Test FAILed.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Jenkins, test this please.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by alwaysprep <gi...@git.apache.org>.
Github user alwaysprep commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    In which version will this be available in PySpark?


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80411/testReport)** for PR 17373 at commit [`0b00908`](https://github.com/apache/spark/commit/0b009085f25d1cf9ad7e4556e8f7d28e46ebf2cb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80684/testReport)** for PR 17373 at commit [`eedc647`](https://github.com/apache/spark/commit/eedc64744ae28c7fcc8e81524e182f49b73b8406).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    cc @jkbradley Code updated, thanks!


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    cc @yanboliang thanks!


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #3894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3894/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r130746665
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -463,7 +479,7 @@ private[ml] class FeedForwardModel private(
       private var outputs: Array[BDM[Double]] = null
       private var deltas: Array[BDM[Double]] = null
     
    -  override def forward(data: BDM[Double]): Array[BDM[Double]] = {
    +  override def forward(data: BDM[Double], containsLastLayer: Boolean): Array[BDM[Double]] = {
    --- End diff --
    
    Could we use a variable name like `includeLastLayer` here? `containsLastLayer` sounds like a property of the model instead of an instruction to the method.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #3895 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3895/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #79374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79374/testReport)** for PR 17373 at commit [`4cf8cee`](https://github.com/apache/spark/commit/4cf8ceec03a890d28901fe47dd2876918a79ecb0).


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133076755
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -82,6 +83,49 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("strong dataset test") {
    +    val layers = Array[Int](4, 5, 5, 2)
    --- End diff --
    
    Can you make this test faster by using a simpler network, e.g., by removing one of the middle layers?
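
The sketch below puts a rough number on this suggestion by counting trainable weights. It assumes that each fully connected layer from `n` inputs to `m` outputs has an `n x m` weight matrix plus `m` biases, and that activation layers (sigmoid, softmax) have no weights. Under that assumption, `Array(4, 5, 5, 2)` has 67 weights and `Array(4, 5, 2)` has 37. Fewer weights generally means less work per optimizer iteration. This is an arithmetic check, not a timing measurement.

```scala
// Counts trainable weights of a fully connected network given its layer sizes,
// assuming (n + 1) * m weights per n -> m layer (weight matrix plus biases).
object MlpWeightCount {
  def numWeights(layers: Array[Int]): Int =
    layers.zip(layers.tail).map { case (in, out) => (in + 1) * out }.sum
}
```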


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80684/
    Test FAILed.


[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r129697890
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -527,9 +544,21 @@ private[ml] class FeedForwardModel private(
     
       override def predict(data: Vector): Vector = {
         val size = data.size
    -    val result = forward(new BDM[Double](size, 1, data.toArray))
    +    val result = forward(new BDM[Double](size, 1, data.toArray), true)
         Vectors.dense(result.last.toArray)
       }
    +
    +  override def predictRaw(data: Vector): Vector = {
    +    val size = data.size
    --- End diff --
    
    Add a `predictRaw` method, which computes the forward pass without the last (softmax) layer.
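
This sketch shows how the two outputs relate, assuming that the last layer is softmax as described in this review. The raw prediction is the pre-softmax vector, and the probability is the softmax of it. Softmax preserves the ordering of its inputs, so the predicted class (the argmax) is the same whether it is taken over the raw vector or over the probability vector. The object and method names here are illustrative, not Spark API.

```scala
// Illustrative only; not Spark's API. Converts a raw (pre-softmax) output
// vector into class probabilities and picks the predicted class.
object RawToProbabilitySketch {
  // Softmax, shifted by the max entry for numerical stability.
  def softmax(raw: Array[Double]): Array[Double] = {
    val m = raw.max
    val exps = raw.map(r => math.exp(r - m))
    val total = exps.sum
    exps.map(_ / total)
  }

  // Index of the largest entry.
  def argmax(xs: Array[Double]): Int = xs.indices.maxBy(i => xs(i))
}
```

In Spark ML terms, these two vectors correspond to the `rawPrediction` and `probability` output columns of a `ProbabilisticClassificationModel`.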


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #79374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79374/testReport)** for PR 17373 at commit [`4cf8cee`](https://github.com/apache/spark/commit/4cf8ceec03a890d28901fe47dd2876918a79ecb0).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80371/testReport)** for PR 17373 at commit [`645fdc4`](https://github.com/apache/spark/commit/645fdc416d8529689de7be3d734b8c07f3a80893).


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133323889
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -374,6 +380,22 @@ private[ann] trait TopologyModel extends Serializable {
       def predict(data: Vector): Vector
     
       /**
    +   * Raw prediction of the model
    +   *
    +   * @param data input data
    +   * @return raw prediction
    +   */
    +  def predictRaw(data: Vector): Vector
    --- End diff --
    
    Ping: rename data -> features


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133075752
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -82,6 +83,49 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("strong dataset test") {
    +    val layers = Array[Int](4, 5, 5, 2)
    +
    +    val strongDataset = Seq(
    +      (Vectors.dense(1, 2, 3, 4), 0d, Vectors.dense(1d, 0d)),
    +      (Vectors.dense(4, 3, 2, 1), 1d, Vectors.dense(0d, 1d)),
    +      (Vectors.dense(1, 1, 1, 1), 0d, Vectors.dense(.5, .5)),
    +      (Vectors.dense(1, 1, 1, 1), 1d, Vectors.dense(.5, .5))
    +    ).toDF("features", "label", "expectedProbability")
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(strongDataset)
    +    val result = model.transform(strongDataset)
    +    model.setProbabilityCol("probability")
    +    MLTestingUtils.checkCopyAndUids(trainer, model)
    --- End diff --
    
    checkCopyAndUids is a generic test which should only be run in a single test; it does not need to be run in each test.  Please remove it from here.
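
In the test quoted above, the last two rows share the features `(1, 1, 1, 1)` but carry labels 0 and 1, and both expect the probability `(0.5, 0.5)`. Assume the model is trained with softmax cross-entropy loss. It must then assign a single probability `p` of class 0 to that shared input, so the two rows contribute `-(log p + log(1 - p))` to the loss, which is minimized at `p = 0.5`. The small sketch below checks this numerically with a coarse grid search. The object and method names are illustrative.

```scala
// Illustrative check: two rows with identical features and labels 0 and 1.
// If the model assigns probability p to class 0 for that input, the rows'
// combined cross-entropy is -(log p + log(1 - p)).
object ConflictingLabelsSketch {
  def loss(p: Double): Double = -(math.log(p) + math.log(1.0 - p))

  // Grid search over p = 1/(steps + 1), ..., steps/(steps + 1) for the minimizer.
  def bestP(steps: Int = 999): Double =
    (1 to steps).map(i => i.toDouble / (steps + 1)).minBy(p => loss(p))
}
```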


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by nicodri <gi...@git.apache.org>.
Github user nicodri commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    @WeichenXu123 are you still working on this, or do you want me to take it over? I'm also interested in adding this feature. Thanks!


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133322927
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -82,6 +83,49 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("strong dataset test") {
    +    val layers = Array[Int](4, 5, 5, 2)
    +
    +    val strongDataset = Seq(
    +      (Vectors.dense(1, 2, 3, 4), 0d, Vectors.dense(1d, 0d)),
    +      (Vectors.dense(4, 3, 2, 1), 1d, Vectors.dense(0d, 1d)),
    +      (Vectors.dense(1, 1, 1, 1), 0d, Vectors.dense(.5, .5)),
    +      (Vectors.dense(1, 1, 1, 1), 1d, Vectors.dense(.5, .5))
    +    ).toDF("features", "label", "expectedProbability")
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(strongDataset)
    +    val result = model.transform(strongDataset)
    +    model.setProbabilityCol("probability")
    +    MLTestingUtils.checkCopyAndUids(trainer, model)
    +    // result.select("probability").show(false)
    +    result.select("probability", "expectedProbability").collect().foreach {
    +      case Row(p: Vector, e: Vector) =>
    +        assert(p ~== e absTol 1e-3)
    +    }
    +  }
    +
    +  test("test model probability") {
    +    val layers = Array[Int](2, 5, 2)
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(dataset)
    +    model.setProbabilityCol("probability")
    --- End diff --
    
    Ping --- this should not be necessary


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80409/
    Test PASSed.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    @WeichenXu123  Can you please add "[ML]" to the PR title?


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80946/
    Test PASSed.


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133080046
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -361,9 +361,15 @@ private[ann] trait TopologyModel extends Serializable {
        * Forward propagation
        *
        * @param data input data
    +   * @param includeLastLayer include last layer when computing. In MultilayerPerceptronClassifier,
    --- End diff --
    
    This text is unclear.  This phrasing is better: "Include the last layer in the output. In MultilayerPerceptronClassifier, the last layer is always softmax; the last layer of outputs is needed for class predictions, but not for rawPrediction."
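The relationship described in this comment can be illustrated with a minimal pure-Scala sketch (not code from the PR). It assumes, as the thread states, that the last layer is a plain softmax: `rawPrediction` is the output of the layer before it, and `probability` is the softmax of that raw vector. `SoftmaxSketch` and its methods are hypothetical names, and `Array[Double]` stands in for Spark's `Vector`.

```scala
object SoftmaxSketch {
  // Numerically stable softmax: subtracting the max raw score before
  // exponentiating avoids overflow without changing the result.
  def raw2probability(raw: Array[Double]): Array[Double] = {
    require(raw.nonEmpty, "raw prediction must be non-empty")
    val max = raw.max
    val exps = raw.map(r => math.exp(r - max))
    val sum = exps.sum
    exps.map(_ / sum)
  }

  // Softmax is monotonic, so the class with the largest raw score is also
  // the class with the largest probability.
  def predictClass(raw: Array[Double]): Int = raw.indices.maxBy(i => raw(i))
}
```

For example, `SoftmaxSketch.raw2probability(Array(2.0, 0.0))` returns approximately `[0.881, 0.119]`, and `predictClass` picks index 0 from either vector. This is why the class prediction does not strictly need the softmax output, while `probability` does.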


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #3895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3895/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    I think this will change the output of `summary` on `spark.mlp` in R, right?


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by LeoIV <gi...@git.apache.org>.
Github user LeoIV commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Alright, thanks. But have a look at the probabilities. They aren’t in [0,1] either.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #79806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79806/testReport)** for PR 17373 at commit [`fb83553`](https://github.com/apache/spark/commit/fb83553822e0fd5022a5c38b5b68c069450f98c7).


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80689/


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17373


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80409/testReport)** for PR 17373 at commit [`bcb44af`](https://github.com/apache/spark/commit/bcb44af65c3c7f9c0ead6cff5706243da80f88bc).


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    @LeoIV Sorry for the delay! I will update the code soon!


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74965/


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #74965 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74965/testReport)** for PR 17373 at commit [`1f0da4e`](https://github.com/apache/spark/commit/1f0da4e8ebff8509ac3bc6f06004cbecff6356e9).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Build finished. Test FAILed.


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133082607
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -82,6 +83,49 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("strong dataset test") {
    +    val layers = Array[Int](4, 5, 5, 2)
    +
    +    val strongDataset = Seq(
    +      (Vectors.dense(1, 2, 3, 4), 0d, Vectors.dense(1d, 0d)),
    +      (Vectors.dense(4, 3, 2, 1), 1d, Vectors.dense(0d, 1d)),
    +      (Vectors.dense(1, 1, 1, 1), 0d, Vectors.dense(.5, .5)),
    +      (Vectors.dense(1, 1, 1, 1), 1d, Vectors.dense(.5, .5))
    +    ).toDF("features", "label", "expectedProbability")
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(strongDataset)
    +    val result = model.transform(strongDataset)
    +    model.setProbabilityCol("probability")
    +    MLTestingUtils.checkCopyAndUids(trainer, model)
    +    // result.select("probability").show(false)
    +    result.select("probability", "expectedProbability").collect().foreach {
    +      case Row(p: Vector, e: Vector) =>
    +        assert(p ~== e absTol 1e-3)
    +    }
    +  }
    +
    +  test("test model probability") {
    +    val layers = Array[Int](2, 5, 2)
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(dataset)
    +    model.setProbabilityCol("probability")
    +    val result = model.transform(dataset)
    +    val features2prob = udf { features: Vector => model.mlpModel.predict(features) }
    +    val cmpVec = udf { (v1: Vector, v2: Vector) => v1 ~== v2 relTol 1e-3 }
    +    assert(result.select(cmpVec(features2prob(col("features")), col("probability")))
    --- End diff --
    
    If this test fails, it will not give much info.  How about collecting the data and comparing on the driver?
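One way to follow this suggestion, sketched in plain Scala under stated assumptions: collect the (expected, actual) probability pairs to the driver (in the real suite, via something like `result.select(...).collect()`), then compare them locally so a failure names the offending row and both vectors. The helper `firstMismatch` is hypothetical, and its relative-tolerance rule is a simplification that only approximates Spark's `~== relTol` test utility.

```scala
object DriverSideCompare {
  // Simplified relative-tolerance check (an assumption; Spark's own
  // `relTol` utility may treat values near zero differently).
  def closeRel(a: Double, b: Double, relTol: Double): Boolean =
    math.abs(a - b) <= relTol * math.max(math.abs(a), math.abs(b))

  // Returns a description of the first row whose vectors differ, or None if
  // every row matches. Rows are assumed to be already collected on the driver.
  def firstMismatch(
      rows: Seq[(Array[Double], Array[Double])],
      relTol: Double): Option[String] =
    rows.zipWithIndex.collectFirst {
      case ((expected, actual), i)
          if expected.length != actual.length ||
            expected.zip(actual).exists { case (e, a) => !closeRel(e, a, relTol) } =>
        s"row $i: expected ${expected.mkString("[", ",", "]")} " +
          s"but got ${actual.mkString("[", ",", "]")}"
    }
}
```

The result could feed an assertion such as `assert(mismatch.isEmpty, mismatch.getOrElse(""))`, so a failing run reports the row and values instead of a bare `false`.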


[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r132021227
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -107,9 +103,9 @@ class MultilayerPerceptronClassifierSuite
         model.setProbabilityCol("probability")
         MLTestingUtils.checkCopyAndUids(trainer, model)
         // result.select("probability").show(false)
    --- End diff --
    
    The result is
    ```
    +------------------------------------------+
    |probability                               |
    +------------------------------------------+
    |[0.9999995713441748,4.2865582522823835E-7]|
    |[1.992910055147819E-9,0.9999999980070899] |
    |[0.4999458983233704,0.5000541016766296]   |
    |[0.4999458983233704,0.5000541016766296]   |
    +------------------------------------------+
    ```
    cc @MrBago Thanks!
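As a quick arithmetic check, the table above can be compared with the `expectedProbability` column and the `absTol 1e-3` used in the "strong dataset test" from this thread. The largest deviation is in the last two rows, |0.5000541016766296 - 0.5| ≈ 5.4e-5, well within tolerance. The same check in plain Scala, with the values copied from the table (the object name is hypothetical):

```scala
object TableCheck {
  // Probabilities copied from the printed `probability` column.
  val observed: Seq[Array[Double]] = Seq(
    Array(0.9999995713441748, 4.2865582522823835e-7),
    Array(1.992910055147819e-9, 0.9999999980070899),
    Array(0.4999458983233704, 0.5000541016766296),
    Array(0.4999458983233704, 0.5000541016766296))

  // Expected values from the `expectedProbability` column of the test data.
  val expected: Seq[Array[Double]] = Seq(
    Array(1.0, 0.0), Array(0.0, 1.0), Array(0.5, 0.5), Array(0.5, 0.5))

  // Largest absolute difference over all rows and classes.
  val maxAbsDiff: Double = observed.zip(expected).flatMap { case (o, e) =>
    o.zip(e).map { case (x, y) => math.abs(x - y) }
  }.max
}
```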


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #3894 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3894/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Thanks for the changes, @WeichenXu123. I don't have any other comments. @jkbradley, do you want to have a look?


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80371/


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    cc @jkbradley @yanboliang 


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80762/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133324363
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -527,9 +550,21 @@ private[ml] class FeedForwardModel private(
     
       override def predict(data: Vector): Vector = {
         val size = data.size
    -    val result = forward(new BDM[Double](size, 1, data.toArray))
    +    val result = forward(new BDM[Double](size, 1, data.toArray), true)
         Vectors.dense(result.last.toArray)
       }
    +
    +  override def predictRaw(data: Vector): Vector = {
    +    val size = data.size
    +    val result = forward(new BDM[Double](size, 1, data.toArray), false)
    +    Vectors.dense(result(result.length - 2).toArray)
    +  }
    +
    +  override def raw2ProbabilityInPlace(data: Vector): Vector = {
    +    val dataMatrix = new BDM[Double](data.size, 1, data.toArray)
    +    layerModels.last.eval(dataMatrix, dataMatrix)
    --- End diff --
    
    Ping: If this proposal sounds good, then can you please update accordingly?
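The diff above splits prediction into `predictRaw` (run the network without the final softmax layer and take the penultimate output) and `raw2ProbabilityInPlace` (apply only the last layer). A toy pure-Scala sketch of that split follows. It assumes each layer is a plain function on `Array[Double]` and that the last layer is softmax. `ToyFeedForward`, its members, and its weights are hypothetical and far simpler than Spark's `FeedForwardModel`.

```scala
object ToyFeedForward {
  type Layer = Array[Double] => Array[Double]

  // Numerically stable softmax, used here as the final (output) layer.
  val softmax: Layer = raw => {
    val max = raw.max
    val exps = raw.map(r => math.exp(r - max))
    val sum = exps.sum
    exps.map(_ / sum)
  }

  // A hypothetical affine layer: out(j) = sum_i w(j)(i) * in(i) + b(j).
  def affine(w: Array[Array[Double]], b: Array[Double]): Layer =
    in => w.indices.map(j => w(j).indices.map(i => w(j)(i) * in(i)).sum + b(j)).toArray

  // One affine layer followed by softmax, i.e. a "softmax on top" topology
  // with made-up weights.
  val layers: Seq[Layer] = Seq(
    affine(Array(Array(1.0, -1.0), Array(0.5, 2.0)), Array(0.0, 0.1)),
    softmax)

  // Output of every evaluated layer, in order; the last layer is skipped
  // when includeLastLayer is false.
  def forward(data: Array[Double], includeLastLayer: Boolean): Seq[Array[Double]] = {
    val used = if (includeLastLayer) layers else layers.init
    used.scanLeft(data)((in, layer) => layer(in)).tail
  }

  def predict(features: Array[Double]): Array[Double] =
    forward(features, includeLastLayer = true).last

  // Raw prediction: the output of the layer just before the softmax.
  def predictRaw(features: Array[Double]): Array[Double] =
    forward(features, includeLastLayer = false).last

  // Probability: apply only the last layer to the raw prediction.
  def raw2Probability(raw: Array[Double]): Array[Double] = layers.last(raw)
}
```

In this sketch `forward` returns only the layers it actually evaluated, so `predictRaw` takes `.last`. The diff's `result(result.length - 2)` presumably reflects a `forward` that keeps an unused output slot for the skipped last layer. In both versions, `raw2Probability(predictRaw(x))` should equal `predict(x)`.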


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by LeoIV <gi...@git.apache.org>.
Github user LeoIV commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    I checked out version 2.1.2-SNAPSHOT and applied your changes there locally. It works; however, the probabilities are not in the range [-1,1]. Is this intended?
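A quick way to check the behaviour reported here, sketched in plain Scala: a valid class-probability vector has every entry in [0, 1] and entries summing to 1, up to floating-point error. Unnormalized raw scores would generally fail this check. The helper `isValidProbability` and its default tolerance are hypothetical and not part of Spark's API.

```scala
object ProbabilityCheck {
  // True if every entry lies in [0, 1] (allowing `tol` of rounding error)
  // and the entries sum to 1 within `tol`.
  def isValidProbability(p: Array[Double], tol: Double = 1e-9): Boolean =
    p.nonEmpty &&
      p.forall(x => x >= -tol && x <= 1.0 + tol) &&
      math.abs(p.sum - 1.0) <= tol
}
```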


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79950/


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80413/testReport)** for PR 17373 at commit [`df7439e`](https://github.com/apache/spark/commit/df7439eae0f2786ae66831d9de26b1ef63f92dc3).


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #75835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75835/testReport)** for PR 17373 at commit [`4cf8cee`](https://github.com/apache/spark/commit/4cf8ceec03a890d28901fe47dd2876918a79ecb0).


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Oh OK, makes sense. @WeichenXu123, could you please open a JIRA (linked from this task's JIRA) and CC @felixcheung on it? Thanks!
    
    I'll rerun tests to be safe and merge this afterwards.


[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r131790673
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -463,7 +479,7 @@ private[ml] class FeedForwardModel private(
       private var outputs: Array[BDM[Double]] = null
       private var deltas: Array[BDM[Double]] = null
     
    -  override def forward(data: BDM[Double]): Array[BDM[Double]] = {
    +  override def forward(data: BDM[Double], containsLastLayer: Boolean): Array[BDM[Double]] = {
         // Initialize output arrays for all layers. Special treatment for InPlace
    --- End diff --
    
    @MrBago In `MultiLayerPerceptronClassifier.train` there is a line:
    ```
    val topology = FeedForwardTopology.multiLayerPerceptron(myLayers, softmaxOnTop = true)
    ```
    So MultiLayerPerceptronClassifier always uses softmax as the last layer.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    We can open a JIRA to track this.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #75835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75835/testReport)** for PR 17373 at commit [`4cf8cee`](https://github.com/apache/spark/commit/4cf8ceec03a890d28901fe47dd2876918a79ecb0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133079322
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -374,6 +380,22 @@ private[ann] trait TopologyModel extends Serializable {
       def predict(data: Vector): Vector
     
       /**
    +   * Raw prediction of the model
    +   *
    +   * @param data input data
    +   * @return raw prediction
    +   */
    +  def predictRaw(data: Vector): Vector
    --- End diff --
    
    Please match the Classifier API here (data -> features) unless there's a reason to deviate.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80409/testReport)** for PR 17373 at commit [`bcb44af`](https://github.com/apache/spark/commit/bcb44af65c3c7f9c0ead6cff5706243da80f88bc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    They are. I think some values are tiny, something like 4.7532244532E-10, and the display truncates them.
    Thanks
    
    Sent from my iPhone
    
    On 15 Jul 2017, at 12:35 AM, Leonard Hövelmann <no...@github.com> wrote:
    
    
    Alright, thanks. But have a look at the probabilities. They aren’t in [0,1] either.
    

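The truncation explanation can be illustrated with a small sketch. Assuming the probabilities were read from a fixed-width text rendering, a value such as 4.7532244532E-10 is a perfectly valid probability, but if its string form is cut before the exponent it reads as roughly 4.75. The `TruncationSketch` object and its `truncate` helper are hypothetical; the helper only mimics a fixed-width cut and is not Spark's `Dataset.show()` implementation.

```scala
object TruncationSketch {
  // Hypothetical helper: keeps at most `width` characters and marks the cut with "...".
  // It only mimics a fixed-width display (assumes width >= 3); it is not Spark's show() code.
  def truncate(s: String, width: Int): String =
    if (s.length <= width) s else s.substring(0, width - 3) + "..."

  // The value quoted in the thread, as text and as a number.
  val shown: String = "4.7532244532E-10"
  val value: Double = shown.toDouble

  def isProbability(p: Double): Boolean = p >= 0.0 && p <= 1.0
}
```

Here `truncate(shown, 10)` yields "4.75322...", which looks like a value far outside [0, 1] even though `value` itself is a valid probability. In Spark, `show(false)` (as in the commented-out line of the test diff in this thread) prints cells without truncation.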


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133079098
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -374,6 +380,22 @@ private[ann] trait TopologyModel extends Serializable {
       def predict(data: Vector): Vector
     
       /**
    +   * Raw prediction of the model
    +   *
    +   * @param data input data
    +   * @return raw prediction
    +   */
    +  def predictRaw(data: Vector): Vector
    +
    +  /**
    +   * Probability of the model
    --- End diff --
    
    This documentation does not add any information.  Can you please link to ProbabilisticClassifier instead?


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #79806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79806/testReport)** for PR 17373 at commit [`fb83553`](https://github.com/apache/spark/commit/fb83553822e0fd5022a5c38b5b68c069450f98c7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80689/testReport)** for PR 17373 at commit [`eedc647`](https://github.com/apache/spark/commit/eedc64744ae28c7fcc8e81524e182f49b73b8406).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    The probability should always be between 0 and 1.
    Send me your test code and test data so I can find out what is wrong.
    In my own tests the results are OK.
    
    Sent from my iPhone
    
    On 12 Jul 2017, at 11:57 PM, Leonard Hövelmann <no...@github.com> wrote:
    
    
    I checked out version 2.1.2-SNAPSHOT and applied your changes there locally. It works; however, the probabilities are not in range [-1,1]. Is this intended?
    

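For a softmax output layer, the values are guaranteed to form a probability vector: each entry lies in [0, 1] and the entries sum to 1, although individual entries can be extremely small. A minimal, Spark-independent sketch, assuming a softmax output; `SoftmaxSketch` and its methods are hypothetical names:

```scala
object SoftmaxSketch {
  // Numerically stable softmax: subtracting the max before exp() avoids overflow
  // and does not change the result, because softmax is shift-invariant.
  def softmax(raw: Array[Double]): Array[Double] = {
    val m = raw.max
    val exps = raw.map(x => math.exp(x - m))
    val sum = exps.sum
    exps.map(_ / sum)
  }

  // A valid probability vector: every entry in [0, 1] and entries sum to 1 (up to rounding).
  def isProbabilityVector(p: Array[Double], tol: Double = 1e-9): Boolean =
    p.forall(x => x >= 0.0 && x <= 1.0) && math.abs(p.sum - 1.0) <= tol
}
```

Even for extreme raw scores such as (1000, -1000), the output stays inside [0, 1]; the small entry simply underflows toward 0 rather than going negative.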


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #74965 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74965/testReport)** for PR 17373 at commit [`1f0da4e`](https://github.com/apache/spark/commit/1f0da4e8ebff8509ac3bc6f06004cbecff6356e9).


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by LeoIV <gi...@git.apache.org>.
Github user LeoIV commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Right 🙈 That explains why it works perfectly fine with my classifier :-)


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80371/testReport)** for PR 17373 at commit [`645fdc4`](https://github.com/apache/spark/commit/645fdc416d8529689de7be3d734b8c07f3a80893).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    @nicodri Hi, I am modifying this PR and will commit this week! Thanks! 


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Thinking more about the proposal about separating the classification-specific logic out of the generic Topology, it's something we should definitely do at some point, but I'm OK with leaving it as is for now.  Adding new, unused classes is probably not worth the trouble right now.  Can you please document very clearly, though, that predictRaw and raw2ProbabilityInPlace are only for classification?  Thanks!
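A sketch of what the requested documentation could look like, assuming simplified stand-in types so it compiles without Spark. `Vec`, `TopologyModelSketch`, and `ToySoftmaxModel` are hypothetical; the parameter is named `features` to follow the Classifier API naming (data -> features) requested in this review, and the Scaladoc states that the two methods apply only to classification.

```scala
/** Hypothetical stand-in for Spark's `Vector`, so the sketch compiles without Spark. */
final case class Vec(values: Array[Double])

trait TopologyModelSketch {
  /** Output of the full network, including the last layer. */
  def predict(features: Vec): Vec

  /**
   * Classification only: output of the network with the last (softmax) layer skipped,
   * i.e. the `rawPrediction` of a probabilistic classifier.
   * See [[org.apache.spark.ml.classification.ProbabilisticClassificationModel]].
   */
  def predictRaw(features: Vec): Vec

  /**
   * Classification only: applies the last layer to a raw prediction and
   * overwrites the input with the result.
   * See [[org.apache.spark.ml.classification.ProbabilisticClassificationModel]].
   */
  def raw2ProbabilityInPlace(rawPrediction: Vec): Vec
}

/** Toy model for illustration: an identity "network" followed by a softmax layer. */
object ToySoftmaxModel extends TopologyModelSketch {
  def predict(features: Vec): Vec = raw2ProbabilityInPlace(predictRaw(features))

  def predictRaw(features: Vec): Vec = Vec(features.values.clone())

  def raw2ProbabilityInPlace(rawPrediction: Vec): Vec = {
    val v = rawPrediction.values
    val m = v.max
    for (i <- v.indices) v(i) = math.exp(v(i) - m)
    val s = v.sum
    for (i <- v.indices) v(i) /= s
    rawPrediction // same object, now holding probabilities
  }
}
```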



[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133079087
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -374,6 +380,22 @@ private[ann] trait TopologyModel extends Serializable {
       def predict(data: Vector): Vector
     
       /**
    +   * Raw prediction of the model
    --- End diff --
    
    This documentation does not add any information.  Can you please link to ProbabilisticClassifier instead?


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133076328
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -82,6 +83,49 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("strong dataset test") {
    --- End diff --
    
    Make the test title more descriptive so it is clear what it is testing.  E.g. "Predicted class probabilities: calibration on toy dataset"


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75835/
    Test PASSed.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80684/testReport)** for PR 17373 at commit [`eedc647`](https://github.com/apache/spark/commit/eedc64744ae28c7fcc8e81524e182f49b73b8406).


[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80411/testReport)** for PR 17373 at commit [`0b00908`](https://github.com/apache/spark/commit/0b009085f25d1cf9ad7e4556e8f7d28e46ebf2cb).


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133080187
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -527,9 +550,21 @@ private[ml] class FeedForwardModel private(
     
       override def predict(data: Vector): Vector = {
         val size = data.size
    -    val result = forward(new BDM[Double](size, 1, data.toArray))
    +    val result = forward(new BDM[Double](size, 1, data.toArray), true)
         Vectors.dense(result.last.toArray)
       }
    +
    +  override def predictRaw(data: Vector): Vector = {
    +    val size = data.size
    --- End diff --
    
    This temp val of "size" is only used once, so I recommend removing it to make the code clearer.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    @felixcheung How will this affect ```spark.mlp``` in R?


[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133075802
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -82,6 +83,49 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("strong dataset test") {
    +    val layers = Array[Int](4, 5, 5, 2)
    +
    +    val strongDataset = Seq(
    +      (Vectors.dense(1, 2, 3, 4), 0d, Vectors.dense(1d, 0d)),
    +      (Vectors.dense(4, 3, 2, 1), 1d, Vectors.dense(0d, 1d)),
    +      (Vectors.dense(1, 1, 1, 1), 0d, Vectors.dense(.5, .5)),
    +      (Vectors.dense(1, 1, 1, 1), 1d, Vectors.dense(.5, .5))
    +    ).toDF("features", "label", "expectedProbability")
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(strongDataset)
    +    val result = model.transform(strongDataset)
    +    model.setProbabilityCol("probability")
    +    MLTestingUtils.checkCopyAndUids(trainer, model)
    +    // result.select("probability").show(false)
    --- End diff --
    
    remove old comment


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80762/
    Test PASSed.


[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r130747996
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -82,6 +83,23 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("test model probability") {
    +    val layers = Array[Int](2, 5, 2)
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(dataset)
    +    model.setProbabilityCol("probability")
    +    val result = model.transform(dataset)
    +    val features2prob = udf { features: Vector => model.mlpModel.predict(features) }
    +    val cmpVec = udf { (v1: Vector, v2: Vector) => v1 ~== v2 relTol 1e-3 }
    +    assert(result.select(cmpVec(features2prob(col("features")), col("probability")))
    +      .rdd.map(_.getBoolean(0)).reduce(_ && _))
    +  }
    +
    --- End diff --
    
    I think we should include a stronger test for this. I did a quick search and couldn't find a strong test for `mlpModel.predict`, it might be good to add one. Also, I believe this xor dataset only produces probability predictions ~equal to 0 or 1.
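One way to get expected probabilities that are not just 0 or 1 is to repeat an input with conflicting labels, as the "strong dataset" test in this thread does with (1, 1, 1, 1) labeled both 0 and 1: for such a point, the probability that minimizes log loss is the fraction of positive labels. The Spark-independent sketch below checks this with a grid search over a constant prediction; `CalibrationSketch` and its methods are hypothetical, and this is not how MultilayerPerceptronClassifier trains.

```scala
object CalibrationSketch {
  // Mean binary cross-entropy (log loss) when predicting P(label = 1) = p for every example.
  def meanLogLoss(labels: Seq[Double], p: Double): Double =
    labels.map(y => -(y * math.log(p) + (1 - y) * math.log(1 - p))).sum / labels.size

  // Grid search over p = 1/(steps+1), ..., steps/(steps+1), all strictly inside (0, 1).
  def bestConstantProbability(labels: Seq[Double], steps: Int = 999): Double =
    (1 to steps).map(_.toDouble / (steps + 1)).minBy(p => meanLogLoss(labels, p))
}
```

With one example of each label the optimum is 0.5, which matches the expected probability (.5, .5) for the duplicated input in that test; three positives and one negative give 0.75.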
    



[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r129698281
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -527,9 +544,21 @@ private[ml] class FeedForwardModel private(
     
       override def predict(data: Vector): Vector = {
         val size = data.size
    -    val result = forward(new BDM[Double](size, 1, data.toArray))
    +    val result = forward(new BDM[Double](size, 1, data.toArray), true)
         Vectors.dense(result.last.toArray)
       }
    +
    +  override def predictRaw(data: Vector): Vector = {
    +    val size = data.size
    +    val result = forward(new BDM[Double](size, 1, data.toArray), false)
    +    Vectors.dense(result(result.length - 2).toArray)
    +  }
    +
    +  override def raw2ProbabilityInPlace(data: Vector): Vector = {
    +    val dataMatrix = new BDM[Double](data.size, 1, data.toArray)
    --- End diff --
    
    Added `raw2ProbabilityInPlace`; what it computes is:
    ```
    softmax(rawPredictionsVector) ==> predictionsVector
    ```
    It directly calls the last layer's function to compute it.
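The split between the raw prediction and the last layer can be sketched without Breeze or Spark. This is a simplified model under stated assumptions: plain `Array[Double]` replaces `BDM`/`Vector`, and `Layer`, `affine`, `softmax`, and `Model` are hypothetical names, not Spark's `ann` classes. As in the diff, `predictRaw` returns the output of the second-to-last layer when the last layer is skipped, and `raw2ProbabilityInPlace` applies only the last layer.

```scala
object FeedForwardSketch {
  // Hypothetical layer abstraction: a function from an input vector to an output vector.
  type Layer = Array[Double] => Array[Double]

  // Hypothetical affine layer computing W x + b.
  def affine(w: Array[Array[Double]], b: Array[Double]): Layer =
    x => w.indices.map(r => w(r).indices.map(c => w(r)(c) * x(c)).sum + b(r)).toArray

  // Softmax output layer, stabilized by subtracting the maximum.
  val softmax: Layer = raw => {
    val m = raw.max
    val e = raw.map(v => math.exp(v - m))
    val s = e.sum
    e.map(_ / s)
  }

  final class Model(layers: Array[Layer]) {
    // At least two layers are needed so that a raw (pre-last-layer) output exists.
    require(layers.length >= 2, "need at least one layer before the last layer")

    // outputs(i) holds layer i's output; if the last layer is skipped, its slot stays null.
    def forward(data: Array[Double], containsLastLayer: Boolean): Array[Array[Double]] = {
      val outputs = new Array[Array[Double]](layers.length)
      val end = if (containsLastLayer) layers.length else layers.length - 1
      var input = data
      for (i <- 0 until end) {
        outputs(i) = layers(i)(input)
        input = outputs(i)
      }
      outputs
    }

    def predict(features: Array[Double]): Array[Double] =
      forward(features, containsLastLayer = true).last

    // Raw prediction: the output of the second-to-last layer.
    def predictRaw(features: Array[Double]): Array[Double] = {
      val result = forward(features, containsLastLayer = false)
      result(result.length - 2)
    }

    // Applies only the last layer to `raw` and copies the result back into `raw`.
    def raw2ProbabilityInPlace(raw: Array[Double]): Array[Double] = {
      val p = layers.last(raw)
      Array.copy(p, 0, raw, 0, raw.length)
      raw
    }
  }
}
```

For a softmax last layer, applying `raw2ProbabilityInPlace` to `predictRaw(x)` should reproduce `predict(x)`. The sketch computes into a new array and copies it back, rather than evaluating the layer truly in place.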


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80689/testReport)** for PR 17373 at commit [`eedc647`](https://github.com/apache/spark/commit/eedc64744ae28c7fcc8e81524e182f49b73b8406).


[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    @felixcheung So it does not cause bugs in SparkR; can we leave it for a separate PR?




[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r130746928
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -463,7 +479,7 @@ private[ml] class FeedForwardModel private(
       private var outputs: Array[BDM[Double]] = null
       private var deltas: Array[BDM[Double]] = null
     
    -  override def forward(data: BDM[Double]): Array[BDM[Double]] = {
    +  override def forward(data: BDM[Double], containsLastLayer: Boolean): Array[BDM[Double]] = {
         // Initialize output arrays for all layers. Special treatment for InPlace
    --- End diff --
    
    Could you add the above comment to the code? It could be useful for folks reading/editing this in the future.
    
    Also, it seems like the last layer could also be a SigmoidLayerWithSquaredError or a SigmoidFunction. Do we need to handle those cases any differently?
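To illustrate why a sigmoid output layer may need separate handling: a sigmoid maps each raw score into (0, 1) independently, so the outputs need not sum to 1 and are not, on their own, a distribution over classes. This is a plain-Scala sketch on `Array[Double]`; `SigmoidOutputs` is a hypothetical name, not Spark's `SigmoidFunction`.

```scala
// Hypothetical helper, not Spark's ml.ann SigmoidFunction.
object SigmoidOutputs {
  // Element-wise logistic sigmoid, as a sigmoid output layer would apply it:
  // every entry lands in (0, 1), but each is computed independently,
  // so nothing forces the entries to sum to 1.
  def sigmoid(raw: Array[Double]): Array[Double] =
    raw.map(x => 1.0 / (1.0 + math.exp(-x)))
}
```

For raw scores `(2.0, 1.0, 0.5)` the three sigmoid outputs sum to roughly 2.23, so reading them as class probabilities would require an extra normalization step that a softmax layer does not need.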




[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133082415
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -82,6 +83,49 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("strong dataset test") {
    +    val layers = Array[Int](4, 5, 5, 2)
    +
    +    val strongDataset = Seq(
    +      (Vectors.dense(1, 2, 3, 4), 0d, Vectors.dense(1d, 0d)),
    +      (Vectors.dense(4, 3, 2, 1), 1d, Vectors.dense(0d, 1d)),
    +      (Vectors.dense(1, 1, 1, 1), 0d, Vectors.dense(.5, .5)),
    +      (Vectors.dense(1, 1, 1, 1), 1d, Vectors.dense(.5, .5))
    +    ).toDF("features", "label", "expectedProbability")
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(strongDataset)
    +    val result = model.transform(strongDataset)
    +    model.setProbabilityCol("probability")
    +    MLTestingUtils.checkCopyAndUids(trainer, model)
    +    // result.select("probability").show(false)
    +    result.select("probability", "expectedProbability").collect().foreach {
    +      case Row(p: Vector, e: Vector) =>
    +        assert(p ~== e absTol 1e-3)
    +    }
    +  }
    +
    +  test("test model probability") {
    +    val layers = Array[Int](2, 5, 2)
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(dataset)
    +    model.setProbabilityCol("probability")
    --- End diff --
    
    That's the default already, right?




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80413/
    Test PASSed.




[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merging with master
    Thanks @WeichenXu123 !




[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80946/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80413/testReport)** for PR 17373 at commit [`df7439e`](https://github.com/apache/spark/commit/df7439eae0f2786ae66831d9de26b1ef63f92dc3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Jenkins, test this please.




[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r130747030
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -363,7 +363,7 @@ private[ann] trait TopologyModel extends Serializable {
        * @param data input data
        * @return array of outputs for each of the layers
        */
    -  def forward(data: BDM[Double]): Array[BDM[Double]]
    +  def forward(data: BDM[Double], containsLastLayer: Boolean): Array[BDM[Double]]
    --- End diff --
    
    Can you update the docstring for this method to add the argument?
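As a sketch of the docstring requested here, the following is a simplified, Spark-free stand-in for `TopologyModel.forward` with the `containsLastLayer` argument documented. `TinyTopologyModel` and `FunctionStack` are hypothetical names and layers are modeled as plain functions on `Array[Double]`. Unlike the diff, which keeps an output slot for every layer and reads `result(result.length - 2)`, this version returns only the outputs it actually computed.

```scala
/** Hypothetical, simplified stand-in for a feed-forward topology model. */
trait TinyTopologyModel {
  /**
   * Forward propagation.
   *
   * @param data input feature values
   * @param containsLastLayer whether to evaluate the final layer; when false,
   *                          propagation stops before it, so the last returned
   *                          output is the raw (pre-activation) score
   * @return outputs of each evaluated layer, in order
   */
  def forward(data: Array[Double], containsLastLayer: Boolean): Array[Array[Double]]
}

// Toy implementation: each layer is just a function from array to array.
class FunctionStack(layers: Seq[Array[Double] => Array[Double]]) extends TinyTopologyModel {
  require(layers.nonEmpty, "need at least one layer")

  override def forward(data: Array[Double], containsLastLayer: Boolean): Array[Array[Double]] = {
    val toRun = if (containsLastLayer) layers else layers.init
    // scanLeft feeds each layer's output into the next; tail drops the input itself.
    toRun.scanLeft(data)((in, layer) => layer(in)).tail.toArray
  }
}
```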




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    RawPrediction is not a probability.
    Its range is from -inf to inf.
    Softmax(raw predictions) gives the probabilities.
    Their range is from 0 to 1.
    Thanks!
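The relationship described in this reply can be sketched in plain Scala: softmax takes raw scores anywhere in (-inf, inf) and returns values in [0, 1] that sum to 1. Subtracting the maximum before exponentiating is a standard numerical-stability choice of this sketch (not something stated in the thread), and `RawToProbability` is a hypothetical name.

```scala
// Hypothetical helper illustrating raw prediction -> probability.
object RawToProbability {
  // Maps raw scores in (-inf, inf) to probabilities in [0, 1] that sum to 1.
  // Subtracting the max gives the same result as exp(r_i) / sum_j exp(r_j)
  // but keeps math.exp from overflowing on large raw scores.
  def softmax(raw: Array[Double]): Array[Double] = {
    require(raw.nonEmpty, "raw prediction must be non-empty")
    val max = raw.max
    val exps = raw.map(r => math.exp(r - max))
    val total = exps.sum
    exps.map(_ / total)
  }
}
```

With the naive formula, a raw score of 1000.0 would make `math.exp` return `Infinity` and the division produce `NaN`; the shifted version stays finite.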
    
    
    On 14 Jul 2017, at 6:38 AM, Leonard Hövelmann <no...@github.com> wrote:
    
    
    @WeichenXu123 I deleted my last comment because I wasn't quite sure whether I had made mistakes elsewhere. As I described above, I applied your changes to version 2.1. For small datasets, I get raw predictions that are not in [0, 1]. You should be able to check it using this small test case:
    
    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    import org.apache.spark.ml.linalg.Vectors
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
    import org.apache.spark.sql.types.{IntegerType, StructType}
    import org.apache.spark.sql.{Row, SparkSession}
    
    /**
      * Created by Leonard Hövelmann (leonard.hoevelmann@adesso.de) on 14.07.2017.
      */
    object TestProb {
    
      def main(args: Array[String]) = {
        val spark = SparkSession.builder().master("local[*]").getOrCreate()
    
        val rowSchema = new StructType().add("class", IntegerType).add("features", org.apache.spark.ml.linalg.SQLDataTypes.VectorType)
    
        val testData: RDD[Row] = spark.sparkContext.parallelize(Seq(
          new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row]
        ))
    
        val testDataDf = spark.sqlContext.createDataFrame(testData, rowSchema)
    
        val mlp = new MultilayerPerceptronClassifier().setFeaturesCol("features").setLabelCol("class").setLayers(Array(5, 4, 3))
    
        val mlpModel = mlp.fit(testDataDf)
    
        mlpModel.transform(testDataDf).show(6)
      }
    
    }
    
    
    Using this, I get the following results:
    
    +-----+--------------------+--------------------+--------------------+----------+
    |class|            features|       rawPrediction|         probability|prediction|
    +-----+--------------------+--------------------+--------------------+----------+
    |    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
    |    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
    |    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
    |    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
    |    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
    |    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
    +-----+--------------------+--------------------+--------------------+----------+
    
    
    Does this work in your code?
    





[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #79950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79950/testReport)** for PR 17373 at commit [`14c4c6c`](https://github.com/apache/spark/commit/14c4c6c8ebd5bfad3bd797a9ac5e1dfee1438c3f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class MultilayerPerceptronClassificationModel(JavaModel, JavaClassificationModel, JavaMLWritable,`




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79806/
    Test PASSed.




[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133079226
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -374,6 +380,22 @@ private[ann] trait TopologyModel extends Serializable {
       def predict(data: Vector): Vector
     
       /**
    +   * Raw prediction of the model
    +   *
    +   * @param data input data
    +   * @return raw prediction
    +   */
    +  def predictRaw(data: Vector): Vector
    +
    +  /**
    +   * Probability of the model
    +   *
    +   * @param data input data
    --- End diff --
    
    "input data" sounds like input feature values, which is not correct.  Why not match the ProbabilisticClassifier API?
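One way to follow this suggestion is to name parameters after what they actually hold, mirroring the `features` parameter of Spark's `ClassificationModel.predictRaw` and the `rawPrediction` parameter of `ProbabilisticClassificationModel.raw2probabilityInPlace`. The sketch below is a hypothetical, Spark-free trait on `Array[Double]`; `ProbabilisticScorer` and `VoteCountScorer` are illustrative names, and the vote-normalizing conversion is a toy stand-in, not the MLP's softmax.

```scala
// Hypothetical trait, sketched on Array[Double] instead of Spark's Vector.
trait ProbabilisticScorer {
  /** Raw, unnormalized per-class scores for one example.
    * @param features feature values of the example
    */
  def predictRaw(features: Array[Double]): Array[Double]

  /** Converts raw scores to class probabilities, overwriting the input.
    * @param rawPrediction output of predictRaw (not feature values)
    * @return the same array, now holding probabilities
    */
  def raw2probabilityInPlace(rawPrediction: Array[Double]): Array[Double]

  def predictProbability(features: Array[Double]): Array[Double] =
    raw2probabilityInPlace(predictRaw(features))
}

// Toy implementation: raw scores are non-negative "votes", normalized by their sum.
class VoteCountScorer(votesFor: Array[Double] => Array[Double]) extends ProbabilisticScorer {
  override def predictRaw(features: Array[Double]): Array[Double] = votesFor(features)

  override def raw2probabilityInPlace(rawPrediction: Array[Double]): Array[Double] = {
    val total = rawPrediction.sum
    var i = 0
    while (i < rawPrediction.length) {
      rawPrediction(i) /= total
      i += 1
    }
    rawPrediction
  }
}
```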




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #79950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79950/testReport)** for PR 17373 at commit [`14c4c6c`](https://github.com/apache/spark/commit/14c4c6c8ebd5bfad3bd797a9ac5e1dfee1438c3f).




[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80946/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by LeoIV <gi...@git.apache.org>.
Github user LeoIV commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    @WeichenXu123 I deleted my last comment because I wasn't quite sure whether I had made mistakes elsewhere. As I described above, I applied your changes to version 2.1. For **small** datasets, I get raw predictions that are not in [0, 1]. You should be able to check it using this small test case:
    
    ```
    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    import org.apache.spark.ml.linalg.Vectors
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
    import org.apache.spark.sql.types.{IntegerType, StructType}
    import org.apache.spark.sql.{Row, SparkSession}
    
    /**
      * Created by Leonard Hövelmann (leonard.hoevelmann@adesso.de) on 14.07.2017.
      */
    object TestProb {
    
      def main(args: Array[String]) = {
        val spark = SparkSession.builder().master("local[*]").getOrCreate()
    
        val rowSchema = new StructType().add("class", IntegerType).add("features", org.apache.spark.ml.linalg.SQLDataTypes.VectorType)
    
        val testData: RDD[Row] = spark.sparkContext.parallelize(Seq(
          new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
          new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row]
        ))
    
        val testDataDf = spark.sqlContext.createDataFrame(testData, rowSchema)
    
        val mlp = new MultilayerPerceptronClassifier().setFeaturesCol("features").setLabelCol("class").setLayers(Array(5, 4, 3))
    
        val mlpModel = mlp.fit(testDataDf)
    
        mlpModel.transform(testDataDf).show(6)
      }
    
    }
    ```
    
    Using this, I get the following results:
    
    ```
    +-----+--------------------+--------------------+--------------------+----------+
    |class|            features|       rawPrediction|         probability|prediction|
    +-----+--------------------+--------------------+--------------------+----------+
    |    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
    |    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
    |    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
    |    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
    |    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
    |    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
    +-----+--------------------+--------------------+--------------------+----------+
    ```
    
    
    Does this work in your code?
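When reading output like the table above, note that `show()` truncates each cell to 20 characters by default, so a cell such as `[1.0,1.1880726027...` may hide the rest of a number (for example an exponent); `show(false)` prints cells in full. A plain-Scala check like the following, applied to each row's probability values (e.g. after collecting the `probability` column and calling `toArray` on each `Vector`), tests the actual numbers rather than their display. `ProbabilityCheck` is a hypothetical helper, not part of Spark.

```scala
// Hypothetical helper for sanity-checking one probability vector.
object ProbabilityCheck {
  // Returns a description of the first problem found, or None if `p` is a
  // valid probability vector: entries in [0, 1] that sum to 1 within `tol`.
  def problem(p: Array[Double], tol: Double = 1e-6): Option[String] = {
    // NaN compares false against everything, so check it explicitly.
    val outOfRange = p.indexWhere(x => x.isNaN || x < 0.0 || x > 1.0)
    if (p.isEmpty) Some("empty vector")
    else if (outOfRange >= 0) Some(s"entry $outOfRange = ${p(outOfRange)} is outside [0, 1]")
    else if (math.abs(p.sum - 1.0) > tol) Some(s"entries sum to ${p.sum}, not 1")
    else None
  }
}
```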




[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Thanks!  Will merge after rerunning tests




[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    **[Test build #80762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80762/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).




[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r133081809
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -527,9 +550,21 @@ private[ml] class FeedForwardModel private(
     
       override def predict(data: Vector): Vector = {
         val size = data.size
    -    val result = forward(new BDM[Double](size, 1, data.toArray))
    +    val result = forward(new BDM[Double](size, 1, data.toArray), true)
         Vectors.dense(result.last.toArray)
       }
    +
    +  override def predictRaw(data: Vector): Vector = {
    +    val size = data.size
    +    val result = forward(new BDM[Double](size, 1, data.toArray), false)
    +    Vectors.dense(result(result.length - 2).toArray)
    +  }
    +
    +  override def raw2ProbabilityInPlace(data: Vector): Vector = {
    +    val dataMatrix = new BDM[Double](data.size, 1, data.toArray)
    +    layerModels.last.eval(dataMatrix, dataMatrix)
    --- End diff --
    
    This assumes that the `eval` method can operate in-place.  That is fine for the last layer for MLP (SoftmaxLayerModelWithCrossEntropyLoss), but not OK in general.  More generally, these methods for classifiers should not go in the very general TopologyModel abstraction; that abstraction may be used in the future for regression as well.  I'd be fine with putting this classification-specific logic in MLP itself; we do not need to generalize the logic until we add other Classifiers, which might take a long time.
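The in-place concern can be seen with a toy example: writing each output entry back into the input buffer is harmless for element-wise maps (and, per the comment, for the softmax output layer), but it corrupts a layer whose outputs each depend on several inputs, such as an affine (matrix-vector) layer. `InPlaceHazard` is a hypothetical name; this is plain Scala, not Spark's `LayerModel.eval`.

```scala
// Hypothetical illustration of why in-place evaluation is not safe in general.
object InPlaceHazard {
  // Out-of-place matrix-vector product: reads only `x`, writes a fresh array.
  def affine(w: Array[Array[Double]], x: Array[Double]): Array[Double] =
    w.map(row => row.indices.map(j => row(j) * x(j)).sum)

  // Naive in-place version (square W only): slot i is overwritten while
  // later rows still need the original x(i), so the result is corrupted.
  def affineInPlace(w: Array[Array[Double]], x: Array[Double]): Array[Double] = {
    var i = 0
    while (i < x.length) {
      x(i) = w(i).indices.map(j => w(i)(j) * x(j)).sum
      i += 1
    }
    x
  }
}
```

With `W` swapping the two entries of `(1.0, 2.0)`, the out-of-place product gives `(2.0, 1.0)`, while the in-place version gives `(2.0, 2.0)` because the first write destroys the value the second row needs.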




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by LeoIV <gi...@git.apache.org>.
Github user LeoIV commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    I'd like to express my demand for this feature 




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    cc @yanboliang @jkbradley 




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17373
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79374/
    Test FAILed.



[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r131824713
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -82,6 +83,23 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("test model probability") {
    +    val layers = Array[Int](2, 5, 2)
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(dataset)
    +    model.setProbabilityCol("probability")
    +    val result = model.transform(dataset)
    +    val features2prob = udf { features: Vector => model.mlpModel.predict(features) }
    +    val cmpVec = udf { (v1: Vector, v2: Vector) => v1 ~== v2 relTol 1e-3 }
    +    assert(result.select(cmpVec(features2prob(col("features")), col("probability")))
    +      .rdd.map(_.getBoolean(0)).reduce(_ && _))
    +  }
    +
    --- End diff --
    
    @MrBago
    What kind of stronger test should be done? Should I add a test that checks the probability vector equals given vectors?
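One possible shape for a stronger check, sketched in plain Scala outside Spark: besides comparing against `mlpModel.predict` or against given expected vectors, assert invariants that any valid probability vector must satisfy (every entry in [0, 1], entries summing to 1) and that the predicted class is the argmax of the probability vector, assuming no `thresholds` are set. The object and helper names (`ProbabilityChecks`, `softmax`, `argmax`, `isValidProbability`, `approxEqual`), the tolerances, and the raw scores are hypothetical, chosen for illustration; this is not Spark's API or the test that ended up in the PR.

```scala
// Hypothetical helpers for illustration only; not part of Spark or of this PR.
object ProbabilityChecks {
  // Numerically stable softmax: subtract the max score before exponentiating.
  def softmax(raw: Array[Double]): Array[Double] = {
    val m = raw.reduce(_ max _)
    val exps = raw.map(x => math.exp(x - m))
    val total = exps.sum
    exps.map(_ / total)
  }

  // Index of the largest entry; on ties the first maximum is kept.
  def argmax(v: Array[Double]): Int =
    v.indices.reduce((i, j) => if (v(j) > v(i)) j else i)

  // Structural invariants: entries in [0, 1] that sum to 1 within `tol`.
  def isValidProbability(p: Array[Double], tol: Double = 1e-9): Boolean =
    p.nonEmpty && p.forall(x => x >= 0.0 && x <= 1.0) && math.abs(p.sum - 1.0) <= tol

  // Element-wise relative comparison against an expected vector. A simplified
  // stand-in for the suite's `~== relTol` helper, not its exact semantics.
  def approxEqual(actual: Array[Double], expected: Array[Double], relTol: Double = 1e-3): Boolean =
    actual.length == expected.length && actual.zip(expected).forall { case (a, e) =>
      math.abs(a - e) <= relTol * math.max(math.abs(a), math.abs(e))
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical raw scores for one row of a 3-class model.
    val p = softmax(Array(2.0, 1.0, 0.1))
    println(p.mkString("[", ", ", "]"))
    assert(isValidProbability(p))
    assert(argmax(p) == 0) // the highest raw score yields the highest probability
  }
}
```

Invariant checks like these do not depend on a particular seed or solver, so they could complement an exact comparison against stored expected vectors rather than replace it.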



[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17373#discussion_r131992917
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
    @@ -83,6 +83,36 @@ class MultilayerPerceptronClassifierSuite
         }
       }
     
    +  test("strong dataset test") {
    +    val layers = Array[Int](4, 5, 5, 4)
    +
    +    val rnd = new scala.util.Random(1234L)
    +
    +    val strongDataset = Seq.tabulate(4) { index =>
    +      (Vectors.dense(
    +        rnd.nextGaussian(),
    +        rnd.nextGaussian() * 2.0,
    +        rnd.nextGaussian() * 3.0,
    +        rnd.nextGaussian() * 2.0
    +      ), (index % 4).toDouble)
    +    }.toDF("features", "label")
    +    val trainer = new MultilayerPerceptronClassifier()
    +      .setLayers(layers)
    +      .setBlockSize(1)
    +      .setSeed(123L)
    +      .setMaxIter(100)
    +      .setSolver("l-bfgs")
    +    val model = trainer.fit(strongDataset)
    +    val result = model.transform(strongDataset)
    +    model.setProbabilityCol("probability")
    +    MLTestingUtils.checkCopyAndUids(trainer, model)
    +    // result.select("probability").show(false)
    +    val predictionAndLabels = result.select("prediction", "label").collect()
    +    predictionAndLabels.foreach { case Row(p: Double, l: Double) =>
    +      assert(p == l)
    +    }
    +  }
    --- End diff --
    
    @MrBago
    How do you like this test?
    The probabilities it generates are:
    
    +--------------------------------------------------------------------------------------+
    |probability                                                                           |
    +--------------------------------------------------------------------------------------+
    |[0.9917274999513315,0.001511626318489583,0.004831796668307991,0.0019290770618710876]  |
    |[4.2392735713619E-12,0.9999999999955336,1.8369996605279208E-14,2.0871629225077174E-13]|
    |[1.8975708749716946E-4,5.191732707447977E-22,0.5010860788259045,0.49872416408659836]  |
    |[1.6776134471360903E-4,3.9309610969078615E-22,0.49629577580941386,0.5035364628458726] |
    +--------------------------------------------------------------------------------------+
    
    It contains some values near 0.5.
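The near-0.5 rows can be quantified directly. The sketch below (plain Scala, no Spark) copies the four probability vectors from the output above, takes the argmax of each row as the predicted class, compares it with the labels `index % 4` used in the test, and flags rows where the gap between the top two probabilities falls under a threshold. The 0.05 threshold and the names `NearTieCheck`, `topTwoMargin` and `nearTies` are arbitrary, hypothetical choices for illustration, not part of the PR or of Spark.

```scala
// Hypothetical analysis helper for illustration only; not part of Spark or of this PR.
object NearTieCheck {
  // Probability vectors copied from the show(false) output above; the labels are index % 4.
  val probabilities: Seq[Array[Double]] = Seq(
    Array(0.9917274999513315, 0.001511626318489583, 0.004831796668307991, 0.0019290770618710876),
    Array(4.2392735713619E-12, 0.9999999999955336, 1.8369996605279208E-14, 2.0871629225077174E-13),
    Array(1.8975708749716946E-4, 5.191732707447977E-22, 0.5010860788259045, 0.49872416408659836),
    Array(1.6776134471360903E-4, 3.9309610969078615E-22, 0.49629577580941386, 0.5035364628458726)
  )
  val labels: Seq[Int] = probabilities.indices.map(_ % 4)

  // Predicted class = index of the largest probability (first one wins on ties).
  def argmax(v: Array[Double]): Int =
    v.indices.reduce((i, j) => if (v(j) > v(i)) j else i)

  // Gap between the largest and second-largest probability in a row.
  def topTwoMargin(p: Array[Double]): Double = {
    val sorted = p.sortWith(_ > _)
    sorted(0) - sorted(1)
  }

  // 0-based indices of rows whose top-two margin falls below `threshold`.
  def nearTies(ps: Seq[Array[Double]], threshold: Double): Seq[Int] =
    ps.indices.filter(i => topTwoMargin(ps(i)) < threshold)

  def main(args: Array[String]): Unit = {
    val predictions = probabilities.map(argmax)
    println(s"predictions match labels: ${predictions == labels}")
    probabilities.zipWithIndex.foreach { case (p, i) =>
      println(f"row $i%d margin = ${topTwoMargin(p)}%.6f")
    }
    println(s"near ties (threshold 0.05): ${nearTies(probabilities, 0.05)}")
  }
}
```

On these four rows the argmax matches the labels, consistent with the `assert(p == l)` check passing, while rows 2 and 3 (0-based) are separated from their runner-up class by margins of only about 0.002 and 0.007. An assertion on exact probability values for those rows may therefore be sensitive to changes in seed, solver, or iteration count.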


