Posted to reviews@spark.apache.org by ludatabricks <gi...@git.apache.org> on 2018/04/11 19:23:29 UTC

[GitHub] spark pull request #21044: Add RawPrediction, numClasses, and numFeatures fo...

GitHub user ludatabricks opened a pull request:

    https://github.com/apache/spark/pull/21044

    Add RawPrediction, numClasses, and numFeatures for OneVsRestModel

    add RawPrediction as output column 
    add numClasses and numFeatures to OneVsRestModel
    
    ## What changes were proposed in this pull request?
    
    - Add two vals, numClasses and numFeatures, to OneVsRestModel so that we can inherit from Classifier in the future
    
    - Add a rawPrediction output column in transform; the prediction label is calculated from the rawPrediction, as in raw2prediction
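The relationship between the raw prediction and the predicted label can be sketched in plain Scala, without Spark. This is a minimal illustration, assuming the raw prediction holds one confidence score per class, indexed by class. `RawToPrediction`, `argmax`, and `predict` are hypothetical names used only here, not Spark APIs.

```scala
object RawToPrediction {
  // Index of the largest score. Ties are not handled specially in this sketch.
  def argmax(scores: Array[Double]): Int = {
    require(scores.nonEmpty, "scores must not be empty")
    scores.indices.maxBy(i => scores(i))
  }

  // The predicted label is the index of the most confident class, as a Double.
  def predict(rawPrediction: Array[Double]): Double =
    argmax(rawPrediction).toDouble
}
```

For example, `RawToPrediction.predict(Array(0.1, 0.7, 0.2))` selects class index 1, since the second score is the largest.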
    
     
    
    ## How was this patch tested?
    
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ludatabricks/spark-1 SPARK-9312

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21044.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21044
    
----
commit 0cfc20a3637c06071e6fe48ca5db4834b34c889e
Author: Lu WANG <lu...@...>
Date:   2018-04-11T19:08:22Z

    add rawPrediction as an output column;
    add numCLasses and numFeatures to OneVsRestModel

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89309/
    Test PASSed.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21044#discussion_r181288716
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -146,6 +152,10 @@ final class OneVsRestModel private[ml] (
       @Since("2.1.0")
       def setPredictionCol(value: String): this.type = set(predictionCol, value)
     
    +  /** @group setParam */
    +  @Since("2.4.0")
    +  def setRawPredictionCol(value: String): this.type = set(rawPredictionCol, value)
    --- End diff --
    
    You'll need to add this to the Estimator too.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89216/testReport)** for PR 21044 at commit [`0cfc20a`](https://github.com/apache/spark/commit/0cfc20a3637c06071e6fe48ca5db4834b34c889e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89308/testReport)** for PR 21044 at commit [`2a47e2b`](https://github.com/apache/spark/commit/2a47e2be30d52e3fbea7e1eeeaa5048a6ac97116).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Thanks for the PR!  Quick high-level comment: We'll need to have rawPredictionCol be optional.  If it's not set or is an empty string, then it should not be added to the output DataFrame.
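The optional-column behavior requested here can be sketched without Spark. This is a minimal illustration, assuming the output is described only by its column names. `OptionalOutputColumn` and `outputColumns` are hypothetical names, and the `Option[String]` parameter is an assumption standing in for "not set".

```scala
object OptionalOutputColumn {
  // Names of the columns a transform would append. The raw prediction column
  // is included only when its name is set and is not the empty string.
  def outputColumns(predictionCol: String, rawPredictionCol: Option[String]): Seq[String] = {
    val raw = rawPredictionCol.filter(_.nonEmpty).toSeq
    raw :+ predictionCol
  }
}
```

With a non-empty name both columns are listed. With `None` or `Some("")` only the prediction column remains.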



---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89308/
    Test PASSed.


---



[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21044#discussion_r181287383
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml] (
           newDataset.unpersist()
         }
     
    -    // output the index of the classifier with highest confidence as prediction
    -    val labelUDF = udf { (predictions: Map[Int, Double]) =>
    -      predictions.maxBy(_._2)._1.toDouble
    -    }
    +    // output the RawPrediction as vector
    +    if (getRawPredictionCol != "") {
    +      val rawPredictionUDF = udf { (predictions: Map[Int, Double]) =>
    +        val predArray = Array.fill[Double](numClasses)(0.0)
    +        predictions.foreach { case (idx, value) => predArray(idx) = value }
    +        Vectors.dense(predArray)
    +      }
    +
    +      // output the index of the classifier with highest confidence as prediction
    +      val labelUDF = udf { (predictions: Vector) => predictions.argmax.toDouble }
    --- End diff --
    
    Suggest renaming the parameter: `udf { (rawPredictions: Vector) => ... }`


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89361/
    Test PASSed.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89353/testReport)** for PR 21044 at commit [`ebf4a6c`](https://github.com/apache/spark/commit/ebf4a6c155be6a13fb41f492eb2777465d163478).


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89353/testReport)** for PR 21044 at commit [`ebf4a6c`](https://github.com/apache/spark/commit/ebf4a6c155be6a13fb41f492eb2777465d163478).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89353/
    Test PASSed.


---



[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21044#discussion_r181288721
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml] (
           newDataset.unpersist()
         }
     
    -    // output the index of the classifier with highest confidence as prediction
    -    val labelUDF = udf { (predictions: Map[Int, Double]) =>
    -      predictions.maxBy(_._2)._1.toDouble
    -    }
    +    // output the RawPrediction as vector
    +    if (getRawPredictionCol != "") {
    +      val rawPredictionUDF = udf { (predictions: Map[Int, Double]) =>
    +        val predArray = Array.fill[Double](numClasses)(0.0)
    --- End diff --
    
    This causes a subtle ContextCleaner bug: `numClasses` refers to a field of the class OneVsRestModel, so when Spark's closure capture serializes this UDF to send to executors, it will end up sending the entire OneVsRestModel object, rather than just the value for numClasses.  Make a local copy of the value numClasses within the transform() method to avoid this issue.
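The capture behavior described in this comment can be reproduced without Spark, using plain Java serialization. This is a minimal sketch, assuming Scala 2.12 or later, where function literals are serializable. `ToyModel` and `ClosureCaptureDemo` are hypothetical stand-ins, and `ToyModel` is deliberately not `Serializable` so that capturing it becomes visible as a failure.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a model class; intentionally NOT Serializable.
class ToyModel(val numClasses: Int) {
  // Reads the field inside the closure, so the closure holds a reference to `this`.
  def fieldCapturing: Int => Array[Double] =
    (_: Int) => Array.fill(numClasses)(0.0)

  // Copies the field to a local val first, so the closure holds only an Int.
  def localCapturing: Int => Array[Double] = {
    val localNumClasses = numClasses
    (_: Int) => Array.fill(localNumClasses)(0.0)
  }
}

object ClosureCaptureDemo {
  // True if Java serialization of the object succeeds.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

Under these assumptions, serializing `fieldCapturing` should fail because the whole `ToyModel` comes along with the closure, while `localCapturing` should serialize because it carries only the copied value.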


---



[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21044#discussion_r181288725
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml] (
           newDataset.unpersist()
         }
     
    -    // output the index of the classifier with highest confidence as prediction
    -    val labelUDF = udf { (predictions: Map[Int, Double]) =>
    -      predictions.maxBy(_._2)._1.toDouble
    -    }
    +    // output the RawPrediction as vector
    +    if (getRawPredictionCol != "") {
    +      val rawPredictionUDF = udf { (predictions: Map[Int, Double]) =>
    +        val predArray = Array.fill[Double](numClasses)(0.0)
    +        predictions.foreach { case (idx, value) => predArray(idx) = value }
    +        Vectors.dense(predArray)
    +      }
    +
    +      // output the index of the classifier with highest confidence as prediction
    +      val labelUDF = udf { (predictions: Vector) => predictions.argmax.toDouble }
     
    -    // output label and label metadata as prediction
    -    aggregatedDataset
    -      .withColumn($(predictionCol), labelUDF(col(accColName)), labelMetadata)
    -      .drop(accColName)
    +      aggregatedDataset
    +        .withColumn(getRawPredictionCol, rawPredictionUDF(col(accColName)))
    +        .withColumn(getPredictionCol, labelUDF(col(getRawPredictionCol)), labelMetadata)
    +        .drop(accColName)
    +    }
    +    else {
    --- End diff --
    
    Scala style: This should go on the previous line: ```} else {```


---



[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21044#discussion_r181288736
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml] (
           newDataset.unpersist()
         }
     
    -    // output the index of the classifier with highest confidence as prediction
    -    val labelUDF = udf { (predictions: Map[Int, Double]) =>
    -      predictions.maxBy(_._2)._1.toDouble
    -    }
    +    // output the RawPrediction as vector
    +    if (getRawPredictionCol != "") {
    +      val rawPredictionUDF = udf { (predictions: Map[Int, Double]) =>
    +        val predArray = Array.fill[Double](numClasses)(0.0)
    +        predictions.foreach { case (idx, value) => predArray(idx) = value }
    +        Vectors.dense(predArray)
    +      }
    +
    +      // output the index of the classifier with highest confidence as prediction
    +      val labelUDF = udf { (predictions: Vector) => predictions.argmax.toDouble }
     
    -    // output label and label metadata as prediction
    -    aggregatedDataset
    -      .withColumn($(predictionCol), labelUDF(col(accColName)), labelMetadata)
    -      .drop(accColName)
    +      aggregatedDataset
    +        .withColumn(getRawPredictionCol, rawPredictionUDF(col(accColName)))
    +        .withColumn(getPredictionCol, labelUDF(col(getRawPredictionCol)), labelMetadata)
    +        .drop(accColName)
    +    }
    +    else {
    +      // output the index of the classifier with highest confidence as prediction
    +      val labelUDF = udf { (predictions: Map[Int, Double]) =>
    +        predictions.maxBy(_._2)._1.toDouble
    +      }
    +      // output confidence as rwa prediction, label and label metadata as prediction
    --- End diff --
    
    This comment seems to be in the wrong part of the code.  Also, there's a typo ("rwa").


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89216/
    Test PASSed.


---



[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21044#discussion_r181288710
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -138,6 +138,12 @@ final class OneVsRestModel private[ml] (
         @Since("1.4.0") val models: Array[_ <: ClassificationModel[_, _]])
       extends Model[OneVsRestModel] with OneVsRestParams with MLWritable {
     
    --- End diff --
    
    Let's add a require() statement here which checks that models.nonEmpty is true (to throw an exception upon construction, rather than when numFeatures calls models.head below).  Just to be safe...
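The fail-fast check suggested here can be sketched in plain Scala. This is a minimal illustration, not the actual OneVsRestModel: `ToySubModel` and `ToyEnsemble` are hypothetical stand-ins, and deriving `numFeatures` from `models.head` follows the comment above.

```scala
// Hypothetical stand-in for a per-class binary model.
final case class ToySubModel(numFeatures: Int)

// Hypothetical stand-in for a model that wraps one sub-model per class.
final class ToyEnsemble(val models: Array[ToySubModel]) {
  // Validate at construction so an empty array fails here with a clear message,
  // rather than later when models.head is evaluated.
  require(models.nonEmpty, "ToyEnsemble requires at least one model")

  val numClasses: Int = models.length
  val numFeatures: Int = models.head.numFeatures
}
```

`require` throws an `IllegalArgumentException` when its condition is false, so constructing `new ToyEnsemble(Array.empty[ToySubModel])` is rejected before any field is derived from the array.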


---



[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21044#discussion_r180920806
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -195,14 +205,18 @@ final class OneVsRestModel private[ml] (
           newDataset.unpersist()
         }
     
    -    // output the index of the classifier with highest confidence as prediction
    -    val labelUDF = udf { (predictions: Map[Int, Double]) =>
    -      predictions.maxBy(_._2)._1.toDouble
    +    // output the RawPrediction as vector
    +    val rawPredictionUDF = udf { (predictions: Map[Int, Double]) =>
    +      Vectors.sparse(numClasses, predictions.toList )
    --- End diff --
    
    Also, let's output a dense Vector since it will almost surely be dense.
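Building a dense representation from the per-class map can be sketched without Spark. This is a minimal illustration that uses a plain `Array[Double]` in place of an ML `Vector`, assuming every key in the map is a valid class index below `numClasses`. `DenseRawPrediction` and `toDense` are hypothetical names.

```scala
object DenseRawPrediction {
  // Place each class's score at its index; classes missing from the map stay 0.0.
  def toDense(predictions: Map[Int, Double], numClasses: Int): Array[Double] = {
    val dense = Array.fill(numClasses)(0.0)
    predictions.foreach { case (idx, value) => dense(idx) = value }
    dense
  }
}
```

Because each per-class classifier normally contributes a score, nearly every slot is filled, which is the reason given above for preferring a dense layout.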


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    LGTM
    Merging with master
    Thanks!!


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89308/testReport)** for PR 21044 at commit [`2a47e2b`](https://github.com/apache/spark/commit/2a47e2be30d52e3fbea7e1eeeaa5048a6ac97116).


---



[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21044


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89309/testReport)** for PR 21044 at commit [`0c32fca`](https://github.com/apache/spark/commit/0c32fcaaf87f1922170e4ce7e60381ccd23ab6e8).


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89309/testReport)** for PR 21044 at commit [`0c32fca`](https://github.com/apache/spark/commit/0c32fcaaf87f1922170e4ce7e60381ccd23ab6e8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21044#discussion_r181286908
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml] (
           newDataset.unpersist()
         }
     
    -    // output the index of the classifier with highest confidence as prediction
    -    val labelUDF = udf { (predictions: Map[Int, Double]) =>
    -      predictions.maxBy(_._2)._1.toDouble
    -    }
    +    // output the RawPrediction as vector
    +    if (getRawPredictionCol != "") {
    +      val rawPredictionUDF = udf { (predictions: Map[Int, Double]) =>
    +        val predArray = Array.fill[Double](numClasses)(0.0)
    +        predictions.foreach { case (idx, value) => predArray(idx) = value }
    +        Vectors.dense(predArray)
    +      }
    +
    +      // output the index of the classifier with highest confidence as prediction
    +      val labelUDF = udf { (predictions: Vector) => predictions.argmax.toDouble }
     
    -    // output label and label metadata as prediction
    -    aggregatedDataset
    -      .withColumn($(predictionCol), labelUDF(col(accColName)), labelMetadata)
    -      .drop(accColName)
    +      aggregatedDataset
    +        .withColumn(getRawPredictionCol, rawPredictionUDF(col(accColName)))
    +        .withColumn(getPredictionCol, labelUDF(col(getRawPredictionCol)), labelMetadata)
    +        .drop(accColName)
    +    }
    +    else {
    +      // output the index of the classifier with highest confidence as prediction
    +      val labelUDF = udf { (predictions: Map[Int, Double]) =>
    +        predictions.maxBy(_._2)._1.toDouble
    +      }
    +      // output confidence as rwa prediction, label and label metadata as prediction
    --- End diff --
    
    rwa -> raw


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89216/testReport)** for PR 21044 at commit [`0cfc20a`](https://github.com/apache/spark/commit/0cfc20a3637c06071e6fe48ca5db4834b34c889e).


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89361/testReport)** for PR 21044 at commit [`b3c7fec`](https://github.com/apache/spark/commit/b3c7fec0fda9056b832d1d35e829e9946218e504).


---



[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21044
  
    **[Test build #89361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89361/testReport)** for PR 21044 at commit [`b3c7fec`](https://github.com/apache/spark/commit/b3c7fec0fda9056b832d1d35e829e9946218e504).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
