Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2018/08/13 06:15:23 UTC

[GitHub] spark pull request #22087: [SPARK-25097][Support prediction on single instan...

GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/22087

    [SPARK-25097][Support prediction on single instance in KMeans/BiKMeans/GMM] Support prediction on single instance in KMeans/BiKMeans/GMM

    ## What changes were proposed in this pull request?
    Expose the `predict` method in KMeans/BiKMeans/GMM.
    
    ## How was this patch tested?
    NA
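
For illustration, single-instance prediction on a k-means-style model amounts to finding the closest cluster center for one feature vector, with no DataFrame involved. The sketch below is plain Scala under that assumption; `SinglePredictionSketch` and `ToyKMeansModel` are hypothetical names, not Spark classes, and this is not the code in the patch.

```scala
object SinglePredictionSketch {
  // Hypothetical stand-in for a fitted clustering model; not Spark's KMeansModel.
  final case class ToyKMeansModel(centers: IndexedSeq[Array[Double]]) {
    require(centers.nonEmpty, "model needs at least one cluster center")

    // Squared Euclidean distance between two points of equal dimension.
    private def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
      require(a.length == b.length, "dimension mismatch")
      a.indices.map { i =>
        val d = a(i) - b(i)
        d * d
      }.sum
    }

    // Single-instance prediction: index of the closest center.
    def predict(features: Array[Double]): Int = {
      var best = 0
      var bestDist = squaredDistance(centers(0), features)
      for (i <- 1 until centers.length) {
        val d = squaredDistance(centers(i), features)
        if (d < bestDist) { best = i; bestDist = d }
      }
      best
    }
  }
}
```

A caller would build the model once and then call `predict` per instance, for example `ToyKMeansModel(IndexedSeq(Array(0.0, 0.0), Array(10.0, 10.0))).predict(Array(9.0, 8.5))`.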

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark clu_pre_instance

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22087.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22087
    
----
commit f6cbb46f46f039d6168de382a18018a37e0e3ee7
Author: zhengruifeng <ru...@...>
Date:   2018-08-13T06:10:35Z

    init

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #98345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98345/testReport)** for PR 22087 at commit [`2d6594e`](https://github.com/apache/spark/commit/2d6594e37ab6968fc13add89cdec2fd42f2b799b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by erikerlandson <gi...@git.apache.org>.
Github user erikerlandson commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    This LGTM, but it raises a more general question about the lack of single-sample prediction over the entire hierarchy. For example (IMO) there should be some kind of single-sample method associated with `org.apache.spark.ml.Model`, or `org.apache.spark.ml.Transformer`.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4828/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    @imatiach-msft Updated according to your comments. Thanks for reviewing!


---



[GitHub] spark pull request #22087: [SPARK-25097][ML] Support prediction on single in...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22087#discussion_r230968204
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
    @@ -155,4 +155,16 @@ trait MLTest extends StreamTest with TempDirectory { self: Suite =>
             assert(prediction === model.predict(features))
         }
       }
    +
    +  def testClusteringModelSinglePrediction(model: Model[_],
    +                                          transform: Vector => Int,
    +                                          dataset: Dataset[_],
    +                                          input: String,
    +                                          output: String): Unit = {
    --- End diff --
    
    I think we should use 2-space indentation here?


---



[GitHub] spark pull request #22087: [SPARK-25097][ML] Support prediction on single in...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22087#discussion_r231107042
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -117,7 +117,8 @@ class BisectingKMeansModel private[ml] (
         validateAndTransformSchema(schema)
       }
     
    -  private[clustering] def predict(features: Vector): Int = parentModel.predict(features)
    +  @Since("2.4.0")
    --- End diff --
    
    I don't see a good reason either. I think it is fine to expose it.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #98502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98502/testReport)** for PR 22087 at commit [`fb3c6d1`](https://github.com/apache/spark/commit/fb3c6d1f933b807c89e5892dadf8654ec280d3b5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98345/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2117/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98574/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #94671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94671/testReport)** for PR 22087 at commit [`f6cbb46`](https://github.com/apache/spark/commit/f6cbb46f46f039d6168de382a18018a37e0e3ee7).


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    @felixcheung Test suites have been added. Thanks for reviewing!


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    @erikerlandson @srowen Actually, we already have a `PredictionModel` for this. I am not sure why the clustering algorithms do not extend it, though; in that class the method returns a `Double`, not an `Int`. I think we can explore the feasibility of making clustering algorithms predictors too.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #95990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95990/testReport)** for PR 22087 at commit [`5fe7ed3`](https://github.com/apache/spark/commit/5fe7ed356e48b9692c3476d2b9e2ede3348f9f41).


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95990/
    Test FAILed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #98574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98574/testReport)** for PR 22087 at commit [`01b726f`](https://github.com/apache/spark/commit/01b726f850d5f987a0b1de15f8c4d94a694541b0).


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #94723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94723/testReport)** for PR 22087 at commit [`5fe7ed3`](https://github.com/apache/spark/commit/5fe7ed356e48b9692c3476d2b9e2ede3348f9f41).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Yeah it's a good point and I wonder if @jkbradley or @mengxr or @MLnick want to weigh in. If this superclass method existed, I think it would be `predict(Vector):Int` anyway, so seems pretty reasonable to expose the already-existing instances of it.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Aha. On the one hand, I suppose they can't extend that class because of the signature difference, and indeed it says it's the superclass of regression and classification models. I can imagine that the clustering algorithms should have a ClusteringModel superclass or something with `predict(Vector):Int`. That I suppose can be decided later. That is, maybe there isn't going to be a logical overall superclass defining a universal prediction method, for reasons like this. Yes, these things can return an Int where a Double is needed, but that to me isn't a great reason.
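
The signature difference can be sketched in plain Scala. A class cannot implement both a `predict` returning `Double` and a `predict` returning `Int` for the same argument type, so a clustering model would need its own parent. Every name below (`SignatureSketch`, `DoublePredictor`, `ClusteringModel`, `ThresholdClusters`) is hypothetical and only illustrates the idea; none of them is an existing Spark class.

```scala
object SignatureSketch {
  type Features = Array[Double]

  // Mirrors the shape of a supervised predictor: one Double per instance.
  trait DoublePredictor {
    def predict(features: Features): Double
  }

  // A separate, hypothetical parent for clustering models:
  // one cluster index per instance.
  trait ClusteringModel {
    def numClusters: Int
    def predict(features: Features): Int
  }

  // Toy 1-D model: instances below the threshold go to cluster 0, others to 1.
  final class ThresholdClusters(threshold: Double) extends ClusteringModel {
    val numClusters: Int = 2
    def predict(features: Features): Int =
      if (features(0) < threshold) 0 else 1
  }
}
```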


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #94723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94723/testReport)** for PR 22087 at commit [`5fe7ed3`](https://github.com/apache/spark/commit/5fe7ed356e48b9692c3476d2b9e2ede3348f9f41).


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #94671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94671/testReport)** for PR 22087 at commit [`f6cbb46`](https://github.com/apache/spark/commit/f6cbb46f46f039d6168de382a18018a37e0e3ee7).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98344/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4786/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2162/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Build finished. Test FAILed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #95990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95990/testReport)** for PR 22087 at commit [`5fe7ed3`](https://github.com/apache/spark/commit/5fe7ed356e48b9692c3476d2b9e2ede3348f9f41).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94723/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Ah, good point @imatiach-msft, I missed the `HasLabelCol` in `PredictorParams`. We might have to revise the trait hierarchy here to something like you mentioned. I agree with @srowen that we should escalate this, and I'd also suggest creating a design doc for the new hierarchy, so that it can be reviewed more carefully and by a broader audience.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #98502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98502/testReport)** for PR 22087 at commit [`fb3c6d1`](https://github.com/apache/spark/commit/fb3c6d1f933b807c89e5892dadf8654ec280d3b5).


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by erikerlandson <gi...@git.apache.org>.
Github user erikerlandson commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    PredictionModel as a super-class of unsupervised and supervised seems sane to me. Returning a double to unify the signature also seems sane, although the thought of casting it might irk people.
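
A minimal sketch of what the cast would look like if a cluster index were widened to `Double` to fit a shared signature. `DoubleAdapterSketch`, `clusterOf`, `predictAsDouble`, and `centerFor` are hypothetical names chosen for illustration.

```scala
object DoubleAdapterSketch {
  // Hypothetical clustering-style predictor returning a cluster index.
  def clusterOf(x: Double): Int = if (x < 0.0) 0 else 1

  // The same prediction widened to Double to fit a Double-returning signature.
  def predictAsDouble(x: Double): Double = clusterOf(x).toDouble

  // Callers that need an index (here, to look up a center) must cast back.
  def centerFor(x: Double, centers: IndexedSeq[Double]): Double =
    centers(predictAsDouble(x).toInt)
}
```

The round trip through `toDouble` and `toInt` is lossless for small non-negative indices, but every caller that uses the result as an index has to write the cast.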


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94671/
    Test FAILed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #98345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98345/testReport)** for PR 22087 at commit [`2d6594e`](https://github.com/apache/spark/commit/2d6594e37ab6968fc13add89cdec2fd42f2b799b).


---



[GitHub] spark pull request #22087: [SPARK-25097][ML] Support prediction on single in...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22087#discussion_r220779540
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala ---
    @@ -268,6 +268,13 @@ class GaussianMixtureSuite extends MLTest with DefaultReadWriteTest {
         assert(trueLikelihood ~== doubleLikelihood absTol 1e-6)
         assert(trueLikelihood ~== floatLikelihood absTol 1e-6)
       }
    +
    +  test("prediction on single instance") {
    +    val gmm = new GaussianMixture()
    --- End diff --
    
    Maybe explicitly set the seed to minimize randomness from run to run (it is usually better to make tests as deterministic as possible).
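
The effect of fixing a seed can be shown without Spark. In the sketch below, `SeedSketch` and `initialCenters` are hypothetical stand-ins for a randomized initialization step; in the test under review the equivalent would presumably be a seed setter on the estimator.

```scala
import scala.util.Random

object SeedSketch {
  // Hypothetical stand-in for a randomized initialization step:
  // draws k starting values in [0, 1) from a generator with a fixed seed.
  def initialCenters(k: Int, seed: Long): Seq[Double] = {
    val rng = new Random(seed)
    Seq.fill(k)(rng.nextDouble())
  }
}
```

Two calls with the same seed return the same sequence, so assertions on the fitted result do not vary between runs.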


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    I'd escalate to dev@ for more visibility. It has some longer-term consequences, and I'd like to hear current thinking on how much these APIs should change, etc.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    @srowen I think it would not be a big deal if we changed the return type of these to Double, since they are private for now. If that is the only change needed, we may be able to reuse `PredictionModel` for clustering models as well. But if we expose them, going back would be harder. So I'd first agree on whether or not to try making them `PredictionModel`s: if not, this PR is good IMHO.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #98344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98344/testReport)** for PR 22087 at commit [`f428a5d`](https://github.com/apache/spark/commit/f428a5ddef242bbcccb189ae62259a9ced6e80de).


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98502/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    @srowen @mgaido91 @erikerlandson I don't think it makes sense to use PredictionModel as it currently stands, because it is for supervised learning and contains label-column-specific params (e.g. the getLabelCol() method). KMeans and friends are unsupervised learning and should not have these params. I definitely agree, though, that some of the goodness in PredictionModel should live on a common base class shared by the supervised and unsupervised algorithms. Maybe a class hierarchy like:

                     predictionmodel
                     /             \
        unsupervisedmodel       supervisedmodel

    would be ideal? The label-specific params would live on supervisedmodel.
    
    also tagging @jkbradley 
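
The proposed hierarchy could be sketched roughly as follows. This is plain Scala with no Spark dependency, and every name in it (`BasePredictionModel`, `SupervisedModel`, `UnsupervisedModel`, `SignClassifier`, `ThresholdClusterer`) is hypothetical and chosen only to mirror the diagram above. The point is that the label column lives only on the supervised branch.

```scala
// Hypothetical names throughout; none of these are existing Spark classes.

// Shared base: everything that does not depend on a label.
trait BasePredictionModel[F] {
  def featuresCol: String
  def predictionCol: String
  def predict(features: F): Double
}

// Supervised branch: the only place where a label column exists.
trait SupervisedModel[F] extends BasePredictionModel[F] {
  def labelCol: String
}

// Unsupervised branch: inherits single-instance prediction, no label params.
trait UnsupervisedModel[F] extends BasePredictionModel[F]

// Toy supervised model: predicts 1.0 for non-negative input, else 0.0.
final case class SignClassifier(
    labelCol: String = "label",
    featuresCol: String = "features",
    predictionCol: String = "prediction") extends SupervisedModel[Double] {
  override def predict(features: Double): Double = if (features >= 0.0) 1.0 else 0.0
}

// Toy unsupervised model: two clusters split at a threshold.
final case class ThresholdClusterer(
    threshold: Double,
    featuresCol: String = "features",
    predictionCol: String = "prediction") extends UnsupervisedModel[Double] {
  override def predict(features: Double): Double = if (features < threshold) 0.0 else 1.0
}
```

With this split, code that only needs `predict` can accept a `BasePredictionModel`, while anything that reads `labelCol` has to ask for a `SupervisedModel` explicitly.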


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4690/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4691/
    Test PASSed.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    @zhengruifeng Thanks, the PR looks good to me. Maybe @felixcheung or @jkbradley can review and possibly merge?


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Designing a universal prediction model as a superclass sounds good.
    BTW, I think we can also create a new class `ProbabilisticPredictionModel` (as a subclass of `PredictionModel`), so that soft-clustering models can extend it and expose a `predictProbability` method.
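
A minimal sketch of what such a probabilistic layer might look like, in plain Scala without Spark. The names `ProbabilisticModelSketch` and `ToyMixture1D` are hypothetical, and the one-dimensional Gaussian mixture is an assumed toy model rather than Spark's GMM implementation. The hard prediction is derived from the soft one by taking the most probable component.

```scala
// Hypothetical trait; not an existing Spark class.
trait ProbabilisticModelSketch {
  // Soft assignment: one probability per cluster, summing to 1.
  def predictProbability(x: Double): Array[Double]

  // Hard assignment: index of the most probable cluster.
  def predict(x: Double): Int = {
    val p = predictProbability(x)
    p.indices.maxBy(i => p(i))
  }
}

// Toy one-dimensional Gaussian mixture (hypothetical).
final class ToyMixture1D(
    weights: Array[Double],
    means: Array[Double],
    stdDevs: Array[Double]) extends ProbabilisticModelSketch {
  require(weights.nonEmpty, "at least one component is required")
  require(weights.length == means.length && means.length == stdDevs.length,
    "weights, means and stdDevs must have the same length")

  private def gaussianPdf(x: Double, mean: Double, sd: Double): Double = {
    val z = (x - mean) / sd
    math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.Pi))
  }

  // Weighted densities normalized to sum to 1.
  // Note: for points extremely far from every component the densities can
  // underflow to zero; this sketch does not guard against that case.
  override def predictProbability(x: Double): Array[Double] = {
    val unnormalized =
      weights.indices.map(i => weights(i) * gaussianPdf(x, means(i), stdDevs(i))).toArray
    val total = unnormalized.sum
    unnormalized.map(_ / total)
  }
}
```

Defining `predict` once in the trait keeps the hard and soft predictions consistent for every model that extends it.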


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #98574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98574/testReport)** for PR 22087 at commit [`01b726f`](https://github.com/apache/spark/commit/01b726f850d5f987a0b1de15f8c4d94a694541b0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    I have also exposed GMM's `predictProbability`.
    Could you please make a final pass? @srowen @felixcheung


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    **[Test build #98344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98344/testReport)** for PR 22087 at commit [`f428a5d`](https://github.com/apache/spark/commit/f428a5ddef242bbcccb189ae62259a9ced6e80de).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22087: [SPARK-25097][ML] Support prediction on single in...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22087#discussion_r230948830
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -117,7 +117,8 @@ class BisectingKMeansModel private[ml] (
         validateAndTransformSchema(schema)
       }
     
    -  private[clustering] def predict(features: Vector): Int = parentModel.predict(features)
    +  @Since("2.4.0")
    --- End diff --
    
    This would have to be 3.0.0.
    
    I don't see a good reason not to expose this. Also CCing @mgaido91.
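
For readers unfamiliar with the pattern in this diff, here is a rough sketch of a public, version-annotated method that simply delegates to an inner model. It is plain Scala with no Spark dependency: `SinceSketch`, `ParentModelSketch` and `WrapperModelSketch` are hypothetical names, and the boundary-counting rule is an arbitrary placeholder for whatever the inner model actually computes.

```scala
import scala.annotation.StaticAnnotation

// Hypothetical stand-in for a version-marker annotation; not Spark's own.
final class SinceSketch(version: String) extends StaticAnnotation

// Hypothetical inner model that already knows how to score one value.
final class ParentModelSketch(boundaries: Array[Double]) {
  // Returns how many boundaries lie at or below the value.
  def predict(value: Double): Int = boundaries.count(_ <= value)
}

// Wrapper whose predict is public and tagged with the release that exposes it.
final class WrapperModelSketch(parentModel: ParentModelSketch) {
  @SinceSketch("3.0.0")
  def predict(value: Double): Int = parentModel.predict(value)
}
```

The version string in the annotation should name the first release that ships the public method, which is why the review asks for 3.0.0 rather than 2.4.0.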


---



[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22087
  
    Merged build finished. Test FAILed.


---
