Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/02/13 15:31:26 UTC

[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/20600

    [SPARK-23412][ML] Add cosine distance to BisectingKMeans

    ## What changes were proposed in this pull request?
    
    The PR adds the option to specify a distance measure in BisectingKMeans and introduces support for the cosine distance measure.
    
    ## How was this patch tested?
    
    Tested with newly added unit tests plus the existing ones.
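
For readers unfamiliar with the measure named in the description, here is a minimal standalone sketch of what a cosine distance computes. It is not code from this PR: `CosineDistanceSketch`, `norm`, and `cosineDistance` are hypothetical names, and plain arrays stand in for Spark vectors.

```scala
object CosineDistanceSketch {
  /** Euclidean (L2) norm of a dense vector. */
  def norm(v: Array[Double]): Double = math.sqrt(v.map(x => x * x).sum)

  /** Cosine distance: 1 minus the cosine of the angle between a and b. */
  def cosineDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val na = norm(a)
    val nb = norm(b)
    // The measure is undefined when either vector has zero length.
    require(na > 0 && nb > 0, "cosine distance is undefined for zero vectors")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    1.0 - dot / (na * nb)
  }

  def main(args: Array[String]): Unit = {
    // Same direction, different magnitude: distance 0.0
    println(cosineDistance(Array(1.0, 0.0), Array(2.0, 0.0)))
    // Orthogonal vectors: distance 1.0
    println(cosineDistance(Array(1.0, 0.0), Array(0.0, 3.0)))
  }
}
```

Unlike the Euclidean distance, this measure ignores vector magnitude and depends only on direction.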


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-23412

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20600
    
----
commit 343ab72491f0f357d714203c6904e70090e3da14
Author: Marco Gaido <ma...@...>
Date:   2018-02-12T11:58:01Z

    [SPARK-23412][ML] Add cosine distance to BisectingKMeans

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20600


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Build finished. Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87424/testReport)** for PR 20600 at commit [`ed9b55f`](https://github.com/apache/spark/commit/ed9b55fffa71798c9f8f893a4f3c77d0af33ddd6).


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Build finished. Test FAILed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/861/
    Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87401/testReport)** for PR 20600 at commit [`ed9b55f`](https://github.com/apache/spark/commit/ed9b55fffa71798c9f8f893a4f3c77d0af33ddd6).


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20600#discussion_r171625505
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala ---
    @@ -155,34 +183,55 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
           spark.createDataFrame(data).write.parquet(Loader.dataPath(path))
         }
     
    -    private def getNodes(node: ClusteringTreeNode): Array[ClusteringTreeNode] = {
    -      if (node.children.isEmpty) {
    -        Array(node)
    -      } else {
    -        node.children.flatMap(getNodes(_)) ++ Array(node)
    -      }
    -    }
    -
    -    def load(sc: SparkContext, path: String, rootId: Int): BisectingKMeansModel = {
    +    def load(sc: SparkContext, path: String): BisectingKMeansModel = {
    --- End diff --
    
    This changed the signature of `load`, as MiMa notes. I'm not sure you can do this, though I'm not so clear on the semantics of this serialization method. Can it be avoided? I'd have thought the older serialization could be left as-is.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87401/testReport)** for PR 20600 at commit [`ed9b55f`](https://github.com/apache/spark/commit/ed9b55fffa71798c9f8f893a4f3c77d0af33ddd6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20600#discussion_r171632341
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala ---
    @@ -136,8 +144,28 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
           r.getDouble(4), r.getDouble(5), r.getSeq[Int](6))
       }
     
    +  private def getNodes(node: ClusteringTreeNode): Array[ClusteringTreeNode] = {
    +    if (node.children.isEmpty) {
    +      Array(node)
    +    } else {
    +      node.children.flatMap(getNodes) ++ Array(node)
    +    }
    +  }
    +
    +  private def buildTree(rootId: Int, nodes: Map[Int, Data]): ClusteringTreeNode = {
    +    val root = nodes(rootId)
    +    if (root.children.isEmpty) {
    +      new ClusteringTreeNode(root.index, root.size, new VectorWithNorm(root.center, root.norm),
    +        root.cost, root.height, new Array[ClusteringTreeNode](0))
    +    } else {
    +      val children = root.children.map(c => buildTree(c, nodes))
    +      new ClusteringTreeNode(root.index, root.size, new VectorWithNorm(root.center, root.norm),
    +        root.cost, root.height, children.toArray)
    +    }
    +  }
    +
       private[clustering] object SaveLoadV1_0 {
    -    private val thisFormatVersion = "1.0"
    +    private[clustering] val thisFormatVersion = "1.0"
    --- End diff --
    
    It is used about 30 lines earlier, in the `load` method. Previously the value was hard-coded there.
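
The `getNodes` and `buildTree` helpers in the diff above flatten a cluster tree into a list of nodes and rebuild it from an index-keyed map. A simplified standalone sketch of that round trip follows; `Node` and `Row` are hypothetical stand-ins for `ClusteringTreeNode` and `Data`, with only the index and children kept.

```scala
object TreeRoundTripSketch {
  // Hypothetical in-memory tree node (stand-in for ClusteringTreeNode).
  final case class Node(index: Int, children: Seq[Node])
  // Hypothetical flat record (stand-in for Data): children are referenced by index.
  final case class Row(index: Int, children: Seq[Int])

  /** Collects every node of the tree, children first and the node itself last. */
  def getNodes(node: Node): Seq[Node] =
    if (node.children.isEmpty) Seq(node)
    else node.children.flatMap(getNodes) :+ node

  /** Flattens the tree into a map from node index to its flat record. */
  def toRows(root: Node): Map[Int, Row] =
    getNodes(root).map(n => n.index -> Row(n.index, n.children.map(_.index))).toMap

  /** Rebuilds the tree recursively, starting from the record of the root. */
  def buildTree(rootId: Int, rows: Map[Int, Row]): Node = {
    val row = rows(rootId)
    Node(row.index, row.children.map(c => buildTree(c, rows)))
  }

  def main(args: Array[String]): Unit = {
    val tree = Node(1, Seq(Node(2, Nil), Node(3, Seq(Node(6, Nil), Node(7, Nil)))))
    val rebuilt = buildTree(1, toRows(tree))
    println(rebuilt == tree) // prints "true"
  }
}
```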


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87400/testReport)** for PR 20600 at commit [`8625bff`](https://github.com/apache/spark/commit/8625bff0fdb1667abd4be2c4f12cd96b32e2b1d9).


---



[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20600#discussion_r171625847
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -36,6 +36,12 @@ object MimaExcludes {
     
       // Exclude rules for 2.4.x
       lazy val v24excludes = v23excludes ++ Seq(
    +    // [SPARK-23412][ML] Add cosine distance measure to BisectingKmeans
    +    ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasDistanceMeasure.org$apache$spark$ml$param$shared$HasDistanceMeasure$_setter_$distanceMeasure_="),
    +    ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasDistanceMeasure.getDistanceMeasure"),
    +    ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasDistanceMeasure.distanceMeasure"),
    +    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.clustering.BisectingKMeansModel#SaveLoadV1_0.load"),
    --- End diff --
    
    See above; this one I think needs to be avoided. The others are issues too but probably acceptable in a minor release. I don't know if people extend these implementations. That said, any way to work around this by providing a dummy implementation of the new methods? I haven't thought through the Scala here.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Merged to master


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87400/
    Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #4139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4139/testReport)** for PR 20600 at commit [`ed9b55f`](https://github.com/apache/spark/commit/ed9b55fffa71798c9f8f893a4f3c77d0af33ddd6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20600#discussion_r171625208
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala ---
    @@ -136,8 +144,28 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
           r.getDouble(4), r.getDouble(5), r.getSeq[Int](6))
       }
     
    +  private def getNodes(node: ClusteringTreeNode): Array[ClusteringTreeNode] = {
    +    if (node.children.isEmpty) {
    +      Array(node)
    +    } else {
    +      node.children.flatMap(getNodes) ++ Array(node)
    +    }
    +  }
    +
    +  private def buildTree(rootId: Int, nodes: Map[Int, Data]): ClusteringTreeNode = {
    +    val root = nodes(rootId)
    +    if (root.children.isEmpty) {
    +      new ClusteringTreeNode(root.index, root.size, new VectorWithNorm(root.center, root.norm),
    +        root.cost, root.height, new Array[ClusteringTreeNode](0))
    +    } else {
    +      val children = root.children.map(c => buildTree(c, nodes))
    +      new ClusteringTreeNode(root.index, root.size, new VectorWithNorm(root.center, root.norm),
    +        root.cost, root.height, children.toArray)
    +    }
    +  }
    +
       private[clustering] object SaveLoadV1_0 {
    -    private val thisFormatVersion = "1.0"
    +    private[clustering] val thisFormatVersion = "1.0"
    --- End diff --
    
    Did this need to be more visible?


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/873/
    Test PASSed.


---



[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20600#discussion_r171634107
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala ---
    @@ -155,34 +183,55 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
           spark.createDataFrame(data).write.parquet(Loader.dataPath(path))
         }
     
    -    private def getNodes(node: ClusteringTreeNode): Array[ClusteringTreeNode] = {
    -      if (node.children.isEmpty) {
    -        Array(node)
    -      } else {
    -        node.children.flatMap(getNodes(_)) ++ Array(node)
    -      }
    -    }
    -
    -    def load(sc: SparkContext, path: String, rootId: Int): BisectingKMeansModel = {
    +    def load(sc: SparkContext, path: String): BisectingKMeansModel = {
    --- End diff --
    
    Yes, but this is the `load` method of the object `SaveLoadV1_0`, which is marked `private[clustering]`. The real `load` method has no change in its signature, so I don't think this is a problem.
    
    I don't think this change can be avoided. Without it, a user of the `mllib` implementation (rather than `ml`) can set the distance measure successfully, but after saving and reloading the model that information is lost and the model falls back to the Euclidean distance measure.
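
The failure mode described above can be sketched in isolation. This is an illustrative assumption, not the PR's serialization code: `DistanceMeasureLoadSketch`, its `Metadata` map, and the `distanceMeasure` key are hypothetical stand-ins for the saved model metadata.

```scala
object DistanceMeasureLoadSketch {
  val Euclidean = "euclidean"
  val Cosine = "cosine"

  // Hypothetical stand-in for saved model metadata (simple key/value pairs).
  type Metadata = Map[String, String]

  /** Saves metadata; an older format would not persist the distance measure. */
  def save(distanceMeasure: String, persistMeasure: Boolean): Metadata = {
    val base = Map("class" -> "BisectingKMeansModel")
    if (persistMeasure) base + ("distanceMeasure" -> distanceMeasure) else base
  }

  /** Loads the distance measure, falling back to Euclidean when it is absent. */
  def load(metadata: Metadata): String =
    metadata.getOrElse("distanceMeasure", Euclidean)

  def main(args: Array[String]): Unit = {
    // Format that does not store the measure: the user's choice is silently lost.
    println(load(save(Cosine, persistMeasure = false))) // prints "euclidean"
    // Format that stores the measure: the choice survives the round trip.
    println(load(save(Cosine, persistMeasure = true))) // prints "cosine"
  }
}
```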


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/858/
    Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87396/testReport)** for PR 20600 at commit [`343ab72`](https://github.com/apache/spark/commit/343ab72491f0f357d714203c6904e70090e3da14).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait HasDistanceMeasure extends Params `


---



[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20600#discussion_r171635212
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -36,6 +36,12 @@ object MimaExcludes {
     
       // Exclude rules for 2.4.x
       lazy val v24excludes = v23excludes ++ Seq(
    +    // [SPARK-23412][ML] Add cosine distance measure to BisectingKmeans
    +    ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasDistanceMeasure.org$apache$spark$ml$param$shared$HasDistanceMeasure$_setter_$distanceMeasure_="),
    +    ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasDistanceMeasure.getDistanceMeasure"),
    +    ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasDistanceMeasure.distanceMeasure"),
    +    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.clustering.BisectingKMeansModel#SaveLoadV1_0.load"),
    --- End diff --
    
    I can add the methods directly to the `BisectingKMeansModel`, but then we have the same code duplicated. So I think this solution is cleaner. I saw the same was done for `HasOutputCols` in 2.3.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87399/
    Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87396/testReport)** for PR 20600 at commit [`343ab72`](https://github.com/apache/spark/commit/343ab72491f0f357d714203c6904e70090e3da14).


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87399/testReport)** for PR 20600 at commit [`87c3ebd`](https://github.com/apache/spark/commit/87c3ebdcd5f0ec834d3bcb00ac54b493f298d2e8).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/860/
    Test FAILed.


---



[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20600#discussion_r171637778
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala ---
    @@ -155,34 +183,55 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
           spark.createDataFrame(data).write.parquet(Loader.dataPath(path))
         }
     
    -    private def getNodes(node: ClusteringTreeNode): Array[ClusteringTreeNode] = {
    -      if (node.children.isEmpty) {
    -        Array(node)
    -      } else {
    -        node.children.flatMap(getNodes(_)) ++ Array(node)
    -      }
    -    }
    -
    -    def load(sc: SparkContext, path: String, rootId: Int): BisectingKMeansModel = {
    +    def load(sc: SparkContext, path: String): BisectingKMeansModel = {
    --- End diff --
    
    Oh right, that's OK then. It's a false positive, as it's public in the bytecode but not really public.
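
A small sketch of the visibility rule behind this exchange, to be compiled as a regular source file. The `sketch.clustering` package and the `SaveLoadSketch`, `ModelSketch`, and `QualifiedPrivateDemo` names are hypothetical. As the comment above notes, a member restricted with `private[clustering]` is still emitted as public in the bytecode, so a binary-compatibility checker can flag a change that Scala source outside the package could never observe.

```scala
package sketch.clustering {
  // Accessible only from code inside package `sketch.clustering`.
  private[clustering] object SaveLoadSketch {
    def load(path: String): String = s"model loaded from $path"
  }

  // Public entry point; its signature is the one external callers depend on.
  object ModelSketch {
    def load(path: String): String = SaveLoadSketch.load(path)
  }
}

object QualifiedPrivateDemo {
  def main(args: Array[String]): Unit = {
    // Allowed: the public entry point delegates to the restricted object.
    println(sketch.clustering.ModelSketch.load("/tmp/model"))
    // Rejected by the Scala compiler if uncommented, because this code
    // lives outside package `sketch.clustering`:
    // sketch.clustering.SaveLoadSketch.load("/tmp/model")
  }
}
```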


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    @srowen @viirya @zhengruifeng sorry, did you have time to take a look at this? Any thoughts? Thanks.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87424/testReport)** for PR 20600 at commit [`ed9b55f`](https://github.com/apache/spark/commit/ed9b55fffa71798c9f8f893a4f3c77d0af33ddd6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87424/
    Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87401/
    Test FAILed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    any more comments @srowen ?


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    retest this please


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87396/
    Test FAILed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #4139 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4139/testReport)** for PR 20600 at commit [`ed9b55f`](https://github.com/apache/spark/commit/ed9b55fffa71798c9f8f893a4f3c77d0af33ddd6).


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87399/testReport)** for PR 20600 at commit [`87c3ebd`](https://github.com/apache/spark/commit/87c3ebdcd5f0ec834d3bcb00ac54b493f298d2e8).


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    cc @srowen @viirya @zhengruifeng 


---



[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20600
  
    **[Test build #87400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87400/testReport)** for PR 20600 at commit [`8625bff`](https://github.com/apache/spark/commit/8625bff0fdb1667abd4be2c4f12cd96b32e2b1d9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
