Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2018/11/07 15:39:02 UTC

[GitHub] spark pull request #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmea...

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22893#discussion_r231555559
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
    @@ -521,19 +521,21 @@ object MLUtils extends Logging {
          * The bound doesn't need the inner product, so we can use it as a sufficient condition to
          * check quickly whether the inner product approach is accurate.
          */
    -    val precisionBound1 = 2.0 * EPSILON * sumSquaredNorm / (normDiff * normDiff + EPSILON)
    -    if (precisionBound1 < precision) {
    -      sqDist = sumSquaredNorm - 2.0 * dot(v1, v2)
    -    } else if (v1.isInstanceOf[SparseVector] || v2.isInstanceOf[SparseVector]) {
    -      val dotValue = dot(v1, v2)
    -      sqDist = math.max(sumSquaredNorm - 2.0 * dotValue, 0.0)
    -      val precisionBound2 = EPSILON * (sumSquaredNorm + 2.0 * math.abs(dotValue)) /
    -        (sqDist + EPSILON)
    -      if (precisionBound2 > precision) {
    -        sqDist = Vectors.sqdist(v1, v2)
    -      }
    -    } else {
    +    if (v1.isInstanceOf[DenseVector] && v2.isInstanceOf[DenseVector]) {
           sqDist = Vectors.sqdist(v1, v2)
    +    } else {
    +      val precisionBound1 = 2.0 * EPSILON * sumSquaredNorm / (normDiff * normDiff + EPSILON)
    --- End diff --
    
    Shouldn't you pull computation of things like normDiff into this block, then? That would improve the dense case more, I suppose.
    
    I'd still like to see your current final test code to understand what it's testing. If it's a win on some types of vectors, on realistic data, and doesn't hurt performance otherwise, this could be OK.
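For context, the shortcut being debated in the diff can be sketched as follows. This is an illustrative Python translation of the quoted Scala logic, not Spark's API: the function name, the plain-list vector representation, and the `precision` default are assumptions, and `EPSILON` stands in for Spark's machine-epsilon constant. The idea is that `||a-b||^2 = ||a||^2 + ||b||^2 - 2*(a.b)` is cheap when the norms are precomputed, but loses accuracy when the two norms are close, so an error bound decides whether to fall back to the exact element-wise computation (what `Vectors.sqdist` does for real vectors):

```python
import math

# Machine epsilon for IEEE 754 doubles, standing in for Spark's EPSILON.
EPSILON = 2.220446049250313e-16

def fast_squared_distance(v1, norm1, v2, norm2, precision=1e-6):
    """Sketch of the norm-based squared-distance shortcut from the diff.

    `norm1`/`norm2` are the precomputed L2 norms of `v1`/`v2`.
    """
    sum_squared_norm = norm1 * norm1 + norm2 * norm2
    norm_diff = norm1 - norm2
    # Cheap bound that needs no inner product: when the norms differ
    # enough, cancellation in `sum_squared_norm - 2*dot` is harmless.
    bound1 = 2.0 * EPSILON * sum_squared_norm / (norm_diff * norm_diff + EPSILON)
    if bound1 < precision:
        dot = sum(a * b for a, b in zip(v1, v2))
        return sum_squared_norm - 2.0 * dot
    # Otherwise compute the distance exactly, element by element
    # (the role Vectors.sqdist plays in the Scala code).
    return sum((a - b) ** 2 for a, b in zip(v1, v2))
```

The PR under review skips straight to the exact path for dense/dense pairs; the reviewer's point is that, in that branch, quantities like `norm_diff` need not be computed at all, which is where the additional dense-case saving would come from.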


---
