Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2017/10/03 07:19:55 UTC

[GitHub] spark pull request #19372: [SPARK-22156][MLLIB] Fix update equation of learn...

GitHub user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19372#discussion_r142328125
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -368,11 +371,12 @@ class Word2Vec extends Serializable with Logging {
                 var wc = wordCount
                 if (wordCount - lastWordCount > 10000) {
                   lwc = wordCount
    -              // TODO: discount by iteration?
    -              alpha =
    -                learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))
    +              alpha = learningRate *
    +                (1 - (numPartitions * wordCount.toDouble + numWordsProcessedInPreviousIterations) /
    +                  totalWordsCounts)
                   if (alpha < learningRate * 0.0001) alpha = learningRate * 0.0001
    -              logInfo("wordCount = " + wordCount + ", alpha = " + alpha)
    +              logInfo("wordCount = " + (wordCount + numWordsProcessedInPreviousIterations) +
    --- End diff --
    
    If you update this again, you can use string interpolation: `logInfo(s"wordCount = ${wordCount + ...}, alpha = $alpha")`
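
As a rough illustration of the suggestion above, the sketch below builds the same log message with concatenation and with an `s"..."` interpolator. The object name, method names, parameter types, and sample values are assumptions made for this example; only the message shape is taken from the diff, and the `...` in the comment is not filled in here beyond the sum already visible in the changed line.

```scala
object LogMessageSketch {
  // Hypothetical helper: concatenation, in the style of the line in the diff.
  def concatMessage(wordCount: Long, previousWords: Long, alpha: Double): String =
    "wordCount = " + (wordCount + previousWords) + ", alpha = " + alpha

  // Hypothetical helper: the same message via string interpolation.
  // ${...} embeds an arbitrary expression; $alpha embeds a simple identifier.
  def interpolatedMessage(wordCount: Long, previousWords: Long, alpha: Double): String =
    s"wordCount = ${wordCount + previousWords}, alpha = $alpha"

  def main(args: Array[String]): Unit = {
    // Sample values are made up for illustration.
    val viaConcat = concatMessage(10000L, 50000L, 0.02)
    val viaInterpolation = interpolatedMessage(10000L, 50000L, 0.02)
    println(viaInterpolation)
    assert(viaConcat == viaInterpolation)
  }
}
```

Both forms produce the same string; the interpolated one just avoids the chain of `+` operators and the extra parentheses around the sum.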

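For readers following the diff, here is a minimal standalone sketch of the new `alpha` update, assuming the variables have the types shown in the signature. How `totalWordsCounts` and `numWordsProcessedInPreviousIterations` are computed is not visible in the quoted hunk, so they are plain parameters here; `AlphaDecaySketch` and `decayedAlpha` are hypothetical names, not part of `Word2Vec`.

```scala
object AlphaDecaySketch {
  // Mirrors the updated expression from the diff, including the floor at
  // learningRate * 0.0001. All inputs are passed in as assumptions.
  def decayedAlpha(
      learningRate: Double,
      numPartitions: Int,
      wordCount: Long,
      numWordsProcessedInPreviousIterations: Long,
      totalWordsCounts: Long): Double = {
    // Estimated fraction of all training words processed so far.
    val progress =
      (numPartitions * wordCount.toDouble + numWordsProcessedInPreviousIterations) /
        totalWordsCounts
    // Linear decay, never dropping below the floor used in the diff.
    math.max(learningRate * (1 - progress), learningRate * 0.0001)
  }

  def main(args: Array[String]): Unit = {
    // Made-up numbers: alpha shrinks as words from earlier iterations accumulate.
    println(decayedAlpha(0.025, 1, 0L, 0L, 1000L))
    println(decayedAlpha(0.025, 1, 0L, 500L, 1000L))
  }
}
```

The point of the change, as far as the hunk shows, is that words processed in earlier iterations now count toward the decay, so `alpha` keeps decreasing across iterations instead of restarting from `learningRate` each time.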
