You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by davideis <gi...@git.apache.org> on 2017/05/17 21:06:59 UTC

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

GitHub user davideis opened a pull request:

    https://github.com/apache/spark/pull/18022

    [SPARK-20790] [MLlib] Correctly handle negative values for implicit feedback

    ## What changes were proposed in this pull request?
    
    Revert the handling of negative values in ALS with implicit feedback, so that the confidence is the absolute value of the rating and the preference is 0 for negative ratings. This was the original behavior.
    
    ## How was this patch tested?
    
    This patch was tested with the existing unit tests and an added unit test to ensure that negative ratings are not ignored.
    
    @mengxr 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davideis/spark bugfix/negative-rating

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18022.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18022
    
----
commit 767d72e9bd08cc251aa1df0ce56a4fb8deb9e2e6
Author: David Eis <de...@bloomberg.net>
Date:   2017-05-02T21:46:29Z

    Fix regression introduced by ccafd757eda478913f783f3127be715bf6413740
    Revert the handling of negative values in ALS with implicit feedback and test for regression.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by davideis <gi...@git.apache.org>.

Github user davideis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117265969
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -1624,15 +1628,15 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
                 val srcFactor = sortedSrcFactors(blockId)(localIndex)
                 val rating = ratings(i)
                 if (implicitPrefs) {
    -              // Extension to the original paper to handle b < 0. confidence is a function of |b|
    -              // instead so that it is never negative. c1 is confidence - 1.0.
    +              // Extension to the original paper to handle rating < 0. confidence is a function
    +              // of |rating| instead so that it is never negative. c1 is confidence - 1.
                   val c1 = alpha * math.abs(rating)
    -              // For rating <= 0, the corresponding preference is 0. So the term below is only added
    -              // for rating > 0. Because YtY is already added, we need to adjust the scaling here.
    -              if (rating > 0) {
    +              // For rating <= 0, the corresponding preference is 0. So the second argument of add
    +              // is only there for rating > 0.
    +              if (rating > 0.0) {
                     numExplicits += 1
    -                ls.add(srcFactor, (c1 + 1.0) / c1, c1)
                   }
    +              ls.add(srcFactor, if (rating > 0.0) 1.0 + c1 else 0.0, c1)
    --- End diff --
    
    Correct, this is the crux of the change (moving outside of the if condition). Changing the arguments was more to be less confusing and more direct, since it was very confusing to me before where the ``(1+c1)/c1`` was coming from and then when it is actually used in ``add``, it gets multiplied by ``c1``, which is a wasted operation and may not even exactly yield ``1+c1`` in the end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18022


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by davideis <gi...@git.apache.org>.

Github user davideis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117262532
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -763,11 +763,15 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
       /**
        * Representing a normal equation to solve the following weighted least squares problem:
        *
    -   * minimize \sum,,i,, c,,i,, (a,,i,,^T^ x - b,,i,,)^2^ + lambda * x^T^ x.
    +   * minimize \sum,,i,, c,,i,, (a,,i,,^T^ x - d,,i,,)^2^ + lambda * x^T^ x.
        *
        * Its normal equation is given by
        *
    -   * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - b,,i,, a,,i,,) + lambda * x = 0.
    +   * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - d,,i,, a,,i,,) + lambda * x = 0.
    +   *
    +   * Distributing and letting b,,i,, = d,,i,, * b,,i,,
    --- End diff --
    
    good point, I meant $b_i=c_i*d_i$. The function below accepts three arguments a, b and c, I wanted to name them something more meaningful, but in light of this comment a, b, and c make sense to use.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117191129
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
    @@ -78,7 +79,7 @@ class ALSSuite
         val k = 2
         val ne0 = new NormalEquation(k)
           .add(Array(1.0f, 2.0f), 3.0)
    -      .add(Array(4.0f, 5.0f), 6.0, 2.0) // weighted
    +      .add(Array(4.0f, 5.0f), 12.0, 2.0) // weighted
    --- End diff --
    
    Was this test change intentional?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r119347220
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
    @@ -455,6 +487,22 @@ class ALSSuite
           targetRMSE = 0.3)
       }
     
    +  test("implicit feedback regression") {
    +    val trainingWithNeg = sc.parallelize(Array(Rating(0, 0, 1), Rating(1, 1, 1), Rating(0, 1, -3)))
    +    val trainingWithZero = sc.parallelize(Array(Rating(0, 0, 1), Rating(1, 1, 1), Rating(0, 1, 0)))
    +    val modelWithNeg =
    +      trainALS(trainingWithNeg, rank = 1, maxIter = 5, regParam = 0.01, implicitPrefs = true)
    +    val modelWithZero =
    +      trainALS(trainingWithZero, rank = 1, maxIter = 5, regParam = 0.01, implicitPrefs = true)
    +    val userFactorsNeg = modelWithNeg.userFactors
    +    val itemFactorsNeg = modelWithNeg.itemFactors
    +    val userFactorsZero = modelWithZero.userFactors
    +    val itemFactorsZero = modelWithZero.itemFactors
    +    userFactorsNeg.collect().foreach(arr => logInfo(s"implicit test " + arr.mkString(" ")))
    --- End diff --
    
    Small nit here but ideally we don't usually log info during this sort of test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18022: [SPARK-20790] [MLlib] Correctly handle negative values f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18022
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18022: [SPARK-20790] [MLlib] Correctly handle negative values f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18022
  
    **[Test build #3745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3745/testReport)** for PR 18022 at commit [`21c0fd9`](https://github.com/apache/spark/commit/21c0fd9f2e77de4d1003fd9801028a947ce6d338).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by davideis <gi...@git.apache.org>.

Github user davideis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117264333
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -795,8 +799,8 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
           require(a.length == k)
           copyToDouble(a)
           blas.dspr(upper, k, c, da, 1, ata)
    -      if (b != 0.0) {
    -        blas.daxpy(k, c * b, da, 1, atb, 1)
    +      if (Math.abs(b) > Double.MinPositiveValue) {
    --- End diff --
    
    You are right, I should pick a looser threshold. It seems that this check is only really to prevent extra work, since daxpy will just be adding a zeros vector if b==0. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by davideis <gi...@git.apache.org>.

Github user davideis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117260197
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
    @@ -78,7 +79,7 @@ class ALSSuite
         val k = 2
         val ne0 = new NormalEquation(k)
           .add(Array(1.0f, 2.0f), 3.0)
    -      .add(Array(4.0f, 5.0f), 6.0, 2.0) // weighted
    +      .add(Array(4.0f, 5.0f), 12.0, 2.0) // weighted
    --- End diff --
    
    Yes this test change was intentional, because I change the semantic meaning of the arguments to add, before add would multiply the second and third arguments together internally, so to make this test valid I premultiplied them together. In the usage of this function in ALS.scala, for non-implicit the third argument is 1, so there is no change, and implicit is now handled correctly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18022: [SPARK-20790] [MLlib] Correctly handle negative values f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18022
  
    **[Test build #3745 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3745/testReport)** for PR 18022 at commit [`21c0fd9`](https://github.com/apache/spark/commit/21c0fd9f2e77de4d1003fd9801028a947ce6d338).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18022: [SPARK-20790] [MLlib] Correctly handle negative values f...

Posted by davideis <gi...@git.apache.org>.

Github user davideis commented on the issue:

    https://github.com/apache/spark/pull/18022
  
    Does it need another jira ticket?
    
    On Jun 2, 2017 9:13 AM, "Nick Pentreath" <no...@github.com> wrote:
    
    > Yeah you can just open a small follow up PR
    > On Fri, 2 Jun 2017 at 15:10, davideis <no...@github.com> wrote:
    >
    > > *@davideis* commented on this pull request.
    > > ------------------------------
    > >
    > > In mllib/src/test/scala/org/apache/spark/ml/
    > recommendation/ALSSuite.scala
    > > <https://github.com/apache/spark/pull/18022#discussion_r119852009>:
    > >
    > > > @@ -455,6 +487,22 @@ class ALSSuite
    > > targetRMSE = 0.3)
    > > }
    > >
    > > + test("implicit feedback regression") {
    > > + val trainingWithNeg = sc.parallelize(Array(Rating(0, 0, 1), Rating(1,
    > 1, 1), Rating(0, 1, -3)))
    > > + val trainingWithZero = sc.parallelize(Array(Rating(0, 0, 1), Rating(1,
    > 1, 1), Rating(0, 1, 0)))
    > > + val modelWithNeg =
    > > + trainALS(trainingWithNeg, rank = 1, maxIter = 5, regParam = 0.01,
    > implicitPrefs = true)
    > > + val modelWithZero =
    > > + trainALS(trainingWithZero, rank = 1, maxIter = 5, regParam = 0.01,
    > implicitPrefs = true)
    > > + val userFactorsNeg = modelWithNeg.userFactors
    > > + val itemFactorsNeg = modelWithNeg.itemFactors
    > > + val userFactorsZero = modelWithZero.userFactors
    > > + val itemFactorsZero = modelWithZero.itemFactors
    > > + userFactorsNeg.collect().foreach(arr => logInfo(s"implicit test " +
    > arr.mkString(" ")))
    > >
    > > Good point, I meant to remove, shall I open another pr?
    > >
    > > —
    > > You are receiving this because you were mentioned.
    > > Reply to this email directly, view it on GitHub
    > > <https://github.com/apache/spark/pull/18022#discussion_r119852009>, or
    > mute
    > > the thread
    > > <https://github.com/notifications/unsubscribe-auth/AA_
    > SBxnsNsrJoMbMWP1I8Fvr9GAPHNxcks5sAAnGgaJpZM4NecIX>
    > > .
    > >
    >
    > —
    > You are receiving this because you authored the thread.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/18022#issuecomment-305783257>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AbQISk92QY-a_H3_8Fi10HD_B7FvyKsaks5sAAqQgaJpZM4NecIX>
    > .
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117191375
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -795,8 +799,8 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
           require(a.length == k)
           copyToDouble(a)
           blas.dspr(upper, k, c, da, 1, ata)
    -      if (b != 0.0) {
    -        blas.daxpy(k, c * b, da, 1, atb, 1)
    +      if (Math.abs(b) > Double.MinPositiveValue) {
    --- End diff --
    
    How does this differ from `!= 0.0`? I get that it differs for `Double.MinPositiveValue` but why is that important?
    
    You're right that the condition was `b > 0` before on purpose, though I think this is trying to handle explicit/implicit cases


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18022: [SPARK-20790] [MLlib] Correctly handle negative values f...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18022
  
    You can link the same JIRA since it's a small follow up
    On Fri, 2 Jun 2017 at 15:16, davideis <no...@github.com> wrote:
    
    > Does it need another jira ticket?
    >
    > On Jun 2, 2017 9:13 AM, "Nick Pentreath" <no...@github.com> wrote:
    >
    > > Yeah you can just open a small follow up PR
    > > On Fri, 2 Jun 2017 at 15:10, davideis <no...@github.com> wrote:
    > >
    > > > *@davideis* commented on this pull request.
    > > > ------------------------------
    > > >
    > > > In mllib/src/test/scala/org/apache/spark/ml/
    > > recommendation/ALSSuite.scala
    > > > <https://github.com/apache/spark/pull/18022#discussion_r119852009>:
    > > >
    > > > > @@ -455,6 +487,22 @@ class ALSSuite
    > > > targetRMSE = 0.3)
    > > > }
    > > >
    > > > + test("implicit feedback regression") {
    > > > + val trainingWithNeg = sc.parallelize(Array(Rating(0, 0, 1), Rating(1,
    > > 1, 1), Rating(0, 1, -3)))
    > > > + val trainingWithZero = sc.parallelize(Array(Rating(0, 0, 1),
    > Rating(1,
    > > 1, 1), Rating(0, 1, 0)))
    > > > + val modelWithNeg =
    > > > + trainALS(trainingWithNeg, rank = 1, maxIter = 5, regParam = 0.01,
    > > implicitPrefs = true)
    > > > + val modelWithZero =
    > > > + trainALS(trainingWithZero, rank = 1, maxIter = 5, regParam = 0.01,
    > > implicitPrefs = true)
    > > > + val userFactorsNeg = modelWithNeg.userFactors
    > > > + val itemFactorsNeg = modelWithNeg.itemFactors
    > > > + val userFactorsZero = modelWithZero.userFactors
    > > > + val itemFactorsZero = modelWithZero.itemFactors
    > > > + userFactorsNeg.collect().foreach(arr => logInfo(s"implicit test " +
    > > arr.mkString(" ")))
    > > >
    > > > Good point, I meant to remove, shall I open another pr?
    > > >
    > > > —
    > > > You are receiving this because you were mentioned.
    > > > Reply to this email directly, view it on GitHub
    > > > <https://github.com/apache/spark/pull/18022#discussion_r119852009>, or
    > > mute
    > > > the thread
    > > > <https://github.com/notifications/unsubscribe-auth/AA_
    > > SBxnsNsrJoMbMWP1I8Fvr9GAPHNxcks5sAAnGgaJpZM4NecIX>
    > > > .
    > > >
    > >
    > > —
    > > You are receiving this because you authored the thread.
    > > Reply to this email directly, view it on GitHub
    > > <https://github.com/apache/spark/pull/18022#issuecomment-305783257>, or
    > mute
    > > the thread
    > > <
    > https://github.com/notifications/unsubscribe-auth/AbQISk92QY-a_H3_8Fi10HD_B7FvyKsaks5sAAqQgaJpZM4NecIX
    > >
    > > .
    > >
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/18022#issuecomment-305783820>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AA_SB2zzEDd0c2Mll0dhinpoQAbV_jfVks5sAAsmgaJpZM4NecIX>
    > .
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18022: [SPARK-20790] [MLlib] Correctly handle negative values f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18022
  
    **[Test build #3763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3763/testReport)** for PR 18022 at commit [`21c0fd9`](https://github.com/apache/spark/commit/21c0fd9f2e77de4d1003fd9801028a947ce6d338).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18022: [SPARK-20790] [MLlib] Correctly handle negative values f...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18022
  
    Merged to master/2.2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by davideis <gi...@git.apache.org>.

Github user davideis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117261995
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
    @@ -348,6 +349,37 @@ class ALSSuite
       }
     
       /**
    +  * Train ALS using the given training set and parameters
    +  * @param training training dataset
    +  * @param rank rank of the matrix factorization
    +  * @param maxIter max number of iterations
    +  * @param regParam regularization constant
    +  * @param implicitPrefs whether to use implicit preference
    +  * @param numUserBlocks number of user blocks
    +  * @param numItemBlocks number of item blocks
    +  * @return a trained ALSModel
    +  */
    +  def trainALS(
    --- End diff --
    
    It is a helper function, because I call it twice in the test. I also wanted to use this in the testALS function, but it wasn't straightforward. I can't use testALS in my test since it does more than just train the model and it doesn't allow me to compare the two models the test generates, one with negative values and one with those negative values zeroed out.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by davideis <gi...@git.apache.org>.

Github user davideis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r119852009
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
    @@ -455,6 +487,22 @@ class ALSSuite
           targetRMSE = 0.3)
       }
     
    +  test("implicit feedback regression") {
    +    val trainingWithNeg = sc.parallelize(Array(Rating(0, 0, 1), Rating(1, 1, 1), Rating(0, 1, -3)))
    +    val trainingWithZero = sc.parallelize(Array(Rating(0, 0, 1), Rating(1, 1, 1), Rating(0, 1, 0)))
    +    val modelWithNeg =
    +      trainALS(trainingWithNeg, rank = 1, maxIter = 5, regParam = 0.01, implicitPrefs = true)
    +    val modelWithZero =
    +      trainALS(trainingWithZero, rank = 1, maxIter = 5, regParam = 0.01, implicitPrefs = true)
    +    val userFactorsNeg = modelWithNeg.userFactors
    +    val itemFactorsNeg = modelWithNeg.itemFactors
    +    val userFactorsZero = modelWithZero.userFactors
    +    val itemFactorsZero = modelWithZero.itemFactors
    +    userFactorsNeg.collect().foreach(arr => logInfo(s"implicit test " + arr.mkString(" ")))
    --- End diff --
    
    Good point, I meant to remove, shall I open another pr?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18022: [SPARK-20790] [MLlib] Correctly handle negative values f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18022
  
    **[Test build #3763 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3763/testReport)** for PR 18022 at commit [`21c0fd9`](https://github.com/apache/spark/commit/21c0fd9f2e77de4d1003fd9801028a947ce6d338).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117192420
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -1624,15 +1628,15 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
                 val srcFactor = sortedSrcFactors(blockId)(localIndex)
                 val rating = ratings(i)
                 if (implicitPrefs) {
    -              // Extension to the original paper to handle b < 0. confidence is a function of |b|
    -              // instead so that it is never negative. c1 is confidence - 1.0.
    +              // Extension to the original paper to handle rating < 0. confidence is a function
    +              // of |rating| instead so that it is never negative. c1 is confidence - 1.
                   val c1 = alpha * math.abs(rating)
    -              // For rating <= 0, the corresponding preference is 0. So the term below is only added
    -              // for rating > 0. Because YtY is already added, we need to adjust the scaling here.
    -              if (rating > 0) {
    +              // For rating <= 0, the corresponding preference is 0. So the second argument of add
    +              // is only there for rating > 0.
    +              if (rating > 0.0) {
                     numExplicits += 1
    -                ls.add(srcFactor, (c1 + 1.0) / c1, c1)
                   }
    +              ls.add(srcFactor, if (rating > 0.0) 1.0 + c1 else 0.0, c1)
    --- End diff --
    
    Is this the substance of the change? I might need some help understanding why this is needed. Yes, even negative values should be recorded for implicit prefs, I agree. It adds 1 + c1 now instead of (1 + c1) / c1, so that's why the factor of c is taken out above? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117190338
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
    @@ -348,6 +349,37 @@ class ALSSuite
       }
     
       /**
    +  * Train ALS using the given training set and parameters
    +  * @param training training dataset
    +  * @param rank rank of the matrix factorization
    +  * @param maxIter max number of iterations
    +  * @param regParam regularization constant
    +  * @param implicitPrefs whether to use implicit preference
    +  * @param numUserBlocks number of user blocks
    +  * @param numItemBlocks number of item blocks
    +  * @return a trained ALSModel
    +  */
    +  def trainALS(
    --- End diff --
    
    Why do you need a new overload?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18022#discussion_r117192700
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -763,11 +763,15 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
       /**
        * Representing a normal equation to solve the following weighted least squares problem:
        *
    -   * minimize \sum,,i,, c,,i,, (a,,i,,^T^ x - b,,i,,)^2^ + lambda * x^T^ x.
    +   * minimize \sum,,i,, c,,i,, (a,,i,,^T^ x - d,,i,,)^2^ + lambda * x^T^ x.
        *
        * Its normal equation is given by
        *
    -   * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - b,,i,, a,,i,,) + lambda * x = 0.
    +   * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - d,,i,, a,,i,,) + lambda * x = 0.
    +   *
    +   * Distributing and letting b,,i,, = d,,i,, * b,,i,,
    --- End diff --
    
    I'm not clear on this change. It defines $b_i$ in terms of itself? what is this correcting?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18022: [SPARK-20790] [MLlib] Correctly handle negative values f...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18022
  
    Yeah you can just open a small follow up PR
    On Fri, 2 Jun 2017 at 15:10, davideis <no...@github.com> wrote:
    
    > *@davideis* commented on this pull request.
    > ------------------------------
    >
    > In mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala
    > <https://github.com/apache/spark/pull/18022#discussion_r119852009>:
    >
    > > @@ -455,6 +487,22 @@ class ALSSuite
    >        targetRMSE = 0.3)
    >    }
    >
    > +  test("implicit feedback regression") {
    > +    val trainingWithNeg = sc.parallelize(Array(Rating(0, 0, 1), Rating(1, 1, 1), Rating(0, 1, -3)))
    > +    val trainingWithZero = sc.parallelize(Array(Rating(0, 0, 1), Rating(1, 1, 1), Rating(0, 1, 0)))
    > +    val modelWithNeg =
    > +      trainALS(trainingWithNeg, rank = 1, maxIter = 5, regParam = 0.01, implicitPrefs = true)
    > +    val modelWithZero =
    > +      trainALS(trainingWithZero, rank = 1, maxIter = 5, regParam = 0.01, implicitPrefs = true)
    > +    val userFactorsNeg = modelWithNeg.userFactors
    > +    val itemFactorsNeg = modelWithNeg.itemFactors
    > +    val userFactorsZero = modelWithZero.userFactors
    > +    val itemFactorsZero = modelWithZero.itemFactors
    > +    userFactorsNeg.collect().foreach(arr => logInfo(s"implicit test " + arr.mkString(" ")))
    >
    > Good point, I meant to remove, shall I open another pr?
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/18022#discussion_r119852009>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AA_SBxnsNsrJoMbMWP1I8Fvr9GAPHNxcks5sAAnGgaJpZM4NecIX>
    > .
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org