You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tmyklebu <gi...@git.apache.org> on 2014/04/21 04:14:00 UTC

[GitHub] spark pull request: Alternating nonnegative least-squares

GitHub user tmyklebu opened a pull request:

    https://github.com/apache/spark/pull/460

    Alternating nonnegative least-squares

    This pull request includes a nonnegative least-squares solver (NNLS) tailored to the kinds of small-scale problems that come up when training matrix factorisation models by alternating nonnegative least-squares (ANNLS).
    
    The method used for the NNLS subproblems is based on the classical method of projected gradients.  There is a modification where, if the set of active constraints has not changed since the last iteration, a conjugate gradient step is considered and possibly rejected in favour of the gradient; this improves convergence once the optimal face has been located.
    
    The NNLS solver is in `org.apache.spark.mllib.optimization.NNLSbyPCG`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tmyklebu/spark annls

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/460.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #460
    
----
commit a68ac10932b2d9ae50b3a27b34bf65fce7e024e1
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-19T23:07:27Z

    A nonnegative least-squares solver.

commit 6cb563c19e7ae554724f9832f83a4df96d3d92c1
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-19T23:07:40Z

    Tests for the nonnegative least squares solver.

commit f5dbf4d5fb5fe5d09cf86e0fba388efc68acadd0
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-19T23:08:31Z

    Teach ALS how to use the NNLS solver.

commit 89ea0a8ef942ba0e62d91d75ca26f92a85b25470
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-19T23:08:56Z

    Hack ALSSuite to support NNLS testing.

commit 33bf4f20e28fdeaab2ed6ae36ef97115898c88dd
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-16T18:14:38Z

    Fix missing space.

commit 9a82fa61cb467db2e0dbc308d15268f15d20500d
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-20T03:54:58Z

    Fix scalastyle moanings.

commit c288b6ae99bb4aecacead1d399ec61dadd1ce2ab
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-20T03:55:13Z

    Finish moving the NNLS solver.

commit ac673bda27883d4bb4085fa326380aaed29b1a70
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-20T19:29:10Z

    More safeguards against numerical ridiculousness.

commit 5345402b6d44cd9fad46de830b0f7b4c8d3f9268
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-21T02:04:35Z

    Style fixes that got eaten.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40973680
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14296/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11802008
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    --- End diff --
    
    Btw, if there is a reference for the implementation, please provide the reference. Otherwise, please add more comments about the difference here from a standard projected gradient/CG method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11984700
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    +  class Workspace(val n: Int) {
    +    val scratch = new DoubleMatrix(n, 1)
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastDir = new DoubleMatrix(n, 1)
    +    val res = new DoubleMatrix(n, 1)
    +
    +    def wipe() {
    +      var i: Int = 0
    +      while (i < n) {
    +        scratch.data(i) = 0.0
    --- End diff --
    
    Use `DoubleMatrix#fill(0.0)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40973679
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11802304
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean): Array[Double] = {
    +    val n = atb.rows
    +    val scratch = new DoubleMatrix(n, 1)
    +
    +    // find the optimal unconstrained step
    +    def steplen(dir: DoubleMatrix, resid: DoubleMatrix): Double = {
    +      val top = SimpleBlas.dot(dir, resid)
    +      SimpleBlas.gemv(1.0, ata, dir, 0.0, scratch)
    +      // Push the denominator upward very slightly to avoid infinities and silliness
    +      top / (SimpleBlas.dot(scratch, dir) + 1e-20)
    +    }
    +
    +    // stopping condition
    +    def stop(step: Double, ndir: Double, nx: Double): Boolean = {
    +        ((step != step) // NaN
    +      || (step < 1e-6) // too small or negative
    +      || (step > 1e40) // too small; almost certainly numerical problems
    +      || (ndir < 1e-12 * nx) // gradient relatively too small
    +      || (ndir < 1e-32) // gradient absolutely too small; numerical issues may lurk
    +      )
    +    }
    +
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastdir = new DoubleMatrix(n, 1)
    +    val resid = new DoubleMatrix(n, 1)
    --- End diff --
    
    `resid` is not a common acronym for residual. Use either `res` or `residual`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40949723
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40958189
  
    Instead of using a static method, creating a workspace for temp vectors in the NNLS solver may help memory usage.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985631
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -140,16 +169,21 @@ class ALSSuite extends FunSuite with LocalSparkContext {
        * @param implicitPrefs  flag to test implicit feedback
        * @param bulkPredict    flag to test bulk prediciton
        * @param negativeWeights whether the generated data can contain negative values
    +   * @param numBlocks      number of blocks to partition users and products into
    +   * @param negativeFactors whether the generated user/product factors can have negative entries
        */
       def testALS(users: Int, products: Int, features: Int, iterations: Int,
         samplingRate: Double, matchThreshold: Double, implicitPrefs: Boolean = false,
    -    bulkPredict: Boolean = false, negativeWeights: Boolean = false)
    +    bulkPredict: Boolean = false, negativeWeights: Boolean = false, numBlocks: Int = -1,
    +    negativeFactors: Boolean = true)
       {
         val (sampledRatings, trueRatings, truePrefs) = ALSSuite.generateRatings(users, products,
    -      features, samplingRate, implicitPrefs, negativeWeights)
    +      features, samplingRate, implicitPrefs, negativeWeights, negativeFactors)
         val model = implicitPrefs match {
    -      case false => ALS.train(sc.parallelize(sampledRatings), features, iterations)
    -      case true => ALS.trainImplicit(sc.parallelize(sampledRatings), features, iterations)
    +      case false => ALS.train(sc.parallelize(sampledRatings), features, iterations, 0.01,
    +          numBlocks, 0L, !negativeFactors)
    --- End diff --
    
    Does the following fit?
    
    ~~~
    case false =>
      ALS.train(sc.parallelize(sampledRatings), features, iterations, 0.01, numBlocks, 0L, !negativeFactors)
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985511
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    +    val b = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +
    +    NativeBlas.dgemm('T', 'N', n, n, n, 1.0, A.data, 0, n, A.data, 0, n, 0.0, ata.data, 0, n)
    +    NativeBlas.dgemv('T', n, n, 1.0, A.data, 0, n, b.data, 0, 1, 0.0, atb.data, 0, 1)
    +
    +    (ata, atb)
    +  }
    +
    +  test("NNLSbyPCG: exact solution cases") {
    +    val n = 20
    +    val rand = new Random(12346)
    +    val ws = NNLSbyPCG.createWorkspace(n)
    +    var numSolved = 0
    +
    +    // About 15% of random 20x20 [-1,1]-matrices have a singular value less than 1e-3.  NNLSbyPCG
    +    // can legitimately fail to solve these anywhere close to exactly.  So we grab a considerable
    +    // sample of these matrices and make sure that we solved a substantial fraction of them.
    +
    +    for (kase <- 0 until 100) {
    +      val (ata, atb) = genOnesData(n, rand)
    +      val x = NNLSbyPCG.solve(ata, atb, true, ws)
    +      assert(x.length == n)
    +      var error = 0.0
    +      var solved = true
    +      for (i <- 0 until n) {
    +        error = error + (x(i) - 1) * (x(i) - 1)
    +        if (Math.abs(x(i) - 1) > 1e-3) solved = false
    +      }
    +      if (error > 1e-2) solved = false
    +      if (solved) numSolved = numSolved + 1
    +    }
    +    println(numSolved)
    --- End diff --
    
    Remove this line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11813502
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean): Array[Double] = {
    --- End diff --
    
    @mengxr Should be handled that way, but I prefer this since it lets me check that CG is kicking in.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40981933
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985279
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    --- End diff --
    
    `an NNLS`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985263
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -627,6 +684,37 @@ object ALS {
        * @param blocks     level of parallelism to split computation into
        * @param alpha      confidence parameter (only applies when immplicitPrefs = true)
        * @param seed       random seed
    +   * @param nonnegative Whether to impose nonnegativity upon the user and product factors
    +   */
    +  def trainImplicit(
    +      ratings: RDD[Rating],
    +      rank: Int,
    +      iterations: Int,
    +      lambda: Double,
    +      blocks: Int,
    +      alpha: Double,
    +      seed: Long,
    +      nonnegative: Boolean
    --- End diff --
    
    Ditto.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41438784
  
    Build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40912816
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40981946
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41421733
  
    Build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985477
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    +    val b = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +
    +    NativeBlas.dgemm('T', 'N', n, n, n, 1.0, A.data, 0, n, A.data, 0, n, 0.0, ata.data, 0, n)
    +    NativeBlas.dgemv('T', n, n, 1.0, A.data, 0, n, b.data, 0, 1, 0.0, atb.data, 0, 1)
    +
    +    (ata, atb)
    +  }
    +
    +  test("NNLSbyPCG: exact solution cases") {
    +    val n = 20
    +    val rand = new Random(12346)
    +    val ws = NNLSbyPCG.createWorkspace(n)
    +    var numSolved = 0
    +
    +    // About 15% of random 20x20 [-1,1]-matrices have a singular value less than 1e-3.  NNLSbyPCG
    +    // can legitimately fail to solve these anywhere close to exactly.  So we grab a considerable
    +    // sample of these matrices and make sure that we solved a substantial fraction of them.
    +
    +    for (kase <- 0 until 100) {
    --- End diff --
    
    `kase` -> `k`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11984827
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    +  class Workspace(val n: Int) {
    +    val scratch = new DoubleMatrix(n, 1)
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastDir = new DoubleMatrix(n, 1)
    +    val res = new DoubleMatrix(n, 1)
    +
    +    def wipe() {
    +      var i: Int = 0
    +      while (i < n) {
    +        scratch.data(i) = 0.0
    +        grad.data(i) = 0.0
    +        x.data(i) = 0.0
    +        dir.data(i) = 0.0
    +        lastDir.data(i) = 0.0
    +        res.data(i) = 0.0
    +        i = i + 1
    +      }
    +    }
    +  }
    +
    +  def createWorkspace(n: Int): Workspace = {
    +    new Workspace(n)
    +  }
    +
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   *
    +   * We solve the problem
    +   *   min_x      1/2 x^T ata x^T - x^T atb
    +   *   subject to x >= 0 (if nonnegative == true)
    +   *
    +   * The method used is similar to one described by Polyak (B. T. Polyak, The conjugate gradient
    +   * method in extremal problems, Zh. Vychisl. Mat. Mat. Fiz. 9(4)(1969), pp. 94-112) for bound-
    +   * constrained nonlinear programming.  Polyak unconditionally uses a conjugate gradient
    +   * direction, however, while this method only uses a conjugate gradient direction if the last
    +   * iteration did not cause a previously-inactive constraint to become active.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean,
    +      ws: Workspace): Array[Double] = {
    +    ws.wipe()
    +
    +    val n = atb.rows
    +    val scratch = ws.scratch
    +
    +    // find the optimal unconstrained step
    +    def steplen(dir: DoubleMatrix, res: DoubleMatrix): Double = {
    +      val top = SimpleBlas.dot(dir, res)
    +      SimpleBlas.gemv(1.0, ata, dir, 0.0, scratch)
    +      // Push the denominator upward very slightly to avoid infinities and silliness
    +      top / (SimpleBlas.dot(scratch, dir) + 1e-20)
    +    }
    +
    +    // stopping condition
    +    def stop(step: Double, ndir: Double, nx: Double): Boolean = {
    +        ((step != step) // NaN
    --- End diff --
    
    Scala has `Double#isNaN`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41430529
  
    Build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41453017
  
    Same output.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11802116
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +import org.scalatest.matchers.ShouldMatchers
    +
    +import org.apache.spark.mllib.util.LocalSparkContext
    +
    +import org.jblas.DoubleMatrix
    +import org.jblas.SimpleBlas
    +
    +class NNLSSuite extends FunSuite with LocalSparkContext with ShouldMatchers {
    +  test("NNLSbyPCG: exact solution case") {
    +    val A = new DoubleMatrix(20, 20)
    +    val b = new DoubleMatrix(20, 1)
    +    val rand = new Random(12345)
    +    for (i <- 0 until 20; j <- 0 until 20) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(20, 20)
    +    val atb = new DoubleMatrix(20, 1)
    +    for (i <- 0 until 20; j <- 0 until 20; k <- 0 until 20) {
    --- End diff --
    
    Use jblas functions to compute ata and atb.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40953381
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14284/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11802032
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean): Array[Double] = {
    +    val n = atb.rows
    +    val scratch = new DoubleMatrix(n, 1)
    +
    +    // find the optimal unconstrained step
    +    def steplen(dir: DoubleMatrix, resid: DoubleMatrix): Double = {
    +      val top = SimpleBlas.dot(dir, resid)
    +      SimpleBlas.gemv(1.0, ata, dir, 0.0, scratch)
    +      // Push the denominator upward very slightly to avoid infinities and silliness
    +      top / (SimpleBlas.dot(scratch, dir) + 1e-20)
    +    }
    +
    +    // stopping condition
    +    def stop(step: Double, ndir: Double, nx: Double): Boolean = {
    +        ((step != step) // NaN
    +      || (step < 1e-6) // too small or negative
    +      || (step > 1e40) // too small; almost certainly numerical problems
    +      || (ndir < 1e-12 * nx) // gradient relatively too small
    +      || (ndir < 1e-32) // gradient absolutely too small; numerical issues may lurk
    +      )
    +    }
    +
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastdir = new DoubleMatrix(n, 1)
    +    val resid = new DoubleMatrix(n, 1)
    +    var lastnorm = 0.0
    --- End diff --
    
    `lastNorm`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r12005355
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -537,6 +566,34 @@ object ALS {
        * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
        * product of two lower-rank matrices of a given rank (number of features). To solve for these
        * features, we run a given number of iterations of ALS. This is done using a level of
    +   * parallelism given by `blocks`, partitioning the data using the Partitioner `partitioner`.
    +   *
    +   * @param ratings     RDD of (userID, productID, rating) pairs
    +   * @param rank        number of features to use
    +   * @param iterations  number of iterations of ALS (recommended: 10-20)
    +   * @param lambda      regularization factor (recommended: 0.01)
    +   * @param blocks      level of parallelism to split computation into
    +   * @param seed        random seed
    +   * @param nonnegative Whether to impose nonnegativity constraints
    +   */
    +  def train(
    +      ratings: RDD[Rating],
    +      rank: Int,
    +      iterations: Int,
    +      lambda: Double,
    +      blocks: Int,
    +      seed: Long,
    +      nonnegative: Boolean) = {
    --- End diff --
    
    @mengxr I did that, but it meant no longer marking ALS's constructor private.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-44866020
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r12010640
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -91,7 +93,7 @@ case class Rating(val user: Int, val product: Int, val rating: Double)
      * indicated user
      * preferences rather than explicit ratings given to items.
      */
    -class ALS private (
    +class ALS (
    --- End diff --
    
    Users don't need to use this default constructor. We have a public constructor with no arguments:
    
    ~~~
     def this() = this(-1, 10, 10, 0.01, false, 1.0)
    ~~~
    
    So we can leave it private.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11984932
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    +  class Workspace(val n: Int) {
    +    val scratch = new DoubleMatrix(n, 1)
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastDir = new DoubleMatrix(n, 1)
    +    val res = new DoubleMatrix(n, 1)
    +
    +    def wipe() {
    +      var i: Int = 0
    +      while (i < n) {
    +        scratch.data(i) = 0.0
    +        grad.data(i) = 0.0
    +        x.data(i) = 0.0
    +        dir.data(i) = 0.0
    +        lastDir.data(i) = 0.0
    +        res.data(i) = 0.0
    +        i = i + 1
    +      }
    +    }
    +  }
    +
    +  def createWorkspace(n: Int): Workspace = {
    +    new Workspace(n)
    +  }
    +
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   *
    +   * We solve the problem
    +   *   min_x      1/2 x^T ata x^T - x^T atb
    +   *   subject to x >= 0 (if nonnegative == true)
    +   *
    +   * The method used is similar to one described by Polyak (B. T. Polyak, The conjugate gradient
    +   * method in extremal problems, Zh. Vychisl. Mat. Mat. Fiz. 9(4)(1969), pp. 94-112) for bound-
    +   * constrained nonlinear programming.  Polyak unconditionally uses a conjugate gradient
    +   * direction, however, while this method only uses a conjugate gradient direction if the last
    +   * iteration did not cause a previously-inactive constraint to become active.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean,
    +      ws: Workspace): Array[Double] = {
    +    ws.wipe()
    +
    +    val n = atb.rows
    +    val scratch = ws.scratch
    +
    +    // find the optimal unconstrained step
    --- End diff --
    
    `unconstrained step` -> `unconstrained CG step`? Both gradient descent and CG are used.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41443224
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] spark pull request: Alternating nonnegative least-squares

Posted by Debasish Das <de...@gmail.com>.
Hi Guys,

Could you please point to the JIRA for this ?

Why are we not doing NNMF using Breeze's constrained solver ? After
Xiangrui's sparse checkin we are trying to make use of the optimization
code that is available in breeze or extend upon it....

Breeze's constrained solver is also a projected gradient solver which uses
quasi newton matrix and it can handle bound constraints...I am not sure how
it works on more complex constraints but NNMF is a classic bound
constrained problem....

Thanks.
Deb



On Sun, Apr 20, 2014 at 10:42 PM, mengxr <gi...@git.apache.org> wrote:

> Github user mengxr commented on a diff in the pull request:
>
>     https://github.com/apache/spark/pull/460#discussion_r11802315
>
>     --- Diff:
> mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
>     @@ -0,0 +1,141 @@
>     +/*
>     + * Licensed to the Apache Software Foundation (ASF) under one or more
>     + * contributor license agreements.  See the NOTICE file distributed
> with
>     + * this work for additional information regarding copyright ownership.
>     + * The ASF licenses this file to You under the Apache License,
> Version 2.0
>     + * (the "License"); you may not use this file except in compliance
> with
>     + * the License.  You may obtain a copy of the License at
>     + *
>     + *    http://www.apache.org/licenses/LICENSE-2.0
>     + *
>     + * Unless required by applicable law or agreed to in writing, software
>     + * distributed under the License is distributed on an "AS IS" BASIS,
>     + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
>     + * See the License for the specific language governing permissions and
>     + * limitations under the License.
>     + */
>     +
>     +package org.apache.spark.mllib.optimization
>     +
>     +import org.jblas.{DoubleMatrix, SimpleBlas}
>     +import org.apache.spark.annotation.DeveloperApi
>     +
>     +/**
>     + * :: DeveloperApi ::
>     + * Object used to solve nonnegative least squares problems using a
> modified
>     + * projected gradient method.
>     + */
>     +@DeveloperApi
>     +object NNLSbyPCG {
>     +  /**
>     +   * Solve a least squares problem, possibly with nonnegativity
> constraints, by a modified
>     +   * projected gradient method.  That is, find x minimising ||Ax -
> b||_2 given A^T A and A^T b.
>     +   */
>     +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative:
> Boolean): Array[Double] = {
>     +    val n = atb.rows
>     +    val scratch = new DoubleMatrix(n, 1)
>     +
>     +    // find the optimal unconstrained step
>     +    def steplen(dir: DoubleMatrix, resid: DoubleMatrix): Double = {
>     --- End diff --
>
>     `stepSize` or `stepLength`?
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---
>

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11802315
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean): Array[Double] = {
    +    val n = atb.rows
    +    val scratch = new DoubleMatrix(n, 1)
    +
    +    // find the optimal unconstrained step
    +    def steplen(dir: DoubleMatrix, resid: DoubleMatrix): Double = {
    --- End diff --
    
    `stepSize` or `stepLength`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11802030
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean): Array[Double] = {
    +    val n = atb.rows
    +    val scratch = new DoubleMatrix(n, 1)
    +
    +    // find the optimal unconstrained step
    +    def steplen(dir: DoubleMatrix, resid: DoubleMatrix): Double = {
    +      val top = SimpleBlas.dot(dir, resid)
    +      SimpleBlas.gemv(1.0, ata, dir, 0.0, scratch)
    +      // Push the denominator upward very slightly to avoid infinities and silliness
    +      top / (SimpleBlas.dot(scratch, dir) + 1e-20)
    +    }
    +
    +    // stopping condition
    +    def stop(step: Double, ndir: Double, nx: Double): Boolean = {
    +        ((step != step) // NaN
    +      || (step < 1e-6) // too small or negative
    +      || (step > 1e40) // too small; almost certainly numerical problems
    +      || (ndir < 1e-12 * nx) // gradient relatively too small
    +      || (ndir < 1e-32) // gradient absolutely too small; numerical issues may lurk
    +      )
    +    }
    +
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastdir = new DoubleMatrix(n, 1)
    --- End diff --
    
    Use camelCase: `lastDir`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40916235
  
    @tmyklebu Nice work! Could you create a JIRA for this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41448678
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14498/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40953380
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985249
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -537,6 +566,34 @@ object ALS {
        * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
        * product of two lower-rank matrices of a given rank (number of features). To solve for these
        * features, we run a given number of iterations of ALS. This is done using a level of
    +   * parallelism given by `blocks`, partitioning the data using the Partitioner `partitioner`.
    +   *
    +   * @param ratings     RDD of (userID, productID, rating) pairs
    +   * @param rank        number of features to use
    +   * @param iterations  number of iterations of ALS (recommended: 10-20)
    +   * @param lambda      regularization factor (recommended: 0.01)
    +   * @param blocks      level of parallelism to split computation into
    +   * @param seed        random seed
    +   * @param nonnegative Whether to impose nonnegativity constraints
    +   */
    +  def train(
    +      ratings: RDD[Rating],
    +      rank: Int,
    +      iterations: Int,
    +      lambda: Double,
    +      blocks: Int,
    +      seed: Long,
    +      nonnegative: Boolean) = {
    --- End diff --
    
    Those static methods shouldn't be used for more than 3 parameters. It is extremely hard to remember the order. Even with docs, they are still not user friendly. I'm actually okay with removing this static method and using the builder pattern to create an ALS instance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985346
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    +    val b = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +
    +    NativeBlas.dgemm('T', 'N', n, n, n, 1.0, A.data, 0, n, A.data, 0, n, 0.0, ata.data, 0, n)
    +    NativeBlas.dgemv('T', n, n, 1.0, A.data, 0, n, b.data, 0, 1, 0.0, atb.data, 0, 1)
    --- End diff --
    
    Ditto.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41649508
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41415229
  
     Build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-44866006
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11984717
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    +  class Workspace(val n: Int) {
    +    val scratch = new DoubleMatrix(n, 1)
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastDir = new DoubleMatrix(n, 1)
    +    val res = new DoubleMatrix(n, 1)
    +
    +    def wipe() {
    +      var i: Int = 0
    +      while (i < n) {
    +        scratch.data(i) = 0.0
    +        grad.data(i) = 0.0
    +        x.data(i) = 0.0
    +        dir.data(i) = 0.0
    +        lastDir.data(i) = 0.0
    +        res.data(i) = 0.0
    +        i = i + 1
    +      }
    +    }
    +  }
    +
    +  def createWorkspace(n: Int): Workspace = {
    +    new Workspace(n)
    +  }
    +
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   *
    +   * We solve the problem
    +   *   min_x      1/2 x^T ata x^T - x^T atb
    --- End diff --
    
    Do we need to mention `lambda I` somewhere? It is actually `AtA + \lambda I`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40984979
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14305/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41415244
  
    Build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40960991
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14288/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40970393
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985176
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -537,6 +566,34 @@ object ALS {
        * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
        * product of two lower-rank matrices of a given rank (number of features). To solve for these
        * features, we run a given number of iterations of ALS. This is done using a level of
    +   * parallelism given by `blocks`, partitioning the data using the Partitioner `partitioner`.
    +   *
    +   * @param ratings     RDD of (userID, productID, rating) pairs
    +   * @param rank        number of features to use
    +   * @param iterations  number of iterations of ALS (recommended: 10-20)
    +   * @param lambda      regularization factor (recommended: 0.01)
    +   * @param blocks      level of parallelism to split computation into
    +   * @param seed        random seed
    +   * @param nonnegative Whether to impose nonnegativity constraints
    +   */
    +  def train(
    +      ratings: RDD[Rating],
    +      rank: Int,
    +      iterations: Int,
    +      lambda: Double,
    +      blocks: Int,
    +      seed: Long,
    +      nonnegative: Boolean) = {
    +    val als = new ALS(blocks, rank, iterations, lambda, false, 1.0, seed)
    +    if (nonnegative) als.setNonnegative(true)
    --- End diff --
    
    `als.setNonnegative(nonnegative)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41647146
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r12005322
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    +  class Workspace(val n: Int) {
    +    val scratch = new DoubleMatrix(n, 1)
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastDir = new DoubleMatrix(n, 1)
    +    val res = new DoubleMatrix(n, 1)
    +
    +    def wipe() {
    +      var i: Int = 0
    +      while (i < n) {
    +        scratch.data(i) = 0.0
    +        grad.data(i) = 0.0
    +        x.data(i) = 0.0
    +        dir.data(i) = 0.0
    +        lastDir.data(i) = 0.0
    +        res.data(i) = 0.0
    +        i = i + 1
    +      }
    +    }
    +  }
    +
    +  def createWorkspace(n: Int): Workspace = {
    +    new Workspace(n)
    +  }
    +
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   *
    +   * We solve the problem
    +   *   min_x      1/2 x^T ata x^T - x^T atb
    +   *   subject to x >= 0 (if nonnegative == true)
    +   *
    +   * The method used is similar to one described by Polyak (B. T. Polyak, The conjugate gradient
    +   * method in extremal problems, Zh. Vychisl. Mat. Mat. Fiz. 9(4)(1969), pp. 94-112) for bound-
    +   * constrained nonlinear programming.  Polyak unconditionally uses a conjugate gradient
    +   * direction, however, while this method only uses a conjugate gradient direction if the last
    +   * iteration did not cause a previously-inactive constraint to become active.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean,
    +      ws: Workspace): Array[Double] = {
    +    ws.wipe()
    +
    +    val n = atb.rows
    +    val scratch = ws.scratch
    +
    +    // find the optimal unconstrained step
    --- End diff --
    
    @mengxr We have a search direction and a quadratic objective function.  This finds the minimum value of the quadratic along the given line, ignoring nonnegativity constraints.  Works whether you have a CG or a steepest descent direction (and it's used for both).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41438788
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14488/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11813421
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean): Array[Double] = {
    +    val n = atb.rows
    +    val scratch = new DoubleMatrix(n, 1)
    +
    +    // find the optimal unconstrained step
    +    def steplen(dir: DoubleMatrix, resid: DoubleMatrix): Double = {
    --- End diff --
    
    Traditional name for this is "alpha."  I'll go with stepLength.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985336
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    +    val b = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +
    +    NativeBlas.dgemm('T', 'N', n, n, n, 1.0, A.data, 0, n, A.data, 0, n, 0.0, ata.data, 0, n)
    --- End diff --
    
    Performance is not an issue here. We can use high-level APIs like `DoubleMatrix#mmul` and `DoubleMatrix#transpose`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11802063
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +import org.scalatest.matchers.ShouldMatchers
    +
    +import org.apache.spark.mllib.util.LocalSparkContext
    +
    +import org.jblas.DoubleMatrix
    +import org.jblas.SimpleBlas
    +
    +class NNLSSuite extends FunSuite with LocalSparkContext with ShouldMatchers {
    --- End diff --
    
    You don't need `LocalSparkContext` and `ShouldMatchers`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11801987
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean): Array[Double] = {
    --- End diff --
    
    Since the class name is `NNLS`, we expect `nonnegative = true`. The unconstrained case should be handled by direct solvers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11984783
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    +  class Workspace(val n: Int) {
    +    val scratch = new DoubleMatrix(n, 1)
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastDir = new DoubleMatrix(n, 1)
    +    val res = new DoubleMatrix(n, 1)
    +
    +    def wipe() {
    +      var i: Int = 0
    +      while (i < n) {
    +        scratch.data(i) = 0.0
    +        grad.data(i) = 0.0
    +        x.data(i) = 0.0
    +        dir.data(i) = 0.0
    +        lastDir.data(i) = 0.0
    +        res.data(i) = 0.0
    +        i = i + 1
    +      }
    +    }
    +  }
    +
    +  def createWorkspace(n: Int): Workspace = {
    +    new Workspace(n)
    +  }
    +
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   *
    +   * We solve the problem
    +   *   min_x      1/2 x^T ata x^T - x^T atb
    +   *   subject to x >= 0 (if nonnegative == true)
    +   *
    +   * The method used is similar to one described by Polyak (B. T. Polyak, The conjugate gradient
    +   * method in extremal problems, Zh. Vychisl. Mat. Mat. Fiz. 9(4)(1969), pp. 94-112) for bound-
    +   * constrained nonlinear programming.  Polyak unconditionally uses a conjugate gradient
    +   * direction, however, while this method only uses a conjugate gradient direction if the last
    +   * iteration did not cause a previously-inactive constraint to become active.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean,
    --- End diff --
    
    The `nonnegative` is not necessary. It can solve unconstrained LS iteratively, but we have better algorithms for that, like LSQR and QR/SVD. Also, the object name is `NNLS`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40984978
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-44870921
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15351/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-44875956
  
    Merged. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41421734
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14485/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985078
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    +  class Workspace(val n: Int) {
    +    val scratch = new DoubleMatrix(n, 1)
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastDir = new DoubleMatrix(n, 1)
    +    val res = new DoubleMatrix(n, 1)
    +
    +    def wipe() {
    +      var i: Int = 0
    +      while (i < n) {
    +        scratch.data(i) = 0.0
    +        grad.data(i) = 0.0
    +        x.data(i) = 0.0
    +        dir.data(i) = 0.0
    +        lastDir.data(i) = 0.0
    +        res.data(i) = 0.0
    +        i = i + 1
    +      }
    +    }
    +  }
    +
    +  def createWorkspace(n: Int): Workspace = {
    +    new Workspace(n)
    +  }
    +
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   *
    +   * We solve the problem
    +   *   min_x      1/2 x^T ata x^T - x^T atb
    +   *   subject to x >= 0 (if nonnegative == true)
    +   *
    +   * The method used is similar to one described by Polyak (B. T. Polyak, The conjugate gradient
    +   * method in extremal problems, Zh. Vychisl. Mat. Mat. Fiz. 9(4)(1969), pp. 94-112) for bound-
    +   * constrained nonlinear programming.  Polyak unconditionally uses a conjugate gradient
    +   * direction, however, while this method only uses a conjugate gradient direction if the last
    +   * iteration did not cause a previously-inactive constraint to become active.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean,
    +      ws: Workspace): Array[Double] = {
    +    ws.wipe()
    +
    +    val n = atb.rows
    +    val scratch = ws.scratch
    +
    +    // find the optimal unconstrained step
    +    def steplen(dir: DoubleMatrix, res: DoubleMatrix): Double = {
    +      val top = SimpleBlas.dot(dir, res)
    +      SimpleBlas.gemv(1.0, ata, dir, 0.0, scratch)
    +      // Push the denominator upward very slightly to avoid infinities and silliness
    +      top / (SimpleBlas.dot(scratch, dir) + 1e-20)
    +    }
    +
    +    // stopping condition
    +    def stop(step: Double, ndir: Double, nx: Double): Boolean = {
    +        ((step != step) // NaN
    +      || (step < 1e-6) // too small or negative
    +      || (step > 1e40) // too small; almost certainly numerical problems
    +      || (ndir < 1e-12 * nx) // gradient relatively too small
    +      || (ndir < 1e-32) // gradient absolutely too small; numerical issues may lurk
    +      )
    +    }
    +
    +    val grad = ws.grad
    +    val x = ws.x
    +    val dir = ws.dir
    +    val lastDir = ws.lastDir
    +    val res = ws.res
    +    val iterMax = Math.max(400, 20 * n)
    +    var lastNorm = 0.0
    +    var iterno = 0
    +    var lastWall = 0 // Last iteration when we hit a bound constraint.
    +    var i = 0
    +    while (iterno < iterMax) {
    +      // find the residual
    +      SimpleBlas.gemv(1.0, ata, x, 0.0, res)
    +      SimpleBlas.axpy(-1.0, atb, res)
    +      SimpleBlas.copy(res, grad)
    +
    +      // project the gradient
    +      if (nonnegative) {
    +        i = 0
    +        while (i < n) {
    +          if (grad.data(i) > 0.0 && x.data(i) == 0.0) {
    +            grad.data(i) = 0.0
    +          }
    +          i = i + 1
    +        }
    +      }
    +      val ngrad = SimpleBlas.dot(grad, grad)
    +
    +      SimpleBlas.copy(grad, dir)
    +
    +      // use a CG direction under certain conditions
    +      var step = steplen(grad, res)
    +      var ndir = 0.0
    +      val nx = SimpleBlas.dot(x, x)
    +      if (iterno > lastWall + 1) {
    +        val alpha = ngrad / lastNorm
    +        SimpleBlas.axpy(alpha, lastDir, dir)
    +        val dstep = steplen(dir, res)
    +        ndir = SimpleBlas.dot(dir, dir)
    +        if (stop(dstep, ndir, nx)) {
    +          // reject the CG step if it could lead to premature termination
    +          SimpleBlas.copy(grad, dir)
    +          ndir = SimpleBlas.dot(dir, dir)
    +        } else {
    +          step = dstep
    +        }
    +      } else {
    +        ndir = SimpleBlas.dot(dir, dir)
    +      }
    +
    +      // terminate?
    +      if (stop(step, ndir, nx)) {
    +        return x.data.clone
    +      }
    +
    +      // don't run through the walls
    +      if (nonnegative) {
    +        i = 0
    +        while (i < n) {
    +          if (step * dir.data(i) > x.data(i)) {
    +            step = Math.min(step, x.data(i) / dir.data(i))
    +          }
    +          i = i + 1
    +        }
    +      }
    +
    +      // take the step
    +      i = 0
    +      while (i < n) {
    +        if (nonnegative) {
    +          if (step * dir.data(i) > x.data(i) * (1 - 1e-14)) {
    +            x.data(i) = 0
    +            lastWall = iterno
    +          } else x.data(i) -= step * dir.data(i)
    --- End diff --
    
    ~~~
    } else {
      x.data(i) -= step * dir.data(i)
    }
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-44870919
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11801960
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    --- End diff --
    
    Could you add more comments about the exact problem you are solving? It can help understand the code.
    
    ~~~
    minimize 1/2 * x^T (AtA) x - (Atb)^T x
    subject to x >= 0.
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r12005233
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    +  class Workspace(val n: Int) {
    +    val scratch = new DoubleMatrix(n, 1)
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastDir = new DoubleMatrix(n, 1)
    +    val res = new DoubleMatrix(n, 1)
    +
    +    def wipe() {
    +      var i: Int = 0
    +      while (i < n) {
    +        scratch.data(i) = 0.0
    +        grad.data(i) = 0.0
    +        x.data(i) = 0.0
    +        dir.data(i) = 0.0
    +        lastDir.data(i) = 0.0
    +        res.data(i) = 0.0
    +        i = i + 1
    +      }
    +    }
    +  }
    +
    +  def createWorkspace(n: Int): Workspace = {
    +    new Workspace(n)
    +  }
    +
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   *
    +   * We solve the problem
    +   *   min_x      1/2 x^T ata x^T - x^T atb
    --- End diff --
    
    @mengxr I don't think so.  Regularisation is equivalent to tossing a new block of constraints that just say "lambda x = 0" in this context.  So it's already rolled into ata and atb before it makes it to the NNLS solver.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40949714
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r12005168
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    --- End diff --
    
    I removed Line 24 here.  Seems like your coding style has two newlines after the package statement and two newlines between external imports and Spark imports.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41422324
  
    Manually aborted, it seems?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11984657
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    --- End diff --
    
    Since this is already package private, we don't need the annotation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985496
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    +    val b = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +
    +    NativeBlas.dgemm('T', 'N', n, n, n, 1.0, A.data, 0, n, A.data, 0, n, 0.0, ata.data, 0, n)
    +    NativeBlas.dgemv('T', n, n, 1.0, A.data, 0, n, b.data, 0, 1, 0.0, atb.data, 0, 1)
    +
    +    (ata, atb)
    +  }
    +
    +  test("NNLSbyPCG: exact solution cases") {
    +    val n = 20
    +    val rand = new Random(12346)
    +    val ws = NNLSbyPCG.createWorkspace(n)
    +    var numSolved = 0
    +
    +    // About 15% of random 20x20 [-1,1]-matrices have a singular value less than 1e-3.  NNLSbyPCG
    +    // can legitimately fail to solve these anywhere close to exactly.  So we grab a considerable
    +    // sample of these matrices and make sure that we solved a substantial fraction of them.
    +
    +    for (kase <- 0 until 100) {
    +      val (ata, atb) = genOnesData(n, rand)
    +      val x = NNLSbyPCG.solve(ata, atb, true, ws)
    +      assert(x.length == n)
    +      var error = 0.0
    +      var solved = true
    +      for (i <- 0 until n) {
    --- End diff --
    
    `DoubleMatrix#distance2`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40957081
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11802128
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +import org.scalatest.matchers.ShouldMatchers
    +
    +import org.apache.spark.mllib.util.LocalSparkContext
    +
    +import org.jblas.DoubleMatrix
    +import org.jblas.SimpleBlas
    +
    +class NNLSSuite extends FunSuite with LocalSparkContext with ShouldMatchers {
    +  test("NNLSbyPCG: exact solution case") {
    +    val A = new DoubleMatrix(20, 20)
    +    val b = new DoubleMatrix(20, 1)
    +    val rand = new Random(12345)
    +    for (i <- 0 until 20; j <- 0 until 20) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(20, 20)
    +    val atb = new DoubleMatrix(20, 1)
    +    for (i <- 0 until 20; j <- 0 until 20; k <- 0 until 20) {
    +      ata.put(i, j, ata.get(i, j) + A.get(k, i) * A.get(k, j))
    +    }
    +    for (i <- 0 until 20; j <- 0 until 20) {
    +      atb.put(i, atb.get(i, 0) + A.get(j, i) * b.get(j))
    +    }
    +
    +    val x = NNLSbyPCG.solve(ata, atb, true)
    +    assert(x.length == 20)
    +    var error = 0.0
    +    for (i <- 0 until 20) {
    +      error = error + (x(i) - 1) * (x(i) - 1)
    --- End diff --
    
    Should create a test problem with some constraints active.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11801898
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    --- End diff --
    
    Mark it `private[mllib]`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41443324
  
    Build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985153
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -490,6 +504,8 @@ class ALS private (
           }
         }
     
    +    val ws = NNLSbyPCG.createWorkspace(rank)
    --- End diff --
    
    This is not necessary if `nonnegative` is false.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41439353
  
    "Build was aborted."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41430514
  
     Build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-44865790
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985572
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    +    val b = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +
    +    NativeBlas.dgemm('T', 'N', n, n, n, 1.0, A.data, 0, n, A.data, 0, n, 0.0, ata.data, 0, n)
    +    NativeBlas.dgemv('T', n, n, 1.0, A.data, 0, n, b.data, 0, 1, 0.0, atb.data, 0, 1)
    +
    +    (ata, atb)
    +  }
    +
    +  test("NNLSbyPCG: exact solution cases") {
    +    val n = 20
    +    val rand = new Random(12346)
    +    val ws = NNLSbyPCG.createWorkspace(n)
    +    var numSolved = 0
    +
    +    // About 15% of random 20x20 [-1,1]-matrices have a singular value less than 1e-3.  NNLSbyPCG
    +    // can legitimately fail to solve these anywhere close to exactly.  So we grab a considerable
    +    // sample of these matrices and make sure that we solved a substantial fraction of them.
    +
    +    for (kase <- 0 until 100) {
    +      val (ata, atb) = genOnesData(n, rand)
    +      val x = NNLSbyPCG.solve(ata, atb, true, ws)
    +      assert(x.length == n)
    +      var error = 0.0
    +      var solved = true
    +      for (i <- 0 until n) {
    +        error = error + (x(i) - 1) * (x(i) - 1)
    +        if (Math.abs(x(i) - 1) > 1e-3) solved = false
    +      }
    +      if (error > 1e-2) solved = false
    +      if (solved) numSolved = numSolved + 1
    +    }
    +    println(numSolved)
    +
    +    assert(numSolved > 50)
    +  }
    +
    +  test("NNLSbyPCG: nonnegativity constraint active") {
    +    val n = 5
    +    val M = Array(
    +      Array( 4.377, -3.531, -1.306, -0.139,  3.418, -1.632),
    +      Array(-3.531,  4.344,  0.934,  0.305, -2.140,  2.115),
    +      Array(-1.306,  0.934,  2.644, -0.203, -0.170,  1.094),
    +      Array(-0.139,  0.305, -0.203,  5.883,  1.428, -1.025),
    +      Array( 3.418, -2.140, -0.170,  1.428,  4.684, -0.636))
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) ata.put(i, j, M(i)(j))
    +    for (i <- 0 until n) atb.put(i, M(i)(n))
    --- End diff --
    
    `DoubleMatrix#getColumn`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40912817
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14274/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985423
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    --- End diff --
    
    ~~~
    val A = new DoubleMatrix(n, n, Array.fill(n * n)(rand.nextDouble()): _*)
    val b = A.mult(DoubleMatrix.zeros(n, 1))
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985552
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    +    val b = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +
    +    NativeBlas.dgemm('T', 'N', n, n, n, 1.0, A.data, 0, n, A.data, 0, n, 0.0, ata.data, 0, n)
    +    NativeBlas.dgemv('T', n, n, 1.0, A.data, 0, n, b.data, 0, 1, 0.0, atb.data, 0, 1)
    +
    +    (ata, atb)
    +  }
    +
    +  test("NNLSbyPCG: exact solution cases") {
    +    val n = 20
    +    val rand = new Random(12346)
    +    val ws = NNLSbyPCG.createWorkspace(n)
    +    var numSolved = 0
    +
    +    // About 15% of random 20x20 [-1,1]-matrices have a singular value less than 1e-3.  NNLSbyPCG
    +    // can legitimately fail to solve these anywhere close to exactly.  So we grab a considerable
    +    // sample of these matrices and make sure that we solved a substantial fraction of them.
    +
    +    for (kase <- 0 until 100) {
    +      val (ata, atb) = genOnesData(n, rand)
    +      val x = NNLSbyPCG.solve(ata, atb, true, ws)
    +      assert(x.length == n)
    +      var error = 0.0
    +      var solved = true
    +      for (i <- 0 until n) {
    +        error = error + (x(i) - 1) * (x(i) - 1)
    +        if (Math.abs(x(i) - 1) > 1e-3) solved = false
    +      }
    +      if (error > 1e-2) solved = false
    +      if (solved) numSolved = numSolved + 1
    +    }
    +    println(numSolved)
    +
    +    assert(numSolved > 50)
    +  }
    +
    +  test("NNLSbyPCG: nonnegativity constraint active") {
    +    val n = 5
    +    val M = Array(
    +      Array( 4.377, -3.531, -1.306, -0.139,  3.418, -1.632),
    +      Array(-3.531,  4.344,  0.934,  0.305, -2.140,  2.115),
    +      Array(-1.306,  0.934,  2.644, -0.203, -0.170,  1.094),
    +      Array(-0.139,  0.305, -0.203,  5.883,  1.428, -1.025),
    +      Array( 3.418, -2.140, -0.170,  1.428,  4.684, -0.636))
    +    val ata = new DoubleMatrix(n, n)
    --- End diff --
    
    `DoubleMatrix` has a constructor for `double[][]`. Maybe you can use it directly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40970409
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11801894
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    --- End diff --
    
    Add an empty line to separate spark imports from 3rd-party ones.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40911791
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11802036
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +object NNLSbyPCG {
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean): Array[Double] = {
    +    val n = atb.rows
    +    val scratch = new DoubleMatrix(n, 1)
    +
    +    // find the optimal unconstrained step
    +    def steplen(dir: DoubleMatrix, resid: DoubleMatrix): Double = {
    +      val top = SimpleBlas.dot(dir, resid)
    +      SimpleBlas.gemv(1.0, ata, dir, 0.0, scratch)
    +      // Push the denominator upward very slightly to avoid infinities and silliness
    +      top / (SimpleBlas.dot(scratch, dir) + 1e-20)
    +    }
    +
    +    // stopping condition
    +    def stop(step: Double, ndir: Double, nx: Double): Boolean = {
    +        ((step != step) // NaN
    +      || (step < 1e-6) // too small or negative
    +      || (step > 1e40) // too small; almost certainly numerical problems
    +      || (ndir < 1e-12 * nx) // gradient relatively too small
    +      || (ndir < 1e-32) // gradient absolutely too small; numerical issues may lurk
    +      )
    +    }
    +
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastdir = new DoubleMatrix(n, 1)
    +    val resid = new DoubleMatrix(n, 1)
    +    var lastnorm = 0.0
    +    var iterno = 0
    +    var lastwall = 0
    --- End diff --
    
    rename to `lastWall` and also add comment about this variable


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11984566
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    --- End diff --
    
    Remove extra empty lines.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40960990
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/460


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41647153
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40957072
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41443309
  
     Build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11984683
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    --- End diff --
    
    Do you think `NNLS` a better name? Then we can have `solveByPCG` as the method name (and possibly add other methods in the future). We can keep the default `solve` and link it to `solveByPCG`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985041
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas}
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Object used to solve nonnegative least squares problems using a modified
    + * projected gradient method.
    + */
    +@DeveloperApi
    +private[mllib] object NNLSbyPCG {
    +  class Workspace(val n: Int) {
    +    val scratch = new DoubleMatrix(n, 1)
    +    val grad = new DoubleMatrix(n, 1)
    +    val x = new DoubleMatrix(n, 1)
    +    val dir = new DoubleMatrix(n, 1)
    +    val lastDir = new DoubleMatrix(n, 1)
    +    val res = new DoubleMatrix(n, 1)
    +
    +    def wipe() {
    +      var i: Int = 0
    +      while (i < n) {
    +        scratch.data(i) = 0.0
    +        grad.data(i) = 0.0
    +        x.data(i) = 0.0
    +        dir.data(i) = 0.0
    +        lastDir.data(i) = 0.0
    +        res.data(i) = 0.0
    +        i = i + 1
    +      }
    +    }
    +  }
    +
    +  def createWorkspace(n: Int): Workspace = {
    +    new Workspace(n)
    +  }
    +
    +  /**
    +   * Solve a least squares problem, possibly with nonnegativity constraints, by a modified
    +   * projected gradient method.  That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
    +   *
    +   * We solve the problem
    +   *   min_x      1/2 x^T ata x^T - x^T atb
    +   *   subject to x >= 0 (if nonnegative == true)
    +   *
    +   * The method used is similar to one described by Polyak (B. T. Polyak, The conjugate gradient
    +   * method in extremal problems, Zh. Vychisl. Mat. Mat. Fiz. 9(4)(1969), pp. 94-112) for bound-
    +   * constrained nonlinear programming.  Polyak unconditionally uses a conjugate gradient
    +   * direction, however, while this method only uses a conjugate gradient direction if the last
    +   * iteration did not cause a previously-inactive constraint to become active.
    +   */
    +  def solve(ata: DoubleMatrix, atb: DoubleMatrix, nonnegative: Boolean,
    +      ws: Workspace): Array[Double] = {
    +    ws.wipe()
    +
    +    val n = atb.rows
    +    val scratch = ws.scratch
    +
    +    // find the optimal unconstrained step
    +    def steplen(dir: DoubleMatrix, res: DoubleMatrix): Double = {
    +      val top = SimpleBlas.dot(dir, res)
    +      SimpleBlas.gemv(1.0, ata, dir, 0.0, scratch)
    +      // Push the denominator upward very slightly to avoid infinities and silliness
    +      top / (SimpleBlas.dot(scratch, dir) + 1e-20)
    +    }
    +
    +    // stopping condition
    +    def stop(step: Double, ndir: Double, nx: Double): Boolean = {
    +        ((step != step) // NaN
    +      || (step < 1e-6) // too small or negative
    +      || (step > 1e40) // too small; almost certainly numerical problems
    +      || (ndir < 1e-12 * nx) // gradient relatively too small
    +      || (ndir < 1e-32) // gradient absolutely too small; numerical issues may lurk
    +      )
    +    }
    +
    +    val grad = ws.grad
    +    val x = ws.x
    +    val dir = ws.dir
    +    val lastDir = ws.lastDir
    +    val res = ws.res
    +    val iterMax = Math.max(400, 20 * n)
    +    var lastNorm = 0.0
    +    var iterno = 0
    +    var lastWall = 0 // Last iteration when we hit a bound constraint.
    +    var i = 0
    +    while (iterno < iterMax) {
    +      // find the residual
    +      SimpleBlas.gemv(1.0, ata, x, 0.0, res)
    +      SimpleBlas.axpy(-1.0, atb, res)
    +      SimpleBlas.copy(res, grad)
    +
    +      // project the gradient
    +      if (nonnegative) {
    +        i = 0
    +        while (i < n) {
    +          if (grad.data(i) > 0.0 && x.data(i) == 0.0) {
    +            grad.data(i) = 0.0
    +          }
    +          i = i + 1
    +        }
    +      }
    +      val ngrad = SimpleBlas.dot(grad, grad)
    +
    +      SimpleBlas.copy(grad, dir)
    +
    +      // use a CG direction under certain conditions
    +      var step = steplen(grad, res)
    +      var ndir = 0.0
    +      val nx = SimpleBlas.dot(x, x)
    +      if (iterno > lastWall + 1) {
    +        val alpha = ngrad / lastNorm
    +        SimpleBlas.axpy(alpha, lastDir, dir)
    +        val dstep = steplen(dir, res)
    +        ndir = SimpleBlas.dot(dir, dir)
    +        if (stop(dstep, ndir, nx)) {
    +          // reject the CG step if it could lead to premature termination
    +          SimpleBlas.copy(grad, dir)
    +          ndir = SimpleBlas.dot(dir, dir)
    +        } else {
    +          step = dstep
    +        }
    +      } else {
    +        ndir = SimpleBlas.dot(dir, dir)
    +      }
    +
    +      // terminate?
    +      if (stop(step, ndir, nx)) {
    +        return x.data.clone
    +      }
    +
    +      // don't run through the walls
    +      if (nonnegative) {
    +        i = 0
    +        while (i < n) {
    +          if (step * dir.data(i) > x.data(i)) {
    +            step = Math.min(step, x.data(i) / dir.data(i))
    --- End diff --
    
    `step = x.data(i) / dir.data(i)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985589
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    +    val b = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +
    +    NativeBlas.dgemm('T', 'N', n, n, n, 1.0, A.data, 0, n, A.data, 0, n, 0.0, ata.data, 0, n)
    +    NativeBlas.dgemv('T', n, n, 1.0, A.data, 0, n, b.data, 0, 1, 0.0, atb.data, 0, 1)
    +
    +    (ata, atb)
    +  }
    +
    +  test("NNLSbyPCG: exact solution cases") {
    +    val n = 20
    +    val rand = new Random(12346)
    +    val ws = NNLSbyPCG.createWorkspace(n)
    +    var numSolved = 0
    +
    +    // About 15% of random 20x20 [-1,1]-matrices have a singular value less than 1e-3.  NNLSbyPCG
    +    // can legitimately fail to solve these anywhere close to exactly.  So we grab a considerable
    +    // sample of these matrices and make sure that we solved a substantial fraction of them.
    +
    +    for (kase <- 0 until 100) {
    +      val (ata, atb) = genOnesData(n, rand)
    +      val x = NNLSbyPCG.solve(ata, atb, true, ws)
    +      assert(x.length == n)
    +      var error = 0.0
    +      var solved = true
    +      for (i <- 0 until n) {
    +        error = error + (x(i) - 1) * (x(i) - 1)
    +        if (Math.abs(x(i) - 1) > 1e-3) solved = false
    +      }
    +      if (error > 1e-2) solved = false
    +      if (solved) numSolved = numSolved + 1
    +    }
    +    println(numSolved)
    +
    +    assert(numSolved > 50)
    +  }
    +
    +  test("NNLSbyPCG: nonnegativity constraint active") {
    +    val n = 5
    +    val M = Array(
    +      Array( 4.377, -3.531, -1.306, -0.139,  3.418, -1.632),
    +      Array(-3.531,  4.344,  0.934,  0.305, -2.140,  2.115),
    +      Array(-1.306,  0.934,  2.644, -0.203, -0.170,  1.094),
    +      Array(-0.139,  0.305, -0.203,  5.883,  1.428, -1.025),
    +      Array( 3.418, -2.140, -0.170,  1.428,  4.684, -0.636))
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) ata.put(i, j, M(i)(j))
    +    for (i <- 0 until n) atb.put(i, M(i)(n))
    +
    +    val goodx = Array(0.13025, 0.54506, 0.2874, 0.0, 0.028628)
    +
    +    val ws = NNLSbyPCG.createWorkspace(n)
    +    val x = NNLSbyPCG.solve(ata, atb, true, ws)
    +    for (i <- 0 until n) {
    +      assert(Math.abs(x(i) - goodx(i)) < 1e-3)
    --- End diff --
    
    Also need to assert `x(i) >= 0`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41645936
  
    @tmyklebu Could you try to merge the latest master?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41448675
  
    Build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Alternating nonnegative least-squares

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-40911794
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/460#discussion_r11985522
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/NNLSSuite.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.util.Random
    +
    +import org.scalatest.FunSuite
    +
    +import org.jblas.{DoubleMatrix, SimpleBlas, NativeBlas}
    +
    +class NNLSSuite extends FunSuite {
    +  /** Generate a NNLS problem whose optimal solution is the all-ones vector. */
    +  def genOnesData(n: Int, rand: Random): (DoubleMatrix, DoubleMatrix) = {
    +    val A = new DoubleMatrix(n, n)
    +    val b = new DoubleMatrix(n, 1)
    +    for (i <- 0 until n; j <- 0 until n) {
    +      val aij = rand.nextDouble()
    +      A.put(i, j, aij)
    +      b.put(i, b.get(i, 0) + aij)
    +    }
    +
    +    val ata = new DoubleMatrix(n, n)
    +    val atb = new DoubleMatrix(n, 1)
    +
    +    NativeBlas.dgemm('T', 'N', n, n, n, 1.0, A.data, 0, n, A.data, 0, n, 0.0, ata.data, 0, n)
    +    NativeBlas.dgemv('T', n, n, 1.0, A.data, 0, n, b.data, 0, 1, 0.0, atb.data, 0, 1)
    +
    +    (ata, atb)
    +  }
    +
    +  test("NNLSbyPCG: exact solution cases") {
    +    val n = 20
    +    val rand = new Random(12346)
    +    val ws = NNLSbyPCG.createWorkspace(n)
    +    var numSolved = 0
    +
    +    // About 15% of random 20x20 [-1,1]-matrices have a singular value less than 1e-3.  NNLSbyPCG
    +    // can legitimately fail to solve these anywhere close to exactly.  So we grab a considerable
    +    // sample of these matrices and make sure that we solved a substantial fraction of them.
    +
    +    for (kase <- 0 until 100) {
    +      val (ata, atb) = genOnesData(n, rand)
    +      val x = NNLSbyPCG.solve(ata, atb, true, ws)
    +      assert(x.length == n)
    --- End diff --
    
    Use `===` instead of `==`. The former outputs more information.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/460#issuecomment-41649511
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14560/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---