Posted to reviews@spark.apache.org by brkyvz <gi...@git.apache.org> on 2014/09/18 23:29:01 UTC

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

GitHub user brkyvz opened a pull request:

    https://github.com/apache/spark/pull/2451

    [WIP][SPARK-1486][MLlib] Multi Model Training with Gradient Descent

    **Note: This is still a work in progress**
    
    This is the first of the pull requests to support multi-model training in MLlib. It batches examples and trains multiple models, each with a different regularization parameter and step size, all at once using matrix-matrix multiplication. It uses native BLAS when the data matrix is dense, and uses sparse matrices wherever possible for better memory utilization and performance (I will post performance results in the comments).
    
    This is a HUGE pull request, so I'm posting it now even though it isn't finished: the docs still need to be updated and the code can be cleaned up further for ease of understanding. Posting it early lets people comment and make suggestions along the way.
    
    Most of the PR consists of adding additional Local Matrix operations for the calculation of gradients and losses.
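
    To illustrate the core idea, here is a minimal, hypothetical sketch (written against Breeze directly; the names `multiModelStep`, `X`, `W`, and `stepSizes` are made up for this example and are not the PR's API). Stacking the weight vectors of several (step size, regularization) combinations as columns of a single matrix turns the per-model gradient computations into two matrix-matrix multiplications per batch:

        import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}
        import breeze.numerics.sigmoid

        // Illustrative only: one SGD step for several logistic-regression models at once.
        // Each column of W holds the weights of one (stepSize, regParam) combination.
        def multiModelStep(
            X: BDM[Double],            // batchSize x numFeatures
            y: BDV[Double],            // batchSize labels in {0, 1}
            W: BDM[Double],            // numFeatures x numModels
            stepSizes: Array[Double],  // one step size per model column
            iter: Int): BDM[Double] = {
          val margins = X * W          // batchSize x numModels: one GEMM for all models
          val probs = sigmoid(margins) // element-wise logistic function
          // errors(i, k) = P(y = 1 | x_i, model k) - y_i
          val errors = probs.copy
          for (k <- 0 until W.cols; i <- 0 until X.rows) errors(i, k) -= y(i)
          val grads = X.t * errors     // numFeatures x numModels: second GEMM
          // SimpleUpdater-style step: stepSize / sqrt(iter), averaged over the batch.
          val newW = W.copy
          for (k <- 0 until W.cols) {
            newW(::, k) :-= grads(::, k) * (stepSizes(k) / math.sqrt(iter) / X.rows)
          }
          newW
        }

    Regularization would then be applied per column by the corresponding updater, which is what the array of `MultiModelUpdater`s in the diff below is for.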

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brkyvz/spark SPARK-1486

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2451.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2451
    
----
commit 5138d3f220efb8ec4cc7fc944112497ab0c8c50c
Author: Burak <br...@gmail.com>
Date:   2014-09-05T20:34:24Z

    [SPARK-3418][MLlib] Sparse Matrix support and additional native BLAS operations added

commit 4362ff1d6b79604b9919da1d550ed466021b8f5c
Author: Burak <br...@gmail.com>
Date:   2014-09-05T20:51:45Z

    [SPARK-3418][MLlib] Matrix unit tests expanded with indexing and updating

commit 8dcb7632093fc2d87a626b85e08ddabcbf119543
Author: Burak <br...@gmail.com>
Date:   2014-09-05T21:58:55Z

    [SPARK-3418] Fixed Scala-style errors

commit 41b2da30797e3da011e3261f8c2b89b9c1608d23
Author: Burak <br...@gmail.com>
Date:   2014-09-06T01:30:29Z

    [SPARK-3418] Fixed failing Matrix unit test

commit 56d7c85a1d58ef68c31208e062a4458e170111d3
Author: Burak <br...@gmail.com>
Date:   2014-09-06T02:55:11Z

    [SPARK-3418] Fixed style issues and added documentation for methods

commit 848406c1b6779eea9bbcf1dd582b541db46d7dad
Author: Burak <br...@gmail.com>
Date:   2014-09-06T04:43:00Z

    [SPARK-3418] Fixed one more style issue

commit eeb13ebda3223eb2f5fc36e08303e5b33d76de96
Author: Burak <br...@gmail.com>
Date:   2014-09-09T00:55:40Z

    [SPARK-3418] Code review comments addressed and multiplication further optimized

commit a85ccb712d83d20b178b40c32bd473d7d018a88f
Author: Burak <br...@gmail.com>
Date:   2014-09-09T07:22:09Z

    [SPARK-3418] New code review comments addressed

commit d510c8f940faee3bdb2b00306f150bc99630396b
Author: Burak <br...@gmail.com>
Date:   2014-09-14T04:07:59Z

    [SPARK-3418] Squashed missing alpha bug.

commit 418def8e940b93f0d24e9b3158ecc0e130d16a83
Author: Burak <br...@gmail.com>
Date:   2014-09-17T23:53:03Z

    sealed traits Vector and Matrix

commit f79db9c0d82ceea41d59972e3bde9fa2a17b6112
Author: Burak <br...@gmail.com>
Date:   2014-09-18T06:45:08Z

    9/17 comments addressed

commit d16268496cefe06d8818b20721eede56a9de41a2
Author: Burak <br...@gmail.com>
Date:   2014-09-18T18:55:28Z

    [SPARK-3418] Fixed MiMa compatibility issues (excluded from check)

commit 272feb9f63517c52e1991988b4fbd8869a992dc4
Author: Burak <br...@gmail.com>
Date:   2014-09-18T20:29:26Z

    really fixed MiMa issue

commit 5e7d74408fd5f4e521f4e3a7e94a289d59454913
Author: Burak <br...@gmail.com>
Date:   2014-09-18T21:16:05Z

    [WIP][SPARK-1486][MLlib] Initial commit for multi-model training

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17810600
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    +    this.stepSize = step
    +    this
    +  }
    +
    +  /**
    +   * :: Experimental ::
    +   * Set fraction of data to be used for each SGD iteration.
    +   * Default 1.0 (corresponding to deterministic/classical gradient descent)
    +   */
    +  @Experimental
    +  def setMiniBatchFraction(fraction: Double): this.type = {
    +    this.miniBatchFraction = fraction
    +    this
    +  }
    +
    +  /**
    +   * Set the number of iterations for SGD. Default 100.
    +   */
    +  def setNumIterations(iters: Array[Int]): this.type = {
    +    this.numIterations = iters
    +    this
    +  }
    +
    +  /**
    +   * Set the regularization parameter. Default (0.0, 0.1, 1.0).
    +   */
    +  def setRegParam(regParam: Array[Double]): this.type = {
    +    this.regParam = regParam
    +    this
    +  }
    +
    +  /**
    +   * Set the gradient function (of the loss function of one single data example)
    +   * to be used for SGD.
    +   */
    +  def setGradient(gradient: MultiModelGradient): this.type = {
    +    this.gradient = gradient
    +    this
    +  }
    +
    +
    +  /**
    +   * Set the updater function to actually perform a gradient step in a given direction.
    +   * The updater is responsible for performing the update from the regularization term as well,
    +   * and therefore determines what kind of regularization is used, if any.
    +   */
    +  def setUpdater(updater: Array[MultiModelUpdater]): this.type = {
    +    this.updater = updater
    +    this
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Runs gradient descent on the given training data.
    +   * @param data training data
    +   * @param initialWeights initial weights
    +   * @return solution vector
    +   */
    +  @DeveloperApi
    +  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = {
    +    val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      data,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFraction,
    +      initialWeights)
    +    weights
    +  }
    +
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Top-level method to run gradient descent.
    + */
    +@DeveloperApi
    +object MultiModelGradientDescent extends Logging {
    +  /**
    +   * Run stochastic gradient descent (SGD) in parallel using mini batches.
    +   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
    +   * in order to compute a gradient estimate.
    +   * Sampling and averaging the subgradients over this subset is performed using one standard
    +   * Spark map-reduce in each iteration.
    +   *
    +   * @param data - Input data for SGD. RDD of the set of data examples, each of
    +   *               the form (label, [feature values]).
    +   * @param gradient - Gradient object (used to compute the gradient of the loss function of
    +   *                   one single data example)
    +   * @param updater - Updater function to actually perform a gradient step in a given direction.
    +   * @param stepSize - initial step size for the first step
    +   * @param numIterations - number of iterations that SGD should be run.
    +   * @param regParam - regularization parameter
    +   * @param miniBatchFraction - fraction of the input data set that should be used for
    +   *                            one iteration of SGD. Default value 1.0.
    +   *
    +   * @return A tuple containing two elements. The first element is a column matrix containing
    +   *         weights for every feature, and the second element is an array containing the
    +   *         stochastic loss computed for every iteration.
    +   */
    +  def runMiniBatchMMSGD(
    +      data: RDD[(Double, Vector)],
    +      gradient: MultiModelGradient,
    +      updater: Array[MultiModelUpdater],
    +      stepSize: Array[Double],
    +      numIterations: Array[Int],
    +      regParam: Array[Double],
    +      miniBatchFraction: Double,
    +      initialWeights: Vector,
    +      batchSize: Int = 64,
    +      useSparse: Boolean = true,
    +      buildSparseThreshold: Double = 0.2): (Matrix, Array[Vector]) = {
    +
    +    val maxNumIter = numIterations.max
    +    val stochasticLossHistory = new ArrayBuffer[Vector](maxNumIter)
    +
    +    val numExamples = data.count()
    +    val miniBatchSize = numExamples * miniBatchFraction
    +    val numModels = stepSize.length * regParam.length
    +    val numFeatures = initialWeights.size
    +    val numRegularizers = updater.length
    +    val updaterCounter = 0 until numRegularizers
    +    // Initialize weights as a column vector
    +    var weights = updaterCounter.map { i =>
    +      new DenseMatrix(numFeatures, 1, initialWeights.toArray).
    +        multiply(DenseMatrix.ones(1, numModels))
    +    }
    +
    +    var finalWeights: Matrix = new DenseMatrix(numFeatures, 0, Array.empty[Double])
    +
    +    // if no data, return initial weights to avoid NaNs
    +    if (numExamples == 0) {
    +
    --- End diff --
    
    spacing




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17807436
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
    @@ -241,4 +241,4 @@ class SparseVector(
       }
     
       private[mllib] override def toBreeze: BV[Double] = new BSV[Double](indices, values, size)
    -}
    +}
    --- End diff --
    
    newline?




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17813479
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescentSuite.scala ---
    @@ -0,0 +1,444 @@
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.JavaConversions._
    +import scala.util.Random
    +
    +import org.scalatest.{FunSuite, Matchers}
    +
    +import org.apache.spark.mllib.linalg.{DenseMatrix, Matrices, Vectors}
    +import org.apache.spark.mllib.regression._
    +import org.apache.spark.mllib.util.{LinearDataGenerator, LocalClusterSparkContext, LocalSparkContext}
    +import org.apache.spark.mllib.util.TestingUtils._
    +
    +object MultiModelGradientDescentSuite {
    +
    +  def generateLogisticInputAsList(
    +                                   offset: Double,
    +                                   scale: Double,
    +                                   nPoints: Int,
    +                                   seed: Int): java.util.List[LabeledPoint] = {
    +    seqAsJavaList(generateGDInput(offset, scale, nPoints, seed))
    +  }
    +
    +  // Generate input of the form Y = logistic(offset + scale * X)
    +  def generateGDInput(
    +                       offset: Double,
    +                       scale: Double,
    +                       nPoints: Int,
    +                       seed: Int): Seq[LabeledPoint]  = {
    +    val rnd = new Random(seed)
    +    val x1 = Array.fill[Double](nPoints)(rnd.nextGaussian())
    +
    +    val unifRand = new Random(45)
    +    val rLogis = (0 until nPoints).map { i =>
    +      val u = unifRand.nextDouble()
    +      math.log(u) - math.log(1.0-u)
    +    }
    +
    +    val y: Seq[Int] = (0 until nPoints).map { i =>
    +      val yVal = offset + scale * x1(i) + rLogis(i)
    +      if (yVal > 0) 1 else 0
    +    }
    +
    +    (0 until nPoints).map(i => LabeledPoint(y(i), Vectors.dense(x1(i))))
    +  }
    +
    +  def generateSVMInputAsList(
    +                              intercept: Double,
    +                              weights: Array[Double],
    +                              nPoints: Int,
    +                              seed: Int): java.util.List[LabeledPoint] = {
    +    seqAsJavaList(generateSVMInput(intercept, weights, nPoints, seed))
    +  }
    +
    +  // Generate noisy input of the form Y = signum(x.dot(weights) + intercept + noise)
    +  def generateSVMInput(
    +                        intercept: Double,
    +                        weights: Array[Double],
    +                        nPoints: Int,
    +                        seed: Int): Seq[LabeledPoint] = {
    +    val rnd = new Random(seed)
    +    val weightsMat = new DenseMatrix(weights.length, 1, weights)
    +    val x = Array.fill[Array[Double]](nPoints)(
    +      Array.fill[Double](weights.length)(rnd.nextDouble() * 2.0 - 1.0))
    +    val y = x.map { xi =>
    +      val yD = (new DenseMatrix(1, xi.length, xi) multiply weightsMat) +
    +        intercept + 0.01 * rnd.nextGaussian()
    +      if (yD.toArray(0) < 0) 0.0 else 1.0
    +    }
    +    y.zip(x).map(p => LabeledPoint(p._1, Vectors.dense(p._2)))
    +  }
    +}
    +
    +class MultiModelGradientDescentSuite extends FunSuite with LocalSparkContext with Matchers {
    +  test("Assert the loss is decreasing.") {
    +    val nPoints = 10000
    +    val A = 2.0
    +    val B = -1.5
    +
    +    val initialB = -1.0
    +    val initialWeights = Array(initialB)
    +
    +    val gradient = new MultiModelLogisticGradient()
    +    val updater: Array[MultiModelUpdater] = Array(new MultiModelSimpleUpdater())
    +    val stepSize = Array(1.0, 0.1)
    +    val numIterations = Array(10)
    +    val regParam = Array(0.0)
    +    val miniBatchFrac = 1.0
    +
    +    // Add an extra variable consisting of all 1.0's for the intercept.
    +    val testData = GradientDescentSuite.generateGDInput(A, B, nPoints, 42)
    +    val data = testData.map { case LabeledPoint(label, features) =>
    +      label -> Vectors.dense(1.0 +: features.toArray)
    +    }
    +
    +    val dataRDD = sc.parallelize(data, 2).cache()
    +    val initialWeightsWithIntercept = Vectors.dense(1.0 +: initialWeights.toArray)
    +
    +    val (_, loss) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      dataRDD,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFrac,
    +      initialWeightsWithIntercept)
    +
    +    assert(loss.last(0) - loss.head(0) < 0, "loss isn't decreasing.")
    +
    +    val lossDiff = loss.init.zip(loss.tail).map { case (lhs, rhs) => lhs(0) - rhs(0) }
    +    assert(lossDiff.count(_ > 0).toDouble / lossDiff.size > 0.8)
    +  }
    +
    +  test("Test the loss and gradient of first iteration with regularization.") {
    +
    +    val gradient = new MultiModelLogisticGradient()
    +    val updater: Array[MultiModelUpdater] = Array(new MultiModelSquaredL2Updater())
    +
    +    // Add an extra variable consisting of all 1.0's for the intercept.
    +    val testData = GradientDescentSuite.generateGDInput(2.0, -1.5, 10000, 42)
    +    val data = testData.map { case LabeledPoint(label, features) =>
    +      label -> Vectors.dense(1.0 +: features.toArray)
    +    }
    +
    +    val dataRDD = sc.parallelize(data, 2).cache()
    +
    +    // Prepare non-zero weights
    +    val initialWeightsWithIntercept = Vectors.dense(1.0, 0.5)
    +
    +    val regParam0 = Array(0.0)
    +    val (newWeights0, loss0) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      dataRDD, gradient, updater, Array(1.0), Array(1), regParam0, 1.0, initialWeightsWithIntercept)
    +
    +    val regParam1 = Array(1.0)
    +    val (newWeights1, loss1) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      dataRDD, gradient, updater, Array(1.0), Array(1), regParam1, 1.0, initialWeightsWithIntercept)
    +
    +    assert(
    +      loss1(0)(0) ~== (loss0(0)(0) + (math.pow(initialWeightsWithIntercept(0), 2) +
    +        math.pow(initialWeightsWithIntercept(1), 2)) / 2) absTol 1E-5,
    +      """For non-zero weights, the regVal should be \frac{1}{2}\sum_i w_i^2.""")
    +
    +    assert(
    +      (newWeights1(0, 0) ~== (newWeights0(0, 0) - initialWeightsWithIntercept(0)) absTol 1E-5) &&
    +        (newWeights1(1, 0) ~== (newWeights0(1, 0) - initialWeightsWithIntercept(1)) absTol 1E-5),
    +      "The difference between newWeights with/without regularization " +
    +        "should be initialWeightsWithIntercept.")
    +  }
    +
    +  test("Check for correctness: LogisticRegression-(SimpleUpdater+SquaredL2Updater)") {
    +    val nPoints = 100
    +    val A = 2.0
    +    val B = -1.5
    +
    +    val initialB = -1.0
    +    val initialWeights = Array(initialB)
    +
    +    val gradient = new MultiModelLogisticGradient()
    +    val updater: Array[MultiModelUpdater] =
    +      Array(new MultiModelSimpleUpdater(), new MultiModelSquaredL2Updater())
    +    val stepSize = Array(1.0, 0.1)
    +    val numIterations = Array(10)
    +    val regParam = Array(0.0, 0.1, 1.0)
    +    val miniBatchFrac = 1.0
    +
    +    // Add an extra variable consisting of all 1.0's for the intercept.
    +    val testData = GradientDescentSuite.generateGDInput(A, B, nPoints, 42)
    +    val data = testData.map { case LabeledPoint(label, features) =>
    +      label -> Vectors.dense(1.0 +: features.toArray)
    +    }
    +    val numModels = stepSize.length * regParam.length
    +
    +    val dataRDD = sc.parallelize(data, 2).cache()
    +
    +    val forLoop = (0 until numModels).map { i =>
    +      val (weightsGD, loss) = GradientDescent.runMiniBatchSGD(
    +        dataRDD,
    +        new LogisticGradient(),
    +        new SimpleUpdater(),
    +        stepSize(math.round(i * 1.0 / numModels).toInt),
    +        numIterations(0),
    +        regParam(i % regParam.length),
    +        miniBatchFrac,
    +        Vectors.dense(1.0 +: initialWeights.toArray.clone()))
    +      (weightsGD, loss)
    +    }
    +    val forLoop2 = (0 until numModels).map { i =>
    +      val (weightsGD2, loss) = GradientDescent.runMiniBatchSGD(
    +        dataRDD,
    +        new LogisticGradient(),
    +        new SquaredL2Updater(),
    +        stepSize(math.round(i * 1.0 / numModels).toInt),
    +        numIterations(0),
    +        regParam(i % regParam.length),
    +        miniBatchFrac,
    +        Vectors.dense(1.0 +: initialWeights.toArray.clone()))
    +      (weightsGD2, loss)
    +    }
    +
    +    val res2 = Matrices.horzCat(forLoop.map(v => new DenseMatrix(v._1.size, 1, v._1.toArray)) ++
    +      forLoop2.map(v => new DenseMatrix(v._1.size, 1, v._1.toArray)))
    +
    +    val (weightsMMGD, mmLoss) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      dataRDD,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFrac,
    +      Vectors.dense(1.0 +: initialWeights.toArray))
    +
    +    assert(res2 ~== weightsMMGD absTol 1e-10)
    +
    +    val gdLosses1 = forLoop.map(_._2.last)
    +    val gdLosses2 = forLoop2.map(_._2.last)
    +    val lastFromGD = Vectors.dense((gdLosses1 ++ gdLosses2).toArray[Double])
    +
    +    assert(lastFromGD ~== mmLoss.last absTol 1e-10)
    +  }
    +
    +  // Test if we can correctly learn Y = 10*X1 + 10*X10000
    +  test("use sparse matrices instead of dense") {
    +    val nPoints = 100
    +
    +    val denseRDD = sc.parallelize(
    +      LinearDataGenerator.generateLinearInput(0.0, Array(10.0, 10.0), nPoints, 42), 2)
    +    val sparseRDD = denseRDD.map { case LabeledPoint(label, v) =>
    +      val sv = Vectors.sparse(10000, Seq((0, v(0)), (9999, v(1))))
    +      (label, sv)
    +    }.cache()
    +    val gradient = new MultiModelLeastSquaresGradient()
    +    val updater: Array[MultiModelUpdater] = Array(new MultiModelSquaredL2Updater())
    +    val stepSize = Array(1.0, 0.1)
    +    val numIterations = Array(10)
    +    val regParam = Array(0.0, 0.1, 1.0)
    +    val miniBatchFrac = 1.0
    +    val initialWeights = Array.fill(10000)(0.0)
    +    // Add an extra variable consisting of all 1.0's for the intercept.
    +
    +    val numModels = stepSize.length * regParam.length
    +
    +    val forLoop = (0 until numModels).map { i =>
    +      val (weightsGD, loss) = GradientDescent.runMiniBatchSGD(
    +        sparseRDD,
    +        new LeastSquaresGradient(),
    +        new SquaredL2Updater(),
    +        stepSize(math.round(i * 1.0 / numModels).toInt),
    +        numIterations(0),
    +        regParam(i % regParam.length),
    +        miniBatchFrac,
    +        Vectors.dense(initialWeights.clone()))
    +      (weightsGD, loss)
    +    }
    +
    +    val res = Matrices.horzCat(forLoop.map(v => new DenseMatrix(v._1.size, 1, v._1.toArray)))
    +
    +    val (weightsMMGD, mmLoss) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      sparseRDD,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFrac,
    +      Vectors.dense(initialWeights))
    +
    +    assert(res ~== weightsMMGD absTol 1e-10)
    +
    +    val gdLosses1 = forLoop.map(_._2.last)
    +    val lastFromGD = Vectors.dense(gdLosses1.toArray[Double])
    +
    +    assert(lastFromGD ~== mmLoss.last absTol 1e-10)
    +  }
    +
    +  test("Check for correctness: LeastSquaresRegression-SquaredL2Updater & multiple numIterations") {
    +    val nPoints = 100
    +    val numFeatures = 5
    +
    +    val initialWeights = Matrices.zeros(numFeatures, 1).toArray
    +
    +    // Pick weights as random values distributed uniformly in [-0.5, 0.5]
    +    val w = Matrices.rand(numFeatures, 1) -= 0.5
    +
    +    // Use half of data for training and other half for validation
    +    val data = LinearDataGenerator.generateLinearInput(0.0, w.toArray, nPoints, 42, 10.0)
    +
    +    val gradient = new MultiModelLeastSquaresGradient()
    +    val updater: Array[MultiModelUpdater] = Array(new MultiModelSquaredL2Updater())
    +    val stepSize = Array(1.0, 0.1)
    +    val numIterations = Array(10, 20)
    +    val regParam = Array(0.0, 0.1, 1.0)
    +    val miniBatchFrac = 1.0
    +
    +    val dataRDD = sc.parallelize(data, 2).map( p => (p.label, p.features)).cache()
    +    val numModels = stepSize.length * regParam.length
    +
    +    val forLoop = (0 until numModels).map { i =>
    +      val (weightsGD2, loss) = GradientDescent.runMiniBatchSGD(
    +        dataRDD,
    +        new LeastSquaresGradient(),
    +        new SquaredL2Updater(),
    +        stepSize(math.round(i * 1.0 / numModels).toInt),
    +        numIterations(0),
    +        regParam(i % regParam.length),
    +        miniBatchFrac,
    +        Vectors.dense(initialWeights.clone()))
    +      (weightsGD2, loss)
    +    }
    +    val forLoop2 = (0 until numModels).map { i =>
    +      val (weightsGD2, loss) = GradientDescent.runMiniBatchSGD(
    +        dataRDD,
    +        new LeastSquaresGradient(),
    +        new SquaredL2Updater(),
    +        stepSize(math.round(i * 1.0 / numModels).toInt),
    +        numIterations(1),
    +        regParam(i % regParam.length),
    +        miniBatchFrac,
    +        Vectors.dense(initialWeights.clone()))
    +      (weightsGD2, loss)
    +    }
    +    val res = Matrices.horzCat(forLoop.map( v => new DenseMatrix(v._1.size, 1, v._1.toArray)) ++
    +      forLoop2.map( v => new DenseMatrix(v._1.size, 1, v._1.toArray)))
    +
    +    val (weightsMMGD, mmLoss) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      dataRDD,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFrac,
    +      Vectors.dense(initialWeights))
    +
    +    assert(res ~== weightsMMGD absTol 1e-10)
    +
    +    val gdLosses1 = forLoop.map(_._2.last)
    +    val gdLosses2 = forLoop2.map(_._2.last)
    +    val lastFromGD = Vectors.dense((gdLosses1 ++ gdLosses2).toArray)
    +
    +    val mmLossTogether = Vectors.dense(mmLoss(numIterations(0) - 1).toArray ++
    +      mmLoss(numIterations(1) - 1).toArray)
    +
    +    assert(lastFromGD ~== mmLossTogether absTol 1e-10)
    +  }
    +
    +  test("Check for correctness: SVM-(L1Updater+SquaredL2Updater)") {
    +    val nPoints = 100
    +
    +    val initialWeights = Array(1.0, 0.0, 0.0)
    +
    +    val A = 0.01
    +    val B = -1.5
    +    val C = 1.0
    +
    +    val testData = MultiModelGradientDescentSuite.
    +      generateSVMInput(A, Array[Double](B, C), nPoints, 42)
    +
    +    val data = testData.map { case LabeledPoint(label, features) =>
    +      label -> Vectors.dense(1.0 +: features.toArray)
    +    }
    +
    +    val gradient = new MultiModelHingeGradient()
    +    val updater: Array[MultiModelUpdater] =
    +      Array(new MultiModelL1Updater, new MultiModelSquaredL2Updater)
    +    val stepSize = Array(1.0, 0.1)
    +    val numIterations = Array(10)
    +    val regParam = Array(0.0, 0.1)
    +    val miniBatchFrac = 1.0
    +
    +    val dataRDD = sc.parallelize(data, 2).cache()
    +    val numModels = stepSize.length * regParam.length
    +
    +    val forLoop1 = (0 until numModels).map { i =>
    +      val (weightsGD2, loss) = GradientDescent.runMiniBatchSGD(
    +        dataRDD,
    +        new HingeGradient(),
    +        new L1Updater(),
    +        stepSize(math.round(i * 1.0 / numModels).toInt),
    +        numIterations(0),
    +        regParam(i % regParam.length),
    +        miniBatchFrac,
    +        Vectors.dense(initialWeights.clone()))
    +      (weightsGD2, loss)
    +    }
    +    val forLoop2 = (0 until numModels).map { i =>
    +      val (weightsGD2, loss) = GradientDescent.runMiniBatchSGD(
    +        dataRDD,
    +        new HingeGradient(),
    +        new SquaredL2Updater(),
    +        stepSize(math.round(i * 1.0 / numModels).toInt),
    +        numIterations(0),
    +        regParam(i % regParam.length),
    +        miniBatchFrac,
    +        Vectors.dense(initialWeights.clone()))
    +      (weightsGD2, loss)
    +    }
    +
    +    val res = Matrices.horzCat(forLoop1.map( v => new DenseMatrix(v._1.size, 1, v._1.toArray)) ++
    +      forLoop2.map( v => new DenseMatrix(v._1.size, 1, v._1.toArray)))
    +
    +    val (weightsMMGD, mmLoss) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      dataRDD,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFrac,
    +      Vectors.dense(initialWeights))
    +
    +    assert(res ~== weightsMMGD absTol 1e-10)
    +
    +    val gdLosses1 = forLoop1.map(_._2.last)
    +    val gdLosses2 = forLoop2.map(_._2.last)
    +    val lastFromGD = Vectors.dense((gdLosses1 ++ gdLosses2).toArray[Double])
    +
    +    assert(lastFromGD ~== mmLoss.last absTol 1e-10)
    +  }
    +}
    +
    +class MultiModelGradientDescentClusterSuite extends FunSuite with LocalClusterSparkContext {
    +
    +  test("task size should be small") {
    --- End diff --
    
    What is this testing?  Is this supposed to throw & then catch an error?




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56230988
  
    Lots more tests to do for the MatricesSuite.scala
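
    As an illustration only (this is not a test from the PR), one such addition might cross-check the new sparse gemm path against a plain dense multiply via the `multiply` convenience method added in this PR, e.g. inside a FunSuite:

        test("sparse gemm matches dense gemm") {
          // 3 x 2 dense matrix (column-major) and its sparse CSC equivalent.
          val dA = new DenseMatrix(3, 2, Array(1.0, 0.0, 2.0, 0.0, 3.0, 0.0))
          val sA = new SparseMatrix(3, 2, Array(0, 2, 3), Array(0, 2, 1), Array(1.0, 2.0, 3.0))
          val B = new DenseMatrix(2, 2, Array(1.0, 2.0, 3.0, 4.0))

          // Both paths should produce the same 3 x 2 result.
          assert(dA.multiply(B).toArray.toSeq === sA.multiply(B).toArray.toSeq)
        }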




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17765178
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, B: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, B: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var rowCounter = 0
    +          val Cstart = colCounterForB * mA
    +          while (rowCounter < mA) {
    +            var i = Arows(rowCounter)
    +            val indEnd = Arows(rowCounter + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B(colCounterForB, Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounter
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounter += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    +      if (beta != 0.0){
    +        f2jBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      // Perform matrix multiplication and add to C. The rows of A are multiplied by the columns of
    +      // B, and added to C.
    +      var colCounterForB = 0 // the column to be updated in C
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Bstart = colCounterForB * kB
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA) {
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B.values(Bstart + colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA){
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B(colCounterForB, colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A and `SparseMatrix` B.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: SparseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, B: $nB")
    +
    +    val Bvals = B.values
    +    val Brows = if (!transB) B.rowIndices else B.colPtrs
    +    val Bcols = if (!transB) B.colPtrs else B.rowIndices
    +
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB){ // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val indEnd = Bcols(colCounterForB + 1)
    +          while (rowCounterForA < mA) {
    +            var i = Bcols(colCounterForB)
    +            val Astart = rowCounterForA * kA
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Bvals(i) * A.values(Astart + Brows(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        var rowCounterForA = 0
    +        while (rowCounterForA < mA) {
    +          var colCounterForA = 0
    +          val Astart = rowCounterForA * kA
    +          while (colCounterForA < kA) {
    +            var i = Brows(colCounterForA)
    +            val indEnd = Brows(colCounterForA + 1)
    +            while (i < indEnd){
    +              val Cindex = Bcols(i) * mA + rowCounterForA
    +              C.values(Cindex) += A.values(Astart + colCounterForA) * Bvals(i) * alpha
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          rowCounterForA += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    +      if (beta != 0.0){
    +        nativeBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      if (!transB) { // Expensive to put the check inside the loop
    +
    +        // Loop over the columns of B, pick non-zero row in B, select corresponding column in A,
    +        // and update the whole column in C by looping over rows in A.
    +        var colCounterForB = 0 // the column to be updated in C
    +        while (colCounterForB < nB) {
    +          var i = Bcols(colCounterForB)
    +          val indEnd = Bcols(colCounterForB + 1)
    +          while (i < indEnd) {
    +            var rowCounterForA = 0
    +            val Bval = Bvals(i)
    +            val Cstart = colCounterForB * mA
    +            val Astart = mA * Brows(i)
    +            while (rowCounterForA < mA){
    +              C.values(Cstart + rowCounterForA) += A.values(Astart + rowCounterForA) * Bval * alpha
    +              rowCounterForA += 1
    +            }
    +            i += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        var colCounterForA = 0
    +        while (colCounterForA < kA) {
    +          var rowCounterForA = 0
    +          val Astart = mA * colCounterForA
    +          val indEnd = Brows(colCounterForA + 1)
    +          while (rowCounterForA < mA) {
    +            var i = Brows(colCounterForA)
    +            while (i < indEnd){
    +              val Cindex = Bcols(i) * mA + rowCounterForA
    +              C.values(Cindex) += A.values(Astart + rowCounterForA) * Bvals(i) * alpha
    +              i += 1
    +            }
    +            rowCounterForA += 1
    +          }
    +          colCounterForA += 1
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * @param trans whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param alpha a scalar to scale the multiplication A * x.
    +   * @param A the matrix A that will be left multiplied to x. Size of m x n.
    +   * @param x the vector x that will be left multiplied by A. Size of n x 1.
    +   * @param beta a scalar that can be used to scale vector y.
    +   * @param y the resulting vector y. Size of m x 1.
    +   */
    +  def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit = {
    +
    +    val mA: Int = if (!trans) A.numRows else A.numCols
    +    val nx: Int = x.size
    +    val nA: Int = if (!trans) A.numCols else A.numRows
    +
    +    require(nA == nx, s"The columns of A don't match the number of elements of x. A: $nA, x: $nx")
    +    require(mA == y.size,
    +      s"The rows of A don't match the number of elements of y. A: $mA, y: ${y.size}")
    +    if (alpha == 0.0) {
    +      logDebug("gemv: alpha is equal to 0. Returning y.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          gemv(trans, alpha, sparse, x, beta, y)
    +        case dense: DenseMatrix =>
    +          gemv(trans, alpha, dense, x, beta, y)
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemv doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   *
    +   * @param alpha a scalar to scale the multiplication A * x.
    +   * @param A the matrix A that will be left multiplied to x. Size of m x n.
    +   * @param x the vector x that will be left multiplied by A. Size of n x 1.
    +   * @param beta a scalar that can be used to scale vector y.
    +   * @param y the resulting vector y. Size of m x 1.
    +   */
    +  def gemv(
    +      alpha: Double,
    +      A: Matrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit = {
    +    gemv(false, alpha, A, x, beta, y)
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit =  {
    +    val tStrA = if (!trans) "N" else "T"
    +    nativeBLAS.dgemv(tStrA, A.numRows, A.numCols, alpha, A.values, A.numRows, x.values, 1, beta,
    +      y.values, 1)
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit =  {
    +
    +    val mA: Int = if(!trans) A.numRows else A.numCols
    +    val nA: Int = if(!trans) A.numCols else A.numRows
    +
    +    val Avals = A.values
    +    val Arows = if (!trans) A.rowIndices else A.colPtrs
    +    val Acols = if (!trans) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (trans){
    +      var rowCounter = 0
    +      while (rowCounter < mA){
    +        var i = Arows(rowCounter)
    +        val indEnd = Arows(rowCounter + 1)
    +        var sum = 0.0
    +        while(i < indEnd){
    --- End diff --
    
    spacing again




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17808020
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols)
    +    BLAS.gemm(true, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */
    +  def transposeMultiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numCols))
    +    BLAS.gemv(true, 1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** A human readable representation of the matrix */
       override def toString: String = toBreeze.toString()
    +
    +  private[mllib] def map(f: Double => Double): Matrix
    +
    +  private[mllib] def update(f: Double => Double): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double,
    +                                                        y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double,
    +                                                     y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double,
    +                                                     y: Double): Matrix
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y)
    +
    +  private[mllib] def *(y: Matrix) = operate(_ * _, y)
    +
    +  private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y)
    +
    +  private[mllib] def +(y: Matrix) = operate(_ + _, y)
    +
    +  private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y)
    +
    +  private[mllib] def -(y: Matrix) = operate(_ - _, y)
    +
    +  private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y)
    +
    +  private[mllib] def /(y: Matrix) = operate(_ / _, y)
    +
    +  private[mllib] def *=(y: Double) = elementWiseOperateScalarInPlace(_ * _, y)
    +
    +  private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y)
    +
    +  private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y)
    +
    +  private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y)
    +
    +  private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y)
    +
    +  private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y)
    +
    +  private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y)
    +
    +  private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y)
    +
    +  private[mllib] def neg: Matrix
    +
    +  private[mllib] def negInPlace: Matrix
    +
    +  /** Less-than-or-equal-to check. Outputs binary `DenseMatrix` */
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix
    +
    +  /** Returns the p-th norm for each column */
    +  private[mllib] def colNorms(p: Double): Matrix
    +
    +  private[mllib] def colSums: DenseMatrix = colSums(false)
    +
    +  private[mllib] def colSums(absolute: Boolean, skipRows: DenseMatrix = null): DenseMatrix = {
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    this match {
    +      case sparse: SparseMatrix =>
    +        while (j < numCols){
    +          var i = sparse.colPtrs(j)
    +          val indEnd = sparse.colPtrs(j + 1)
    +          while (i < indEnd){
    +            var v = sparse.values(i)
    +            if (absolute) v = math.abs(v)
    +            sums.values(j) += v
    +            i += 1
    +          }
    +          j += 1
    +        }
    +      case dense: DenseMatrix =>
    +        while (j < numCols){
    +          var i = 0
    +          while (i < numRows){
    +            if (skipRows == null) {
    +              var v = dense.values(index(i, j))
    --- End diff --
    
    This line is a good reason to implement this in DenseMatrix: you could avoid the expensive index (a multiplication per element) and just iterate through the values with a running counter.
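
    For illustration, a minimal sketch of the kind of specialization being suggested (a hypothetical helper, not code from the PR): because a DenseMatrix is column-major, a column sum can walk the values array with a running offset instead of computing index(i, j) for every element:

        // Hypothetical column sums for a column-major dense matrix, avoiding index(i, j).
        def colSumsDense(values: Array[Double], numRows: Int, numCols: Int): Array[Double] = {
          val sums = new Array[Double](numCols)
          var j = 0
          var offset = 0              // start of column j within the values array
          while (j < numCols) {
            var i = 0
            var s = 0.0
            while (i < numRows) {
              s += values(offset + i) // no per-element multiplication
              i += 1
            }
            sums(j) = s
            offset += numRows
            j += 1
          }
          sums
        }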




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17806367
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The number of rows of the matrices in this array, don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      val allColPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val ptr = mat.asInstanceOf[SparseMatrix].colPtrs
    +        ptr.slice(1, ptr.length)
    +      }
    +      var counter = 0
    +      val adjustedPtrs = allColPtrs.map { p =>
    +        counter += p
    +        counter
    +      }
    +      new SparseMatrix(numRows, numCols, adjustedPtrs,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].rowIndices).toArray,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].values).toArray)
    +    } else if (!isSparse && !isDense) {
    +      throw new IllegalArgumentException("The supplied matrices are neither in SparseMatrix or" +
    +        " DenseMatrix format!")
    +    }else {
    +      new DenseMatrix(numRows, numCols, matrices.flatMap(_.toArray).toArray)
    +    }
    +  }
    +  // partitionMetaData correspond to the index of the partition and the max number of non-zeros
    +  // in that partition so that we can preallocate a memory efficient buffer
    +  private[mllib] def fromRDD(
    +      rows: RDD[(Double, Vector)],
    +      partitionMetaData: Array[(Int, Int)],
    +      batchSize : Int,
    +      buildSparseThreshold: Double,
    +      generateOnTheFly: Boolean = true): RDD[(DenseMatrix, Matrix)] = {
    +
    +    if (!generateOnTheFly){
    +      rows.mapPartitions { iter =>
    +        iter.grouped(batchSize)
    +      }.map(fromSeq(_, batchSize))
    +    }else {
    +      val numFeatures = rows.first()._2.size
    +
    +      rows.mapPartitionsWithIndex{ case (ind, iter) =>
    +        val findPartition = partitionMetaData.find(_._1 == ind)
    +        val matrixBuffer =
    +          if (findPartition.get._2 != -1) {
    +            val nnz = findPartition.get._2
    +            val density = nnz * 1.0 / (numFeatures * batchSize)
    +            if (density <= buildSparseThreshold) {
    +              (DenseMatrix.zeros(batchSize, 1), new SparseMatrix(numFeatures, batchSize,
    --- End diff --
    
    style: put "new SparseMatrix(...)" on its own line
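
For context, a brief usage sketch of the factory methods documented in the quoted diff (assuming the API exactly as it appears there; the sizes and density are made up for the example):

    import org.apache.spark.mllib.linalg.Matrices

    val z  = Matrices.zeros(3, 2)            // 3 x 2 dense matrix of zeros
    val i3 = Matrices.eye(3)                 // 3 x 3 dense identity
    val s3 = Matrices.speye(3)               // 3 x 3 sparse identity
    val u  = Matrices.rand(3, 2)             // dense, i.i.d. U(0, 1) entries
    val sn = Matrices.sprandn(100, 100, 0.1) // sparse, ~10% density, N(0, 1) entries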




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803111
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    --- End diff --
    
    That was a discussion issue. I'm happy to do it as such, but the problem is that for every single function we add, we're going to have to implement the transposed version as well. The number of functions is already getting out of hand...
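
To make the duplication concrete, a hedged usage sketch of the convenience methods from the quoted diff (API as quoted; the small matrices are made up for the example):

    import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}

    // A is 2 x 3 (column-major values), B is 3 x 2, x has length 3.
    val A = new DenseMatrix(2, 3, Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0))
    val B = new DenseMatrix(3, 2, Array(1.0, 0.0, 0.0, 0.0, 1.0, 0.0))
    val x = new DenseVector(Array(1.0, 1.0, 1.0))

    val AB  = A.multiply(B)           // 2 x 2 DenseMatrix, A * B
    val Ax  = A.multiply(x)           // length-2 DenseVector, A * x
    val AtA = A.transposeMultiply(A)  // 3 x 3 DenseMatrix, A^T * A

Every additional operation would need a transposeX-style twin of this kind, which is the growth being discussed here.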




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17801574
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols)
    +    BLAS.gemm(true, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */
    +  def transposeMultiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numCols))
    +    BLAS.gemv(true, 1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** A human readable representation of the matrix */
       override def toString: String = toBreeze.toString()
    +
    +  private[mllib] def map(f: Double => Double): Matrix
    +
    +  private[mllib] def update(f: Double => Double): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double,
    +                                                        y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double,
    +                                                     y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double,
    +                                                     y: Double): Matrix
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y)
    +
    +  private[mllib] def *(y: Matrix) = operate(_ * _, y)
    +
    +  private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y)
    +
    +  private[mllib] def +(y: Matrix) = operate(_ + _, y)
    +
    +  private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y)
    +
    +  private[mllib] def -(y: Matrix) = operate(_ - _, y)
    +
    +  private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y)
    +
    +  private[mllib] def /(y: Matrix) = operate(_ / _, y)
    +
    +  private[mllib] def *=(y: Double) = elementWiseOperateScalarInPlace(_ * _, y)
    +
    +  private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y)
    +
    +  private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y)
    +
    +  private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y)
    +
    +  private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y)
    +
    +  private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y)
    +
    +  private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y)
    +
    +  private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y)
    +
    +  private[mllib] def neg: Matrix
    +
    +  private[mllib] def negInPlace: Matrix
    +
    +  /** Less-than-or-equal-to check. Outputs binary `DenseMatrix` */
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix
    +
    +  /** Returns the p-th norm for each column */
    +  private[mllib] def colNorms(p: Double): Matrix
    +
    +  private[mllib] def colSums: DenseMatrix = colSums(false)
    +
    +  private[mllib] def colSums(absolute: Boolean, skipRows: DenseMatrix = null): DenseMatrix = {
    --- End diff --
    
    skipRows should be a DenseVector, right?
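
A hypothetical standalone sketch of that variant (not the PR's code): the skip mask carried as a flat length-numRows array, i.e. what a DenseVector would wrap, with an entry of 1.0 meaning "skip this row".

    // Hypothetical sketch: column sums over a column-major array that skip the rows
    // flagged with 1.0 in a length-numRows mask (the DenseVector idea).
    def colSumsSkipping(
        numRows: Int,
        numCols: Int,
        values: Array[Double],
        skipRows: Array[Double]): Array[Double] = {
      val sums = new Array[Double](numCols)
      var j = 0
      while (j < numCols) {
        var i = 0
        while (i < numRows) {
          if (skipRows(i) != 1.0) sums(j) += values(i + numRows * j)
          i += 1
        }
        j += 1
      }
      sums
    }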




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17801735
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols)
    +    BLAS.gemm(true, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */
    +  def transposeMultiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numCols))
    +    BLAS.gemv(true, 1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** A human readable representation of the matrix */
       override def toString: String = toBreeze.toString()
    +
    +  private[mllib] def map(f: Double => Double): Matrix
    +
    +  private[mllib] def update(f: Double => Double): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double,
    +                                                        y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double,
    +                                                     y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double,
    +                                                     y: Double): Matrix
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y)
    +
    +  private[mllib] def *(y: Matrix) = operate(_ * _, y)
    +
    +  private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y)
    +
    +  private[mllib] def +(y: Matrix) = operate(_ + _, y)
    +
    +  private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y)
    +
    +  private[mllib] def -(y: Matrix) = operate(_ - _, y)
    +
    +  private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y)
    +
    +  private[mllib] def /(y: Matrix) = operate(_ / _, y)
    +
    +  private[mllib] def *=(y: Double) = elementWiseOperateScalarInPlace(_ * _, y)
    +
    +  private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y)
    +
    +  private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y)
    +
    +  private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y)
    +
    +  private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y)
    +
    +  private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y)
    +
    +  private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y)
    +
    +  private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y)
    +
    +  private[mllib] def neg: Matrix
    +
    +  private[mllib] def negInPlace: Matrix
    +
    +  /** Less-than-or-equal-to check. Outputs binary `DenseMatrix` */
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix
    +
    +  /** Returns the p-th norm for each column */
    +  private[mllib] def colNorms(p: Double): Matrix
    +
    +  private[mllib] def colSums: DenseMatrix = colSums(false)
    +
    +  private[mllib] def colSums(absolute: Boolean, skipRows: DenseMatrix = null): DenseMatrix = {
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    this match {
    +      case sparse: SparseMatrix =>
    +        while (j < numCols){
    +          var i = sparse.colPtrs(j)
    +          val indEnd = sparse.colPtrs(j + 1)
    +          while (i < indEnd){
    +            var v = sparse.values(i)
    +            if (absolute) v = math.abs(v)
    +            sums.values(j) += v
    +            i += 1
    +          }
    +          j += 1
    +        }
    +      case dense: DenseMatrix =>
    +        while (j < numCols){
    +          var i = 0
    +          while (i < numRows){
    +            if (skipRows == null) {
    +              var v = dense.values(index(i, j))
    +              if (absolute) v = math.abs(v)
    +              sums.values(j) += v
    +            } else {
    +              if (skipRows(i) != 1.0) {
    +                var v = dense.values(index(i, j))
    +                if (absolute) v = math.abs(v)
    +                sums.values(j) += v
    +              }
    +            }
    +
    +            i += 1
    +          }
    +          j += 1
    +        }
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def rowSums: DenseMatrix = rowSums(false)
    --- End diff --
    
    Same as colSums: Why not return a DenseVector?




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17765167
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    --- End diff --
    
    Comment belongs inside "if (transA)"
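
For reference, a hedged usage sketch of the gemm entry point documented in the quoted diff (the BLAS object is private[mllib], so a call like this only compiles inside org.apache.spark.mllib; the matrices are small made-up examples):

    import org.apache.spark.mllib.linalg.{BLAS, DenseMatrix}

    val A = new DenseMatrix(2, 3, Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)) // 2 x 3
    val B = new DenseMatrix(3, 2, Array(1.0, 0.0, 0.0, 0.0, 1.0, 0.0)) // 3 x 2
    val C = DenseMatrix.zeros(2, 2)                                    // 2 x 2 output buffer

    // C := 1.0 * A * B + 0.0 * C, written into C in place
    BLAS.gemm(false, false, 1.0, A, B, 0.0, C)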




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17806143
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    --- End diff --
    
    We could implement transposed versions of other functions in a lazy manner.  For most functions, we could add a one-line transposeIfNeeded() call.
    
    I'm OK with the current state, but as this API becomes more public, I think a lazy transpose will become more important.
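
A hypothetical sketch of the lazy-transpose idea (the class name and the isTransposed flag are made up for illustration and are not part of the PR): transpose flips a flag in O(1), and the column-major index arithmetic resolves the flag at access time.

    // Hypothetical illustration of a lazy transpose: no copy of the backing array,
    // the flag is consulted only where indices are actually computed.
    class LazyMatrix(
        val numRows: Int,
        val numCols: Int,
        val values: Array[Double],
        val isTransposed: Boolean = false) {

      // Logical dimensions respect the transpose flag.
      def rows: Int = if (isTransposed) numCols else numRows
      def cols: Int = if (isTransposed) numRows else numCols

      // O(1): flip the flag, share the values array.
      def transpose: LazyMatrix = new LazyMatrix(numRows, numCols, values, !isTransposed)

      // Resolve the flag at access time; the physical layout stays column-major.
      def apply(i: Int, j: Int): Double =
        if (isTransposed) values(j + numRows * i) else values(i + numRows * j)
    }

Operations could then make a single transposeIfNeeded-style check, rather than each shipping a hand-written transposed twin.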




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56202513
  
    @anantasty: If you could look through the code and mark places where you're like "What the heck is going on here", it would be easier for me to write up proper comments. I'm going to add a lot today, and I can incorporate yours as well. Thanks!




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803218
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    --- End diff --
    
    "1/p" --> "1 / p"




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by anantasty <gi...@git.apache.org>.
Github user anantasty commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17801907
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var rowCounter = 0
    +          val Cstart = colCounterForB * mA
    +          while (rowCounter < mA) {
    +            var i = Arows(rowCounter)
    +            val indEnd = Arows(rowCounter + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B(colCounterForB, Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounter
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounter += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    --- End diff --
    
    +1 for some explanation.
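
Reading the quoted code, the reason for the up-front dscal appears to be the access pattern: in the non-transposed (CSC) branch each nonzero of A is scattered into C, so a given entry of C is updated many times across the loop and cannot absorb the beta factor in a single write; scaling C by beta once beforehand lets the loop simply accumulate alpha * A * B on top of it. In the transposed branch each C entry is produced by exactly one dot product, which is why beta can be folded into that one assignment instead.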





[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802140
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    --- End diff --
    
    ditto below; check please
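
As a usage sketch of the broadcast-style operators layered on these methods (hedged: the operators are private[mllib], so this only applies inside the package, and the values are made up):

    import org.apache.spark.mllib.linalg.DenseMatrix

    val A = new DenseMatrix(3, 2, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)) // 3 x 2, column-major
    val y = new DenseMatrix(3, 1, Array(10.0, 20.0, 30.0))             // a single column

    A += y    // y has one column, so it is added to every column of A in place
    A *= 2.0  // scalar form: every entry of A is doubled in place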




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17765188
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var rowCounter = 0
    +          val Cstart = colCounterForB * mA
    +          while (rowCounter < mA) {
    +            var i = Arows(rowCounter)
    +            val indEnd = Arows(rowCounter + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B(colCounterForB, Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounter
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounter += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    +      if (beta != 0.0){
    +        f2jBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      // Perform matrix multiplication and add to C. The rows of A are multiplied by the columns of
    +      // B, and added to C.
    +      var colCounterForB = 0 // the column to be updated in C
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Bstart = colCounterForB * kB
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA) {
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B.values(Bstart + colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA){
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B(colCounterForB, colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A and `SparseMatrix` B.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: SparseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Bvals = B.values
    +    val Brows = if (!transB) B.rowIndices else B.colPtrs
    +    val Bcols = if (!transB) B.colPtrs else B.rowIndices
    +
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB){ // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val indEnd = Bcols(colCounterForB + 1)
    +          while (rowCounterForA < mA) {
    +            var i = Bcols(colCounterForB)
    +            val Astart = rowCounterForA * kA
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Bvals(i) * A.values(Astart + Brows(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        var rowCounterForA = 0
    +        while (rowCounterForA < mA) {
    +          var colCounterForA = 0
    +          val Astart = rowCounterForA * kA
    +          while (colCounterForA < kA) {
    +            var i = Brows(colCounterForA)
    +            val indEnd = Brows(colCounterForA + 1)
    +            while (i < indEnd){
    +              val Cindex = Bcols(i) * mA + rowCounterForA
    +              C.values(Cindex) += A.values(Astart + colCounterForA) * Bvals(i) * alpha
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          rowCounterForA += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    +      if (beta != 0.0){
    +        nativeBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      if (!transB) { // Expensive to put the check inside the loop
    +
    +        // Loop over the columns of B, pick non-zero row in B, select corresponding column in A,
    +        // and update the whole column in C by looping over rows in A.
    +        var colCounterForB = 0 // the column to be updated in C
    +        while (colCounterForB < nB) {
    +          var i = Bcols(colCounterForB)
    +          val indEnd = Bcols(colCounterForB + 1)
    +          while (i < indEnd) {
    +            var rowCounterForA = 0
    +            val Bval = Bvals(i)
    +            val Cstart = colCounterForB * mA
    +            val Astart = mA * Brows(i)
    +            while (rowCounterForA < mA){
    +              C.values(Cstart + rowCounterForA) += A.values(Astart + rowCounterForA) * Bval * alpha
    +              rowCounterForA += 1
    +            }
    +            i += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        var colCounterForA = 0
    +        while (colCounterForA < kA) {
    +          var rowCounterForA = 0
    +          val Astart = mA * colCounterForA
    +          val indEnd = Brows(colCounterForA + 1)
    +          while (rowCounterForA < mA) {
    +            var i = Brows(colCounterForA)
    +            while (i < indEnd){
    +              val Cindex = Bcols(i) * mA + rowCounterForA
    +              C.values(Cindex) += A.values(Astart + rowCounterForA) * Bvals(i) * alpha
    +              i += 1
    +            }
    +            rowCounterForA += 1
    +          }
    +          colCounterForA += 1
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * @param trans whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param alpha a scalar to scale the multiplication A * x.
    +   * @param A the matrix A that will be left multiplied to x. Size of m x n.
    +   * @param x the vector x that will be left multiplied by A. Size of n x 1.
    +   * @param beta a scalar that can be used to scale vector y.
    +   * @param y the resulting vector y. Size of m x 1.
    +   */
    +  def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit = {
    +
    +    val mA: Int = if (!trans) A.numRows else A.numCols
    +    val nx: Int = x.size
    +    val nA: Int = if (!trans) A.numCols else A.numRows
    +
    +    require(nA == nx, s"The columns of A don't match the number of elements of x. A: $nA, x: $nx")
    +    require(mA == y.size,
    +      s"The rows of A don't match the number of elements of y. A: $mA, y:${y.size}}")
    +    if (alpha == 0.0) {
    +      logDebug("gemv: alpha is equal to 0. Returning y.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          gemv(trans, alpha, sparse, x, beta, y)
    +        case dense: DenseMatrix =>
    +          gemv(trans, alpha, dense, x, beta, y)
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemv doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   *
    +   * @param alpha a scalar to scale the multiplication A * x.
    +   * @param A the matrix A that will be left multiplied to x. Size of m x n.
    +   * @param x the vector x that will be left multiplied by A. Size of n x 1.
    +   * @param beta a scalar that can be used to scale vector y.
    +   * @param y the resulting vector y. Size of m x 1.
    +   */
    +  def gemv(
    +      alpha: Double,
    +      A: Matrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit = {
    +    gemv(false, alpha, A, x, beta, y)
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit =  {
    +    val tStrA = if (!trans) "N" else "T"
    +    nativeBLAS.dgemv(tStrA, A.numRows, A.numCols, alpha, A.values, A.numRows, x.values, 1, beta,
    +      y.values, 1)
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit =  {
    +
    +    val mA: Int = if(!trans) A.numRows else A.numCols
    +    val nA: Int = if(!trans) A.numCols else A.numRows
    +
    +    val Avals = A.values
    +    val Arows = if (!trans) A.rowIndices else A.colPtrs
    +    val Acols = if (!trans) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (trans){
    +      var rowCounter = 0
    +      while (rowCounter < mA){
    +        var i = Arows(rowCounter)
    +        val indEnd = Arows(rowCounter + 1)
    +        var sum = 0.0
    +        while(i < indEnd){
    +          sum += Avals(i) * x.values(Acols(i))
    +          i += 1
    +        }
    +        y.values(rowCounter) =  beta * y.values(rowCounter) + sum * alpha
    +        rowCounter += 1
    +      }
    +    } else {
    +      // Scale vector first if `beta` is not equal to 0.0
    +      if (beta != 0.0){
    --- End diff --
    
    maybe search the code for "){" to catch the remaining places that are missing a space before the opening brace
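    A minimal illustration of the two styles (hypothetical helper, not code from the patch):

        // hypothetical helper, shown only to illustrate the brace spacing in question
        def negate(values: Array[Double]): Unit = {
          var j = 0
          while (j < values.length){   // flagged style: no space before '{'
            values(j) *= -1
            j += 1
          }
        }

        def negatePreferred(values: Array[Double]): Unit = {
          var j = 0
          while (j < values.length) {  // preferred Scala style: space between ')' and '{'
            values(j) *= -1
            j += 1
          }
        }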



[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803601
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-majored sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row index of the entry. They must be in strictly increasing order for each
    + *                   column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    +    if (f(2, 9) != 18) return false
    +    if (f(3, 7) != 21) return false
    +    if (f(8, 9) != 72) return false
    +    true
    +  }
    +
    +  private def isDivision(f: (Double, Double) => Double): Boolean = {
    +    if (f(12, 3) != 4) return false
    +    if (f(72, 4) != 18) return false
    +    if (f(72, 9) != 8) return false
    +    true
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (y.numCols==1 || y.numRows == 1) {
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseMultiplyRows " +
    +        "or elementWiseMultiplyColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1) {
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols == 1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows == 1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateOnRows(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix =  {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val dup = this.copy
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      while (i < indEnd){
    +        sums.values(j) += math.pow(values(i),p)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    sums.update(math.pow(_, 1/p))
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: SparseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: SparseMatrix = {
    +    val copy = this.copy
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, this.toArray)
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  def toDense: DenseMatrix = new DenseMatrix(numRows, numCols, this.toArray)
    +}
    +
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  private def genRand(numRows: Int, numCols: Int, raw: Array[Double], nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol){
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    sCols.append(sparseA.length)
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): SparseMatrix = {
    +
    +    require(density > 0.0 && density < 1.0, "density must be a double in the range " +
    +      s"0.0 < d < 1.0. Currently, density: $density")
    +    val rand = new XORShiftRandom(seed)
    +    val length = numRows * numCols
    +    val rawA = Array.fill(length)(0.0)
    +    var nnz = 0
    +    for (i <- 0 until length) {
    +      val p = rand.nextDouble()
    +      if (p < density) {
    +        rawA.update(i, rand.nextDouble())
    +        nnz += 1
    +      }
    +    }
    +    genRand(numRows, numCols, rawA, nnz)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): SparseMatrix = {
    +
    +    require(density > 0.0 && density < 1.0, "density must be a double in the range " +
    +      s"0.0 < d < 1.0. Currently, density: $density")
    +    val rand = new XORShiftRandom(seed)
    +    val length = numRows * numCols
    +    val rawA = Array.fill(length)(0.0)
    +    var nnz = 0
    +    for (i <- 0 until length) {
    +      val p = rand.nextDouble()
    +      if (p < density) {
    +        rawA.update(i, rand.nextGaussian())
    +        nnz += 1
    +      }
    +    }
    +    genRand(numRows, numCols, rawA, nnz)
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    --- End diff --
    
    correct doc: DenseMatrix -> SparseMatrix



[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17800692
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-majored sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row index of the entry. They must be in strictly increasing order for each
    + *                   column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    --- End diff --
    
    This seems too unsafe to me.  As long as these functions are all private[mllib], why not define private[mllib] static Multiply and Divide functions and check for those with match-case?
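    A rough sketch of the suggested alternative, assuming hypothetical Multiply/Divide singletons defined somewhere under the mllib package (so that private[mllib] resolves); none of these names exist in the patch:

        // Tag the two special-cased operations with singleton objects instead of
        // probing an arbitrary closure with sample inputs.
        private[mllib] sealed trait ElementWiseOp extends ((Double, Double) => Double)

        private[mllib] case object Multiply extends ElementWiseOp {
          override def apply(x: Double, y: Double): Double = x * y
        }

        private[mllib] case object Divide extends ElementWiseOp {
          override def apply(x: Double, y: Double): Double = x / y
        }

        private[mllib] final case class OtherOp(f: (Double, Double) => Double) extends ElementWiseOp {
          override def apply(x: Double, y: Double): Double = f(x, y)
        }

        private[mllib] object ElementWiseOp {
          // zeros stay zeros under these two ops (the assumption the patch already makes),
          // so the sparse storage can be updated in place
          def preservesZeros(op: ElementWiseOp): Boolean = op match {
            case Multiply | Divide => true
            case OtherOp(_) => false
          }
        }

    Probing f with a few sample inputs, as isMultiplication/isDivision do, misclassifies any function that happens to agree with * or / on those points, whereas a tagged op cannot be misclassified.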



[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17801756
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols)
    +    BLAS.gemm(true, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */
    +  def transposeMultiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numCols))
    +    BLAS.gemv(true, 1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** A human readable representation of the matrix */
       override def toString: String = toBreeze.toString()
    +
    +  private[mllib] def map(f: Double => Double): Matrix
    +
    +  private[mllib] def update(f: Double => Double): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double,
    +                                                        y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double,
    +                                                     y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double,
    +                                                     y: Double): Matrix
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y)
    +
    +  private[mllib] def *(y: Matrix) = operate(_ * _, y)
    +
    +  private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y)
    +
    +  private[mllib] def +(y: Matrix) = operate(_ + _, y)
    +
    +  private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y)
    +
    +  private[mllib] def -(y: Matrix) = operate(_ - _, y)
    +
    +  private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y)
    +
    +  private[mllib] def /(y: Matrix) = operate(_ / _, y)
    +
    +  private[mllib] def *=(y: Double) = elementWiseOperateScalarInPlace(_ * _, y)
    +
    +  private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y)
    +
    +  private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y)
    +
    +  private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y)
    +
    +  private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y)
    +
    +  private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y)
    +
    +  private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y)
    +
    +  private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y)
    +
    +  private[mllib] def neg: Matrix
    +
    +  private[mllib] def negInPlace: Matrix
    +
    +  /** Less-than-or-equal-to check. Outputs binary `DenseMatrix` */
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix
    +
    +  /** Returns the p-th norm for each column */
    +  private[mllib] def colNorms(p: Double): Matrix
    +
    +  private[mllib] def colSums: DenseMatrix = colSums(false)
    +
    +  private[mllib] def colSums(absolute: Boolean, skipRows: DenseMatrix = null): DenseMatrix = {
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    this match {
    +      case sparse: SparseMatrix =>
    +        while (j < numCols){
    +          var i = sparse.colPtrs(j)
    +          val indEnd = sparse.colPtrs(j + 1)
    +          while (i < indEnd){
    +            var v = sparse.values(i)
    +            if (absolute) v = math.abs(v)
    +            sums.values(j) += v
    +            i += 1
    +          }
    +          j += 1
    +        }
    +      case dense: DenseMatrix =>
    +        while (j < numCols){
    +          var i = 0
    +          while (i < numRows){
    +            if (skipRows == null) {
    +              var v = dense.values(index(i, j))
    +              if (absolute) v = math.abs(v)
    +              sums.values(j) += v
    +            } else {
    +              if (skipRows(i) != 1.0) {
    +                var v = dense.values(index(i, j))
    +                if (absolute) v = math.abs(v)
    +                sums.values(j) += v
    +              }
    +            }
    +
    +            i += 1
    +          }
    +          j += 1
    +        }
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def rowSums: DenseMatrix = rowSums(false)
    +
    +  private[mllib] def rowSums(absolute: Boolean): DenseMatrix = {
    --- End diff --
    
    Same as colSums: Why not be abstract?
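    
    For illustration, a minimal sketch of how an abstract declaration plus a
    sparse-specific implementation might look (the bodies below are assumptions,
    not code from the patch):
    
        // In the Matrix trait: declare only the contract.
        private[mllib] def rowSums(absolute: Boolean): DenseMatrix
    
        // In SparseMatrix: sum over the stored non-zeros only, bucketing by row index.
        private[mllib] def rowSums(absolute: Boolean): DenseMatrix = {
          val sums = DenseMatrix.zeros(numRows, 1)
          var i = 0
          while (i < values.length) {
            val v = if (absolute) math.abs(values(i)) else values(i)
            sums.values(rowIndices(i)) += v
            i += 1
          }
          sums
        }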




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17810786
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    +    this.stepSize = step
    +    this
    +  }
    +
    +  /**
    +   * :: Experimental ::
    +   * Set fraction of data to be used for each SGD iteration.
    +   * Default 1.0 (corresponding to deterministic/classical gradient descent)
    +   */
    +  @Experimental
    +  def setMiniBatchFraction(fraction: Double): this.type = {
    +    this.miniBatchFraction = fraction
    +    this
    +  }
    +
    +  /**
    +   * Set the number of iterations for SGD. Default 100.
    +   */
    +  def setNumIterations(iters: Array[Int]): this.type = {
    +    this.numIterations = iters
    +    this
    +  }
    +
    +  /**
    +   * Set the regularization parameter. Default (0.0, 0.1, 1.0).
    +   */
    +  def setRegParam(regParam: Array[Double]): this.type = {
    +    this.regParam = regParam
    +    this
    +  }
    +
    +  /**
    +   * Set the gradient function (of the loss function of one single data example)
    +   * to be used for SGD.
    +   */
    +  def setGradient(gradient: MultiModelGradient): this.type = {
    +    this.gradient = gradient
    +    this
    +  }
    +
    +
    +  /**
    +   * Set the updater function to actually perform a gradient step in a given direction.
    +   * The updater is responsible to perform the update from the regularization term as well,
    +   * and therefore determines what kind or regularization is used, if any.
    +   */
    +  def setUpdater(updater: Array[MultiModelUpdater]): this.type = {
    +    this.updater = updater
    +    this
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Runs gradient descent on the given training data.
    +   * @param data training data
    +   * @param initialWeights initial weights
    +   * @return solution vector
    +   */
    +  @DeveloperApi
    +  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = {
    +    val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      data,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFraction,
    +      initialWeights)
    +    weights
    +  }
    +
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Top-level method to run gradient descent.
    + */
    +@DeveloperApi
    +object MultiModelGradientDescent extends Logging {
    +  /**
    +   * Run stochastic gradient descent (SGD) in parallel using mini batches.
    +   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
    +   * in order to compute a gradient estimate.
    +   * Sampling, and averaging the subgradients over this subset is performed using one standard
    +   * spark map-reduce in each iteration.
    +   *
    +   * @param data - Input data for SGD. RDD of the set of data examples, each of
    +   *               the form (label, [feature values]).
    +   * @param gradient - Gradient object (used to compute the gradient of the loss function of
    +   *                   one single data example)
    +   * @param updater - Updater function to actually perform a gradient step in a given direction.
    +   * @param stepSize - initial step size for the first step
    +   * @param numIterations - number of iterations that SGD should be run.
    +   * @param regParam - regularization parameter
    +   * @param miniBatchFraction - fraction of the input data set that should be used for
    +   *                            one iteration of SGD. Default value 1.0.
    +   *
    +   * @return A tuple containing two elements. The first element is a column matrix containing
    +   *         weights for every feature, and the second element is an array containing the
    +   *         stochastic loss computed for every iteration.
    +   */
    +  def runMiniBatchMMSGD(
    +      data: RDD[(Double, Vector)],
    +      gradient: MultiModelGradient,
    +      updater: Array[MultiModelUpdater],
    +      stepSize: Array[Double],
    +      numIterations: Array[Int],
    +      regParam: Array[Double],
    +      miniBatchFraction: Double,
    +      initialWeights: Vector,
    +      batchSize: Int = 64,
    +      useSparse: Boolean = true,
    +      buildSparseThreshold: Double = 0.2): (Matrix, Array[Vector]) = {
    +
    +    val maxNumIter = numIterations.max
    +    val stochasticLossHistory = new ArrayBuffer[Vector](maxNumIter)
    +
    +    val numExamples = data.count()
    +    val miniBatchSize = numExamples * miniBatchFraction
    +    val numModels = stepSize.length * regParam.length
    +    val numFeatures = initialWeights.size
    +    val numRegularizers = updater.length
    +    val updaterCounter = 0 until numRegularizers
    +    // Initialize weights as a column vector
    +    var weights = updaterCounter.map { i =>
    +      new DenseMatrix(numFeatures, 1, initialWeights.toArray).
    +        multiply(DenseMatrix.ones(1, numModels))
    +    }
    +
    +    var finalWeights: Matrix = new DenseMatrix(numFeatures, 0, Array.empty[Double])
    +
    +    // if no data, return initial weights to avoid NaNs
    +    if (numExamples == 0) {
    +
    +      logInfo("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
    +      return (Matrices.horzCat(weights), stochasticLossHistory.toArray)
    +
    +    }
    +    val stepSizeMatrix = new DenseMatrix(1, numModels,
    +      stepSize.flatMap{ ss =>
    +        for (i <- 1 to regParam.length) yield ss
    +      }
    +    )
    +    val regMatrix = SparseMatrix.diag(Vectors.dense(stepSize.flatMap{ ss =>
    --- End diff --
    
    This merits explanation.
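    
    For context, the two expressions appear to lay out the (stepSize, regParam)
    grid column by column: the step sizes are repeated once per regParam, and the
    regularization parameters are cycled once per stepSize, so each of the
    numModels = stepSize.length * regParam.length columns gets its own pair. A
    minimal sketch of the equivalent flattening (values are illustrative only):
    
        val stepSize = Array(1.0, 0.1)
        val regParam = Array(0.0, 0.1, 1.0)
        // one entry per model, in the same column order
        val perModelStep = stepSize.flatMap(ss => regParam.map(_ => ss)) // 1.0, 1.0, 1.0, 0.1, 0.1, 0.1
        val perModelReg  = stepSize.flatMap(_ => regParam)               // 0.0, 0.1, 1.0, 0.0, 0.1, 1.0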




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17813034
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeMatrixConversionSuite.scala ---
    @@ -37,4 +37,26 @@ class BreezeMatrixConversionSuite extends FunSuite {
         assert(mat.numCols === breeze.cols)
         assert(mat.values.eq(breeze.data), "should not copy data")
       }
    +
    +  test("sparse matrix to breeze") {
    --- End diff --
    
    Check values too
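    
    A minimal sketch of what such a value check could look like, mirroring the
    dense assertions in the same suite (the concrete matrix and the
    `asInstanceOf[BSM[Double]]` cast are assumptions for illustration):
    
        val values = Array(1.0, 2.0, 4.0, 5.0)
        val colPtrs = Array(0, 2, 4)
        val rowIndices = Array(1, 2, 1, 2)
        val mat = new SparseMatrix(3, 2, colPtrs, rowIndices, values)
        val breeze = mat.toBreeze.asInstanceOf[BSM[Double]]
        assert(breeze.rows === mat.numRows)
        assert(breeze.cols === mat.numCols)
        assert(breeze.data.eq(mat.values), "should not copy data")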




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-72128781
  
    Closing this PR, as a lot of the functionality has changed.




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803143
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-majored sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row index of the entry. They must be in strictly increasing order for each
    + *                   column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    +    if (f(2, 9) != 18) return false
    +    if (f(3, 7) != 21) return false
    +    if (f(8, 9) != 72) return false
    +    true
    +  }
    +
    +  private def isDivision(f: (Double, Double) => Double): Boolean = {
    +    if (f(12, 3) != 4) return false
    +    if (f(72, 4) != 18) return false
    +    if (f(72, 9) != 8) return false
    +    true
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (y.numCols==1 || y.numRows == 1) {
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseMultiplyRows " +
    +        "or elementWiseMultiplyColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1) {
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols == 1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows == 1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateOnRows(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix =  {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val dup = this.copy
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      while (i < indEnd){
    +        sums.values(j) += math.pow(values(i),p)
    --- End diff --
    
    space between args




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56106610
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20553/consoleFull) for PR 2451 at commit [`5e7d744`](https://github.com/apache/spark/commit/5e7d74408fd5f4e521f4e3a7e94a289d59454913).
     * This patch merges cleanly.




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17808127
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -157,3 +157,221 @@ class HingeGradient extends Gradient {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * Class used to compute the gradient for a loss function, given a series of data points.
    + */
    +@DeveloperApi
    +abstract class MultiModelGradient extends Serializable {
    +  /**
    +   * Compute the gradient and loss given the features of all data points.
    +   *
    +   * @param data features for one data point
    +   * @param label label for this data point
    +   * @param weights weights/coefficients corresponding to features
    +   *
    +   * @return (gradient: DenseMatrix, loss: Double)
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix)
    +
    +  /**
    +   * Compute the gradient and loss given the features of a series of data point,
    +   * add the gradient to a provided matrix to avoid creating new objects, and return loss.
    +   *
    +   * @param data features for the data points
    +   * @param label label for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   * @param cumGradient the computed gradient will be added to this matrix
    +   *
    +   * @return loss
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix, cumGradient: DenseMatrix): Matrix
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a logistic loss function, as used in binary classification.
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLogisticGradient extends MultiModelGradient {
    +
    +  private def sigmoid(p: DenseMatrix): DenseMatrix = {
    +    def takeSigmoid(p: Double): Double = {
    +      1.0 / (math.exp(-p) + 1.0)
    +    }
    +    p.map(takeSigmoid)
    +  }
    +
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +    val margin = data transposeMultiply weights
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      0.0, gradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix,
    +                       label: DenseMatrix,
    +                       weights: DenseMatrix,
    +                       cumGradient: DenseMatrix): Matrix = {
    +    val margin = data transposeMultiply weights
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      1.0, cumGradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +      loss.colSums(false, shouldSkip)
    +    } else {
    +      loss.colSums
    +    }
    +  }
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a Least-squared loss function, as used in linear regression.
    + * This is correct for the averaged least squares loss function (mean squared error)
    + *              L = 1/n ||A weights-y||^2
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLeastSquaresGradient extends MultiModelGradient {
    +  override def compute(data: Matrix, label: DenseMatrix,
    --- End diff --
    
    line formatting
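    
    Presumably referring to the multi-line parameter convention used for the other
    signatures in this file; a sketch of that layout (illustrative only):
    
        override def compute(
            data: Matrix,
            label: DenseMatrix,
            weights: DenseMatrix): (DenseMatrix, Matrix) = {
          // ...
        }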




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17810895
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    +    this.stepSize = step
    +    this
    +  }
    +
    +  /**
    +   * :: Experimental ::
    +   * Set fraction of data to be used for each SGD iteration.
    +   * Default 1.0 (corresponding to deterministic/classical gradient descent)
    +   */
    +  @Experimental
    +  def setMiniBatchFraction(fraction: Double): this.type = {
    +    this.miniBatchFraction = fraction
    +    this
    +  }
    +
    +  /**
    +   * Set the number of iterations for SGD. Default 100.
    +   */
    +  def setNumIterations(iters: Array[Int]): this.type = {
    +    this.numIterations = iters
    +    this
    +  }
    +
    +  /**
    +   * Set the regularization parameter. Default (0.0, 0.1, 1.0).
    +   */
    +  def setRegParam(regParam: Array[Double]): this.type = {
    +    this.regParam = regParam
    +    this
    +  }
    +
    +  /**
    +   * Set the gradient function (of the loss function of one single data example)
    +   * to be used for SGD.
    +   */
    +  def setGradient(gradient: MultiModelGradient): this.type = {
    +    this.gradient = gradient
    +    this
    +  }
    +
    +
    +  /**
    +   * Set the updater function to actually perform a gradient step in a given direction.
    +   * The updater is responsible to perform the update from the regularization term as well,
    +   * and therefore determines what kind or regularization is used, if any.
    +   */
    +  def setUpdater(updater: Array[MultiModelUpdater]): this.type = {
    +    this.updater = updater
    +    this
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Runs gradient descent on the given training data.
    +   * @param data training data
    +   * @param initialWeights initial weights
    +   * @return solution vector
    +   */
    +  @DeveloperApi
    +  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = {
    +    val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      data,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFraction,
    +      initialWeights)
    +    weights
    +  }
    +
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Top-level method to run gradient descent.
    + */
    +@DeveloperApi
    +object MultiModelGradientDescent extends Logging {
    +  /**
    +   * Run stochastic gradient descent (SGD) in parallel using mini batches.
    +   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
    +   * in order to compute a gradient estimate.
    +   * Sampling, and averaging the subgradients over this subset is performed using one standard
    +   * spark map-reduce in each iteration.
    +   *
    +   * @param data - Input data for SGD. RDD of the set of data examples, each of
    +   *               the form (label, [feature values]).
    +   * @param gradient - Gradient object (used to compute the gradient of the loss function of
    +   *                   one single data example)
    +   * @param updater - Updater function to actually perform a gradient step in a given direction.
    +   * @param stepSize - initial step size for the first step
    +   * @param numIterations - number of iterations that SGD should be run.
    +   * @param regParam - regularization parameter
    +   * @param miniBatchFraction - fraction of the input data set that should be used for
    +   *                            one iteration of SGD. Default value 1.0.
    +   *
    +   * @return A tuple containing two elements. The first element is a column matrix containing
    +   *         weights for every feature, and the second element is an array containing the
    +   *         stochastic loss computed for every iteration.
    +   */
    +  def runMiniBatchMMSGD(
    +      data: RDD[(Double, Vector)],
    +      gradient: MultiModelGradient,
    +      updater: Array[MultiModelUpdater],
    +      stepSize: Array[Double],
    +      numIterations: Array[Int],
    +      regParam: Array[Double],
    +      miniBatchFraction: Double,
    +      initialWeights: Vector,
    +      batchSize: Int = 64,
    +      useSparse: Boolean = true,
    +      buildSparseThreshold: Double = 0.2): (Matrix, Array[Vector]) = {
    +
    +    val maxNumIter = numIterations.max
    +    val stochasticLossHistory = new ArrayBuffer[Vector](maxNumIter)
    +
    +    val numExamples = data.count()
    +    val miniBatchSize = numExamples * miniBatchFraction
    +    val numModels = stepSize.length * regParam.length
    +    val numFeatures = initialWeights.size
    +    val numRegularizers = updater.length
    +    val updaterCounter = 0 until numRegularizers
    +    // Initialize weights as a column vector
    +    var weights = updaterCounter.map { i =>
    +      new DenseMatrix(numFeatures, 1, initialWeights.toArray).
    +        multiply(DenseMatrix.ones(1, numModels))
    +    }
    +
    +    var finalWeights: Matrix = new DenseMatrix(numFeatures, 0, Array.empty[Double])
    +
    +    // if no data, return initial weights to avoid NaNs
    +    if (numExamples == 0) {
    +
    +      logInfo("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
    +      return (Matrices.horzCat(weights), stochasticLossHistory.toArray)
    +
    +    }
    +    val stepSizeMatrix = new DenseMatrix(1, numModels,
    +      stepSize.flatMap{ ss =>
    +        for (i <- 1 to regParam.length) yield ss
    +      }
    +    )
    +    val regMatrix = SparseMatrix.diag(Vectors.dense(stepSize.flatMap{ ss =>
    +      for (reg <- regParam) yield reg
    +    }))
    +
    +    val bcMetaData =
    +      if (useSparse) {
    +        data.context.broadcast(Matrices.getSparsityData(data, batchSize))
    +      } else {
    +        val emptyData: Array[(Int, Int)] = (0 until data.partitions.length).map { i =>
    +          (i, -1)}.toArray
    +        data.context.broadcast(emptyData)
    +      }
    +    val points = Matrices.fromRDD(data, bcMetaData.value, batchSize, buildSparseThreshold)
    +
    +    /**
    +     * For the first iteration, the regVal will be initialized as sum of weight squares
    +     * if it's L2 updater; for L1 updater, the same logic is followed.
    +     */
    +    val updaterWithIndex = updater.zipWithIndex
    +
    +    var regVal = updaterWithIndex.map { case (u, ind) =>
    +      u.compute(weights(ind), DenseMatrix.zeros(numFeatures, numModels),
    +        DenseMatrix.zeros(1, numModels), 1, regMatrix)._2
    +    }
    +    val orderedIters = numIterations.sorted
    +    var iterIndexCounter = 0
    +    for (i <- 1 to maxNumIter) {
    +      val bcWeights = data.context.broadcast(weights)
    +      // Sample a subset (fraction miniBatchFraction) of the total data
    --- End diff --
    
    Add punctuation; these 2 lines look like 1 sentence.




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17801072
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    --- End diff --
    
    Just wondering (not sure myself): Which is preferred:
    `SparseMatrix`
    or
    [[SparseMatrix]]
    in docs?
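    
    For what it's worth, the two render differently in Scaladoc: backticks produce
    monospace text only, while [[SparseMatrix]] is resolved into a link to the entity
    (and emits a warning when it cannot be resolved). A small side-by-side, purely
    illustrative:
    
        /** Convenience method for `Matrix`-`Matrix` multiplication. */     // monospace only
        /** Convenience method for [[Matrix]]-[[Matrix]] multiplication. */ // linked entity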




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803825
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-majored sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row index of the entry. They must be in strictly increasing order for each
    + *                   column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    +    if (f(2, 9) != 18) return false
    +    if (f(3, 7) != 21) return false
    +    if (f(8, 9) != 72) return false
    +    true
    +  }
    +
    +  private def isDivision(f: (Double, Double) => Double): Boolean = {
    +    if (f(12, 3) != 4) return false
    +    if (f(72, 4) != 18) return false
    +    if (f(72, 9) != 8) return false
    +    true
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (y.numCols==1 || y.numRows == 1) {
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseMultiplyRows " +
    +        "or elementWiseMultiplyColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1) {
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols == 1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows == 1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateOnRows(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix =  {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val dup = this.copy
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      while (i < indEnd){
    +        sums.values(j) += math.pow(values(i),p)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    sums.update(math.pow(_, 1/p))
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: SparseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: SparseMatrix = {
    +    val copy = this.copy
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, this.toArray)
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  def toDense: DenseMatrix = new DenseMatrix(numRows, numCols, this.toArray)
    +}
    +
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  private def genRand(numRows: Int, numCols: Int, raw: Array[Double], nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol){
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    sCols.append(sparseA.length)
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    --- End diff --
    
    No use case in MLlib yet. Randomized SVD for big matrices (distributed) may make use of this.




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803482
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  private def genRand(numRows: Int, numCols: Int, raw: Array[Double], nonZero: Int): SparseMatrix = {
    --- End diff --
    
    I feel like "nonZeros" (plural) is more common than "nonZero"




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56106639
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20553/consoleFull) for   PR 2451 at commit [`5e7d744`](https://github.com/apache/spark/commit/5e7d74408fd5f4e521f4e3a7e94a289d59454913).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `sealed trait Matrix extends Serializable `
      * `class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable `
      * `class SparseMatrix(`
      * `sealed trait Vector extends Serializable `
      * `abstract class MultiModelGradient extends Serializable `
      * `class MultiModelLogisticGradient extends MultiModelGradient `
      * `class MultiModelLeastSquaresGradient extends MultiModelGradient `
      * `class MultiModelHingeGradient extends MultiModelGradient `
      * `trait Optimizer[V] extends Serializable `
      * `abstract class MultiModelUpdater extends Serializable `
      * `class MultiModelSimpleUpdater extends MultiModelUpdater `
      * `class MultiModelL1Updater extends MultiModelUpdater `
      * `class MultiModelSquaredL2Updater extends MultiModelUpdater `





[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17809784
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    +    this.stepSize = step
    +    this
    +  }
    +
    +  /**
    +   * :: Experimental ::
    +   * Set fraction of data to be used for each SGD iteration.
    +   * Default 1.0 (corresponding to deterministic/classical gradient descent)
    +   */
    +  @Experimental
    +  def setMiniBatchFraction(fraction: Double): this.type = {
    +    this.miniBatchFraction = fraction
    +    this
    +  }
    +
    +  /**
    +   * Set the number of iterations for SGD. Default 100.
    +   */
    +  def setNumIterations(iters: Array[Int]): this.type = {
    +    this.numIterations = iters
    +    this
    +  }
    +
    +  /**
    +   * Set the regularization parameter. Default (0.0, 0.1, 1.0).
    +   */
    +  def setRegParam(regParam: Array[Double]): this.type = {
    +    this.regParam = regParam
    +    this
    +  }
    +
    +  /**
    +   * Set the gradient function (of the loss function of one single data example)
    +   * to be used for SGD.
    +   */
    +  def setGradient(gradient: MultiModelGradient): this.type = {
    +    this.gradient = gradient
    +    this
    +  }
    +
    +
    +  /**
    +   * Set the updater function to actually perform a gradient step in a given direction.
     +   * The updater is responsible for performing the update from the regularization term as well,
     +   * and therefore determines what kind of regularization is used, if any.
    +   */
    +  def setUpdater(updater: Array[MultiModelUpdater]): this.type = {
    +    this.updater = updater
    +    this
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Runs gradient descent on the given training data.
    +   * @param data training data
    +   * @param initialWeights initial weights
    +   * @return solution vector
    +   */
    +  @DeveloperApi
    +  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = {
    +    val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      data,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFraction,
    +      initialWeights)
    +    weights
    +  }
    +
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Top-level method to run gradient descent.
    + */
    +@DeveloperApi
    +object MultiModelGradientDescent extends Logging {
    +  /**
    +   * Run stochastic gradient descent (SGD) in parallel using mini batches.
    +   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
    +   * in order to compute a gradient estimate.
    +   * Sampling, and averaging the subgradients over this subset is performed using one standard
    +   * spark map-reduce in each iteration.
    +   *
    +   * @param data - Input data for SGD. RDD of the set of data examples, each of
    +   *               the form (label, [feature values]).
    +   * @param gradient - Gradient object (used to compute the gradient of the loss function of
    +   *                   one single data example)
    +   * @param updater - Updater function to actually perform a gradient step in a given direction.
    +   * @param stepSize - initial step size for the first step
    +   * @param numIterations - number of iterations that SGD should be run.
    +   * @param regParam - regularization parameter
    +   * @param miniBatchFraction - fraction of the input data set that should be used for
    +   *                            one iteration of SGD. Default value 1.0.
    +   *
    +   * @return A tuple containing two elements. The first element is a column matrix containing
    +   *         weights for every feature, and the second element is an array containing the
    +   *         stochastic loss computed for every iteration.
    +   */
    +  def runMiniBatchMMSGD(
    --- End diff --
    
    Are we trying to keep things Java-friendly?  (The default param values won't be.)
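
For background on the Java-friendliness concern: Scala default parameter values are not usable from Java in the normal way, so a signature that relies on them either forces Java callers to pass every argument or needs explicit overloads. A minimal sketch of that overload pattern, using hypothetical names rather than the PR's actual API:

    object MiniBatchSgdExample {
      // Full signature: Java callers must supply every argument explicitly,
      // because Scala default argument values are not visible from Java.
      def run(stepSize: Double, numIterations: Int, miniBatchFraction: Double): Unit =
        println(s"step=$stepSize, iters=$numIterations, fraction=$miniBatchFraction")

      // Java-friendly overload that fills in the documented default of 1.0
      // for miniBatchFraction (deterministic gradient descent).
      def run(stepSize: Double, numIterations: Int): Unit =
        run(stepSize, numIterations, 1.0)
    }

Scala code can keep using default arguments, but only explicitly declared overloads like the one above are callable from Java without extra ceremony.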




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17804191
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
     +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
     +    require(rowsMatch, "The numbers of rows of the matrices in this array don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      val allColPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val ptr = mat.asInstanceOf[SparseMatrix].colPtrs
    +        ptr.slice(1, ptr.length)
    +      }
    +      var counter = 0
    +      val adjustedPtrs = allColPtrs.map { p =>
    --- End diff --
    
    Is this doing extra cumulative sums where it should not?  E.g., if the colPtrs were:
    matrix A: [0, 2, 4] (i.e., 2 cols with 2 elements each)
    matrix B: [0, 3, 6] (2 cols with 3 elements each)
    Wouldn't the new ones be incorrect?
    [0, 2, 2+4, 2+4+3, 2+4+3+6]
    The sums should just use the last elements of each matrix's colPtrs, right?
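
To make the suggested fix concrete, here is a minimal sketch (a hypothetical helper, not the PR's code) of concatenating CSC column pointers by offsetting each matrix's pointers with the running non-zero count, which turns A: [0, 2, 4] and B: [0, 3, 6] into [0, 2, 4, 7, 10]:

    import scala.collection.mutable.ArrayBuffer

    object HorzCatColPtrs {
      /** Concatenate CSC column pointers for horizontally stacked matrices. */
      def concatColPtrs(ptrs: Seq[Array[Int]]): Array[Int] = {
        val out = ArrayBuffer(0)
        var nnzSoFar = 0
        ptrs.foreach { p =>
          // Drop each matrix's leading 0 and shift by the non-zeros seen so far.
          p.drop(1).foreach(colPtr => out += colPtr + nnzSoFar)
          nnzSoFar += p.last  // the last pointer is that matrix's non-zero count
        }
        out.toArray
      }

      def main(args: Array[String]): Unit = {
        val a = Array(0, 2, 4)  // 2 columns with 2 non-zeros each
        val b = Array(0, 3, 6)  // 2 columns with 3 non-zeros each
        println(concatColPtrs(Seq(a, b)).mkString(", "))  // prints: 0, 2, 4, 7, 10
      }
    }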




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17801515
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols)
    +    BLAS.gemm(true, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */
    +  def transposeMultiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numCols))
    +    BLAS.gemv(true, 1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** A human readable representation of the matrix */
       override def toString: String = toBreeze.toString()
    +
    +  private[mllib] def map(f: Double => Double): Matrix
    +
    +  private[mllib] def update(f: Double => Double): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double,
    +                                                        y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double,
    +                                                     y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double,
    +                                                     y: Double): Matrix
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y)
    +
    +  private[mllib] def *(y: Matrix) = operate(_ * _, y)
    +
    +  private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y)
    +
    +  private[mllib] def +(y: Matrix) = operate(_ + _, y)
    +
    +  private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y)
    +
    +  private[mllib] def -(y: Matrix) = operate(_ - _, y)
    +
    +  private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y)
    +
    +  private[mllib] def /(y: Matrix) = operate(_ / _, y)
    +
    +  private[mllib] def *=(y: Double) = elementWiseOperateScalarInPlace(_ * _, y)
    +
    +  private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y)
    +
    +  private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y)
    +
    +  private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y)
    +
    +  private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y)
    +
    +  private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y)
    +
    +  private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y)
    +
    +  private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y)
    +
    +  private[mllib] def neg: Matrix
    +
    +  private[mllib] def negInPlace: Matrix
    +
    +  /** Less-than-or-equal-to check. Outputs binary `DenseMatrix` */
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix
    +
    +  /** Returns the p-th norm for each column */
    +  private[mllib] def colNorms(p: Double): Matrix
    +
    +  private[mllib] def colSums: DenseMatrix = colSums(false)
    --- End diff --
    
    Why do the colSums methods return DenseMatrix instead of DenseVector?




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by anantasty <gi...@git.apache.org>.
Github user anantasty commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802806
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -157,3 +157,221 @@ class HingeGradient extends Gradient {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * Class used to compute the gradient for a loss function, given a series of data points.
    + */
    +@DeveloperApi
    +abstract class MultiModelGradient extends Serializable {
    +  /**
    +   * Compute the gradient and loss given the features of all data points.
    +   *
    +   * @param data features for one data point
    +   * @param label label for this data point
    +   * @param weights weights/coefficients corresponding to features
    +   *
    +   * @return (gradient: DenseMatrix, loss: Double)
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix)
    +
    +  /**
    +   * Compute the gradient and loss given the features of a series of data point,
    +   * add the gradient to a provided matrix to avoid creating new objects, and return loss.
    +   *
    +   * @param data features for the data points
    +   * @param label label for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   * @param cumGradient the computed gradient will be added to this matrix
    +   *
    +   * @return loss
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix, cumGradient: DenseMatrix): Matrix
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a logistic loss function, as used in binary classification.
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLogisticGradient extends MultiModelGradient {
    +
    +  private def sigmoid(p: DenseMatrix): DenseMatrix = {
    +    def takeSigmoid(p: Double): Double = {
    +      1.0 / (math.exp(-p) + 1.0)
    +    }
    +    p.map(takeSigmoid)
    +  }
    +
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +    val margin = data transposeMultiply weights
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      0.0, gradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix,
    +                       label: DenseMatrix,
    +                       weights: DenseMatrix,
    +                       cumGradient: DenseMatrix): Matrix = {
    +    val margin = data transposeMultiply weights
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      1.0, cumGradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +      loss.colSums(false, shouldSkip)
    +    } else {
    +      loss.colSums
    +    }
    +  }
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a Least-squared loss function, as used in linear regression.
    + * This is correct for the averaged least squares loss function (mean squared error)
    + *              L = 1/n ||A weights-y||^2
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLeastSquaresGradient extends MultiModelGradient {
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +
    +    val diff = (data transposeMultiply weights).elementWiseOperateOnColumnsInPlace(_ - _, label)
    +
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +
    +    gemm(false, false, 2.0, data, diff, 0.0, gradient)
    +
    +    val loss = diff.update(v => v * v)
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix,
    +                       label: DenseMatrix,
    +                       weights: DenseMatrix,
    +                       cumGradient: DenseMatrix): Matrix = {
    +    val diff = (data transposeMultiply weights).elementWiseOperateOnColumnsInPlace(_ - _, label)
    +
    +    gemm(false, false, 2.0, data, diff, 1.0, cumGradient)
    +    val loss = diff.update(v => v * v)
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +      loss.colSums(false, shouldSkip)
    +    } else {
    +      loss.colSums
    +    }
    +  }
    +}
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a Hinge loss function, as used in SVM binary classification.
    + * See also the documentation for the precise formulation.
    + * NOTE: This assumes that the labels are {0,1}
    + */
    +@DeveloperApi
    +class MultiModelHingeGradient extends MultiModelGradient {
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +
    +    val dotProduct = data transposeMultiply weights
    +    // Our loss function with {0, 1} labels is max(0, 1 - (2y – 1) (f_w(x)))
    +    // Therefore the gradient is -(2y - 1)*x
    +    val labelScaled = new DenseMatrix(1, label.numRows, label.map(_ * 2 - 1.0).values)
    +
    +    dotProduct.elementWiseOperateOnColumnsInPlace(_ * _, labelScaled)
    +
    +    val gradientMultiplier = data.elementWiseOperateOnRows(_ * _, labelScaled.negInPlace)
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +    val activeExamples = dotProduct.compare(1.0, _ < _) // Examples where the hinge is active
    +
    +    gemm(false, false, 1.0, gradientMultiplier, activeExamples, 1.0, gradient)
    +
    +    val loss = activeExamples.elementWiseOperateInPlace(_ * _, dotProduct.update(1 - _))
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix, cumGradient: DenseMatrix): Matrix = {
    +
    +    val dotProduct = data transposeMultiply weights
    +    // Our loss function with {0, 1} labels is max(0, 1 - (2y – 1) (f_w(x)))
    +    // Therefore the gradient is -(2y - 1)*x
    +    val labelScaled = new DenseMatrix(1, label.numRows, label.map(_ * 2 - 1.0).values)
    +    dotProduct.elementWiseOperateOnColumnsInPlace(_ * _, labelScaled)
    +
    +    val gradientMultiplier = data.elementWiseOperateOnRows(_ * _, labelScaled.negInPlace)
    +
    +    val activeExamples = dotProduct.compare(1.0, _ < _) // Examples where the hinge is active
    +
    +    gemm(false, false, 1.0, gradientMultiplier, activeExamples, 1.0, cumGradient)
    +
    +    val loss = activeExamples.elementWiseOperateInPlace(_ * _, dotProduct.update(1 - _))
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +      loss.colSums(false, shouldSkip)
    +    } else {
    +      loss.colSums
    +    }
    +  }
    +}
    --- End diff --
    
    Missing new line at end of file.
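
    For reference, the loss bookkeeping in the MultiModelLogisticGradient code quoted above
    (negativeLabels / addMargin) matches the usual per-example logistic loss. A sketch of that
    identity, with notation that is not from the code (m for the margin x^T w, y in {0, 1} for
    the label):

        \ell(m, y) =
          \begin{cases}
            \log\bigl(1 + e^{-m}\bigr)                                  & y = 1 \\
            \log\bigl(1 + e^{-m}\bigr) + m \;=\; \log\bigl(1 + e^{m}\bigr) & y = 0
          \end{cases}

    so adding the margin back in only for the examples whose label is zero recovers the standard
    logistic loss, and the gemm call accumulates the matching gradient X (sigma(M) - y 1^T)
    for all models at once.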





[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17812584
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Updater.scala ---
    @@ -145,12 +150,151 @@ class SquaredL2Updater extends Updater {
         // w' = w - thisIterStepSize * (gradient + regParam * w)
         // w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
         val thisIterStepSize = stepSize / math.sqrt(iter)
    -    val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
    -    brzWeights :*= (1.0 - thisIterStepSize * regParam)
    -    brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
    -    val norm = brzNorm(brzWeights, 2.0)
    +    scal(1.0 - thisIterStepSize * regParam, weightsOld)
    +    axpy(-thisIterStepSize, gradient, weightsOld)
    +    val norm = brzNorm(weightsOld.toBreeze, 2.0)
     
    -    (Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
    +    (weightsOld, 0.5 * regParam * norm * norm)
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Class used to perform steps (weight update) using Gradient Descent methods.
    + *
    + * For general minimization problems, or for regularized problems of the form
    + *         min  L(w) + regParam * R(w),
    + * the compute function performs the actual update step, when given some
    + * (e.g. stochastic) gradient direction for the loss L(w),
    + * and a desired step-size (learning rate).
    + *
    + * The updater is responsible to also perform the update coming from the
    + * regularization term R(w) (if any regularization is used).
    + */
    +@DeveloperApi
    +abstract class MultiModelUpdater extends Serializable {
    +  /**
    +   * Compute an updated value for weights given the gradient, stepSize, iteration number and
    +   * regularization parameter. Also returns the regularization value regParam * R(w)
    +   * computed using the *updated* weights.
    +   *
    +   * @param weightsOld - Column matrix of size dx1 where d is the number of features.
    +   * @param gradient - Column matrix of size dx1 where d is the number of features.
    +   * @param stepSize - step size across iterations
    +   * @param iter - Iteration number
    +   * @param regParam - Regularization parameter
    +   *
    +   * @return A tuple of 2 elements. The first element is a column matrix containing updated weights,
    +   *         and the second element is the regularization value computed using updated weights.
    +   */
    +  def compute(
    +      weightsOld: DenseMatrix,
    +      gradient: DenseMatrix,
    +      stepSize: DenseMatrix,
    +      iter: Int,
    +      regParam: Matrix): (DenseMatrix, Matrix)
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * A simple updater for gradient descent *without* any regularization.
    + * Uses a step-size decreasing with the square root of the number of iterations.
    + */
    +@DeveloperApi
    +class MultiModelSimpleUpdater extends MultiModelUpdater {
    +  def compute(
    +     weightsOld: DenseMatrix,
    +     gradient: DenseMatrix,
    +     stepSize: DenseMatrix,
    +     iter: Int,
    +     regParam: Matrix): (DenseMatrix, Matrix) = {
    +    val thisIterStepSize =
    +      SparseMatrix.diag(Vectors.dense(stepSize.map(-_ / sqrt(iter)).toArray))
    +
    +    gemm(1.0, gradient,thisIterStepSize, 1.0, weightsOld)
    --- End diff --
    
    spacing
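
    For what it is worth, the diagonal step-size trick above can be read as the following
    update (notation mine: G is the gradient matrix, alpha_j the per-model step size, t the
    iteration number; none of these names appear in the code):

        W \leftarrow W + G\,\mathrm{diag}\!\left(-\tfrac{\alpha_1}{\sqrt{t}},\,\dots,\,-\tfrac{\alpha_k}{\sqrt{t}}\right)
        \quad\Longleftrightarrow\quad
        w_j \leftarrow w_j - \tfrac{\alpha_j}{\sqrt{t}}\, g_j \quad \text{for each model } j,

    i.e. a single gemm applies a different learning rate to each model's column.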




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803169
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-majored sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row index of the entry. They must be in strictly increasing order for each
    + *                   column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    --- End diff --
    
    Great idea!! I was worried about the safety as well.
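
    One way such a probe could look, purely as a hypothetical sketch (the actual
    isMultiplication / isDivision bodies are not shown in this hunk), is to sample the closure
    at a couple of points whose outputs only multiplication or division would produce:

        // Hypothetical sketch, not the PR's code: a cheap heuristic for recognising
        // f = (_ * _) or f = (_ / _), so zero entries can be assumed to stay zero and
        // the sparse structure updated in place instead of densifying the matrix.
        object OpProbe {
          def isMultiplication(f: (Double, Double) => Double): Boolean =
            f(5.0, 9.0) == 45.0 && f(3.0, 7.0) == 21.0

          def isDivision(f: (Double, Double) => Double): Boolean =
            f(12.0, 4.0) == 3.0 && f(9.0, 3.0) == 3.0
        }

    A contrived closure can of course fool such a probe, which is the safety trade-off being
    discussed here.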




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17808221
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -157,3 +157,221 @@ class HingeGradient extends Gradient {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * Class used to compute the gradient for a loss function, given a series of data points.
    + */
    +@DeveloperApi
    +abstract class MultiModelGradient extends Serializable {
    +  /**
    +   * Compute the gradient and loss given the features of all data points.
    +   *
    +   * @param data features for one data point
    +   * @param label label for this data point
    +   * @param weights weights/coefficients corresponding to features
    +   *
    +   * @return (gradient: DenseMatrix, loss: Double)
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix)
    +
    +  /**
    +   * Compute the gradient and loss given the features of a series of data point,
    +   * add the gradient to a provided matrix to avoid creating new objects, and return loss.
    +   *
    +   * @param data features for the data points
    +   * @param label label for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   * @param cumGradient the computed gradient will be added to this matrix
    +   *
    +   * @return loss
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix, cumGradient: DenseMatrix): Matrix
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a logistic loss function, as used in binary classification.
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLogisticGradient extends MultiModelGradient {
    +
    +  private def sigmoid(p: DenseMatrix): DenseMatrix = {
    +    def takeSigmoid(p: Double): Double = {
    +      1.0 / (math.exp(-p) + 1.0)
    +    }
    +    p.map(takeSigmoid)
    +  }
    +
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +    val margin = data transposeMultiply weights
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      0.0, gradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix,
    +                       label: DenseMatrix,
    +                       weights: DenseMatrix,
    +                       cumGradient: DenseMatrix): Matrix = {
    +    val margin = data transposeMultiply weights
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      1.0, cumGradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +      loss.colSums(false, shouldSkip)
    +    } else {
    +      loss.colSums
    +    }
    +  }
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a Least-squared loss function, as used in linear regression.
    + * This is correct for the averaged least squares loss function (mean squared error)
    + *              L = 1/n ||A weights-y||^2
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLeastSquaresGradient extends MultiModelGradient {
    +  override def compute(data: Matrix, label: DenseMatrix,
    --- End diff --
    
    Ditto about computing in terms of below compute() method.
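
    A sketch of that refactor, assuming both overloads keep their current signatures: allocate
    a zero gradient, delegate to the in-place overload, and return the pair, so the gradient
    and loss math lives in one place.

        // Sketch only: express the pair-returning compute via the cumGradient overload.
        override def compute(
            data: Matrix,
            label: DenseMatrix,
            weights: DenseMatrix): (DenseMatrix, Matrix) = {
          val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
          val loss = compute(data, label, weights, gradient) // cumGradient starts at zero
          (gradient, loss)
        }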




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17804577
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The number of rows of the matrices in this array, don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      val allColPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val ptr = mat.asInstanceOf[SparseMatrix].colPtrs
    +        ptr.slice(1, ptr.length)
    +      }
    +      var counter = 0
    +      val adjustedPtrs = allColPtrs.map { p =>
    --- End diff --
    
    good catch!
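
    For readers skimming this thread, the column-pointer bookkeeping under discussion can be
    illustrated with a toy example (made-up numbers, not from the PR):

        // Horizontally concatenating CSC matrices: every column pointer of a later
        // matrix must be shifted by the number of non-zeros accumulated before it.
        val aColPtrs = Array(0, 2, 3)   // first matrix: 2 columns, 3 non-zeros
        val bColPtrs = Array(0, 1, 3)   // second matrix: 2 columns, 3 non-zeros
        val offset   = aColPtrs.last    // non-zeros contributed by the first matrix
        val merged   = aColPtrs ++ bColPtrs.tail.map(_ + offset)
        // merged == Array(0, 2, 3, 4, 6)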




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17808106
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -157,3 +157,221 @@ class HingeGradient extends Gradient {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * Class used to compute the gradient for a loss function, given a series of data points.
    + */
    +@DeveloperApi
    +abstract class MultiModelGradient extends Serializable {
    +  /**
    +   * Compute the gradient and loss given the features of all data points.
    +   *
    +   * @param data features for one data point
    +   * @param label label for this data point
    +   * @param weights weights/coefficients corresponding to features
    +   *
    +   * @return (gradient: DenseMatrix, loss: Double)
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix)
    +
    +  /**
    +   * Compute the gradient and loss given the features of a series of data point,
    +   * add the gradient to a provided matrix to avoid creating new objects, and return loss.
    +   *
    +   * @param data features for the data points
    +   * @param label label for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   * @param cumGradient the computed gradient will be added to this matrix
    +   *
    +   * @return loss
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix, cumGradient: DenseMatrix): Matrix
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a logistic loss function, as used in binary classification.
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLogisticGradient extends MultiModelGradient {
    +
    +  private def sigmoid(p: DenseMatrix): DenseMatrix = {
    +    def takeSigmoid(p: Double): Double = {
    +      1.0 / (math.exp(-p) + 1.0)
    +    }
    +    p.map(takeSigmoid)
    +  }
    +
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +    val margin = data transposeMultiply weights
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      0.0, gradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix,
    +                       label: DenseMatrix,
    +                       weights: DenseMatrix,
    +                       cumGradient: DenseMatrix): Matrix = {
    +    val margin = data transposeMultiply weights
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      1.0, cumGradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +      loss.colSums(false, shouldSkip)
    +    } else {
    +      loss.colSums
    +    }
    +  }
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a Least-squared loss function, as used in linear regression.
    + * This is correct for the averaged least squares loss function (mean squared error)
    + *              L = 1/n ||A weights-y||^2
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLeastSquaresGradient extends MultiModelGradient {
    --- End diff --
    
    At some point, we should rename this to SquaredError (not LeastSquares).
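
    For reference, writing the (unaveraged) squared-error objective out makes the 2.0 factor
    in the gemm calls above explicit (notation mine, not from the code):

        L(W) = \lVert X^{\top} W - Y \rVert_F^2, \qquad
        \nabla_W L = 2\, X\,\bigl(X^{\top} W - Y\bigr),

    where X is `data` (features x examples), Y is the labels broadcast across the model
    columns, and X^T W - Y is the `diff` matrix computed in both compute overloads; the 1/n
    averaging mentioned in the scaladoc is presumably applied by the caller.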




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17808047
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols)
    +    BLAS.gemm(true, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */
    +  def transposeMultiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numCols))
    +    BLAS.gemv(true, 1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** A human readable representation of the matrix */
       override def toString: String = toBreeze.toString()
    +
    +  private[mllib] def map(f: Double => Double): Matrix
    +
    +  private[mllib] def update(f: Double => Double): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double,
    +                                                        y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double,
    +                                                     y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double,
    +                                                     y: Double): Matrix
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y)
    +
    +  private[mllib] def *(y: Matrix) = operate(_ * _, y)
    +
    +  private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y)
    +
    +  private[mllib] def +(y: Matrix) = operate(_ + _, y)
    +
    +  private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y)
    +
    +  private[mllib] def -(y: Matrix) = operate(_ - _, y)
    +
    +  private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y)
    +
    +  private[mllib] def /(y: Matrix) = operate(_ / _, y)
    +
    +  private[mllib] def *=(y: Double) = elementWiseOperateScalarInPlace(_ * _, y)
    +
    +  private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y)
    +
    +  private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y)
    +
    +  private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y)
    +
    +  private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y)
    +
    +  private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y)
    +
    +  private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y)
    +
    +  private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y)
    +
    +  private[mllib] def neg: Matrix
    +
    +  private[mllib] def negInPlace: Matrix
    +
    +  /** Less-than-or-equal-to check. Outputs binary `DenseMatrix` */
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix
    +
    +  /** Returns the p-th norm for each column */
    +  private[mllib] def colNorms(p: Double): Matrix
    +
    +  private[mllib] def colSums: DenseMatrix = colSums(false)
    +
    +  private[mllib] def colSums(absolute: Boolean, skipRows: DenseMatrix = null): DenseMatrix = {
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    this match {
    +      case sparse: SparseMatrix =>
    +        while (j < numCols){
    +          var i = sparse.colPtrs(j)
    +          val indEnd = sparse.colPtrs(j + 1)
    +          while (i < indEnd){
    +            var v = sparse.values(i)
    +            if (absolute) v = math.abs(v)
    +            sums.values(j) += v
    +            i += 1
    +          }
    +          j += 1
    +        }
    +      case dense: DenseMatrix =>
    +        while (j < numCols){
    +          var i = 0
    +          while (i < numRows){
    +            if (skipRows == null) {
    +              var v = dense.values(index(i, j))
    +              if (absolute) v = math.abs(v)
    +              sums.values(j) += v
    +            } else {
    +              if (skipRows(i) != 1.0) {
    +                var v = dense.values(index(i, j))
    +                if (absolute) v = math.abs(v)
    +                sums.values(j) += v
    +              }
    +            }
    +
    +            i += 1
    +          }
    +          j += 1
    +        }
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def rowSums: DenseMatrix = rowSums(false)
    +
    +  private[mllib] def rowSums(absolute: Boolean): DenseMatrix = {
    +    val sums = new DenseMatrix(numRows, 1, Array.fill(numRows)(0.0))
    +    var j = 0
    +    this match {
    +      case sparse: SparseMatrix =>
    +        while (j < numCols){
    +          var i = sparse.colPtrs(j)
    +          val indEnd = sparse.colPtrs(j + 1)
    +          while (i < indEnd){
    +            var v = sparse.values(i)
    +            if (absolute) v = math.abs(v)
    +            sums.values(sparse.rowIndices(i)) += v
    +            i += 1
    +          }
    +          j += 1
    +        }
    +      case dense: DenseMatrix =>
    +        while (j < numCols){
    +          var i = 0
    +          while (i < numRows){
    +            var v = dense.values(index(i, j))
    --- End diff --
    
    Ditto about expensive indexing.
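
    For context, a minimal sketch of the rewrite this comment points at: the dense branch can walk the column-major values array with a running offset instead of calling index(i, j) for every element. The helper below is hypothetical (it is not code from this PR) and assumes values holds numRows * numCols entries in column-major order.

        // Hypothetical sketch: row sums over a column-major dense array.
        // A running offset per column replaces the per-element index(i, j) call.
        def rowSumsColumnMajor(numRows: Int, numCols: Int, values: Array[Double]): Array[Double] = {
          val sums = new Array[Double](numRows)
          var offset = 0        // start of column j within `values`
          var j = 0
          while (j < numCols) {
            var i = 0
            while (i < numRows) {
              sums(i) += values(offset + i)   // values(offset + i) == values(index(i, j)), without the multiply
              i += 1
            }
            offset += numRows
            j += 1
          }
          sums
        }

    The same pattern would apply to the skipRows variant of colSums: hoist the column offset once per column and index the flat array directly in the inner loop.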




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17811312
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    +    this.stepSize = step
    +    this
    +  }
    +
    +  /**
    +   * :: Experimental ::
    +   * Set fraction of data to be used for each SGD iteration.
    +   * Default 1.0 (corresponding to deterministic/classical gradient descent)
    +   */
    +  @Experimental
    +  def setMiniBatchFraction(fraction: Double): this.type = {
    +    this.miniBatchFraction = fraction
    +    this
    +  }
    +
    +  /**
    +   * Set the number of iterations for SGD. Default 100.
    +   */
    +  def setNumIterations(iters: Array[Int]): this.type = {
    +    this.numIterations = iters
    +    this
    +  }
    +
    +  /**
    +   * Set the regularization parameter. Default (0.0, 0.1, 1.0).
    +   */
    +  def setRegParam(regParam: Array[Double]): this.type = {
    +    this.regParam = regParam
    +    this
    +  }
    +
    +  /**
    +   * Set the gradient function (of the loss function of one single data example)
    +   * to be used for SGD.
    +   */
    +  def setGradient(gradient: MultiModelGradient): this.type = {
    +    this.gradient = gradient
    +    this
    +  }
    +
    +
    +  /**
    +   * Set the updater function to actually perform a gradient step in a given direction.
    +   * The updater is responsible for performing the update from the regularization term as well,
    +   * and therefore determines what kind of regularization is used, if any.
    +   */
    +  def setUpdater(updater: Array[MultiModelUpdater]): this.type = {
    +    this.updater = updater
    +    this
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Runs gradient descent on the given training data.
    +   * @param data training data
    +   * @param initialWeights initial weights
    +   * @return solution vector
    +   */
    +  @DeveloperApi
    +  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = {
    +    val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      data,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFraction,
    +      initialWeights)
    +    weights
    +  }
    +
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Top-level method to run gradient descent.
    + */
    +@DeveloperApi
    +object MultiModelGradientDescent extends Logging {
    +  /**
    +   * Run stochastic gradient descent (SGD) in parallel using mini batches.
    +   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
    +   * in order to compute a gradient estimate.
    +   * Sampling, and averaging the subgradients over this subset is performed using one standard
    +   * spark map-reduce in each iteration.
    +   *
    +   * @param data - Input data for SGD. RDD of the set of data examples, each of
    +   *               the form (label, [feature values]).
    +   * @param gradient - Gradient object (used to compute the gradient of the loss function of
    +   *                   one single data example)
    +   * @param updater - Updater function to actually perform a gradient step in a given direction.
    +   * @param stepSize - initial step size for the first step
    +   * @param numIterations - number of iterations that SGD should be run.
    +   * @param regParam - regularization parameter
    +   * @param miniBatchFraction - fraction of the input data set that should be used for
    +   *                            one iteration of SGD. Default value 1.0.
    +   *
    +   * @return A tuple containing two elements. The first element is a column matrix containing
    +   *         weights for every feature, and the second element is an array containing the
    +   *         stochastic loss computed for every iteration.
    +   */
    +  def runMiniBatchMMSGD(
    +      data: RDD[(Double, Vector)],
    +      gradient: MultiModelGradient,
    +      updater: Array[MultiModelUpdater],
    +      stepSize: Array[Double],
    +      numIterations: Array[Int],
    +      regParam: Array[Double],
    +      miniBatchFraction: Double,
    +      initialWeights: Vector,
    +      batchSize: Int = 64,
    +      useSparse: Boolean = true,
    +      buildSparseThreshold: Double = 0.2): (Matrix, Array[Vector]) = {
    +
    +    val maxNumIter = numIterations.max
    +    val stochasticLossHistory = new ArrayBuffer[Vector](maxNumIter)
    +
    +    val numExamples = data.count()
    +    val miniBatchSize = numExamples * miniBatchFraction
    +    val numModels = stepSize.length * regParam.length
    +    val numFeatures = initialWeights.size
    +    val numRegularizers = updater.length
    +    val updaterCounter = 0 until numRegularizers
    +    // Initialize weights as a column vector
    +    var weights = updaterCounter.map { i =>
    +      new DenseMatrix(numFeatures, 1, initialWeights.toArray).
    +        multiply(DenseMatrix.ones(1, numModels))
    +    }
    +
    +    var finalWeights: Matrix = new DenseMatrix(numFeatures, 0, Array.empty[Double])
    +
    +    // if no data, return initial weights to avoid NaNs
    +    if (numExamples == 0) {
    +
    +      logInfo("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
    +      return (Matrices.horzCat(weights), stochasticLossHistory.toArray)
    +
    +    }
    +    val stepSizeMatrix = new DenseMatrix(1, numModels,
    +      stepSize.flatMap{ ss =>
    +        for (i <- 1 to regParam.length) yield ss
    +      }
    +    )
    +    val regMatrix = SparseMatrix.diag(Vectors.dense(stepSize.flatMap{ ss =>
    +      for (reg <- regParam) yield reg
    +    }))
    +
    +    val bcMetaData =
    +      if (useSparse) {
    +        data.context.broadcast(Matrices.getSparsityData(data, batchSize))
    +      } else {
    +        val emptyData: Array[(Int, Int)] = (0 until data.partitions.length).map { i =>
    +          (i, -1)}.toArray
    +        data.context.broadcast(emptyData)
    +      }
    +    val points = Matrices.fromRDD(data, bcMetaData.value, batchSize, buildSparseThreshold)
    +
    +    /**
    +     * For the first iteration, the regVal will be initialized as sum of weight squares
    +     * if it's L2 updater; for L1 updater, the same logic is followed.
    +     */
    +    val updaterWithIndex = updater.zipWithIndex
    +
    +    var regVal = updaterWithIndex.map { case (u, ind) =>
    +      u.compute(weights(ind), DenseMatrix.zeros(numFeatures, numModels),
    +        DenseMatrix.zeros(1, numModels), 1, regMatrix)._2
    +    }
    +    val orderedIters = numIterations.sorted
    +    var iterIndexCounter = 0
    +    for (i <- 1 to maxNumIter) {
    +      val bcWeights = data.context.broadcast(weights)
    +      // Sample a subset (fraction miniBatchFraction) of the total data
    +      // compute and sum up the subgradients on this subset (this is one map-reduce)
    +      val (gradientSum, lossSum) = points.sample(false, miniBatchFraction, 42 + i)
    +        .treeAggregate(updaterCounter.map(j => Matrices.zeros(numFeatures, numModels)),
    +          updaterCounter.map(j => Matrices.zeros(1, numModels)))(
    +          seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
    +            val l = updaterCounter.map { j =>
    +              gradient.compute(features, label, bcWeights.value(j),
    +                grad(j).asInstanceOf[DenseMatrix])
    +            }
    +            (grad, loss.zip(l).map(r => r._1.elementWiseOperateInPlace(_ + _, r._2)))
    +          },
    +          combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) =>
    +            (grad1.zip(grad2).map(r => r._1.elementWiseOperateInPlace(_ + _, r._2)),
    +              loss1.zip(loss2).map(r => r._1.elementWiseOperateInPlace(_ + _, r._2)))
    +          })
    +      stochasticLossHistory.append(Vectors.dense(Matrices.horzCat(updaterCounter.map { i =>
    +        lossSum(i).elementWiseOperateScalarInPlace(_ / _, miniBatchSize).
    +        elementWiseOperateOnRowsInPlace(_ + _, regVal(i))
    +      }).toArray))
    +      val update = updaterWithIndex.map { case (u, ind) => u.compute(
    --- End diff --
    
    Move "u.compute(" to next line




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56239321
  
    @brkyvz Let's try to split this PR into smaller ones. For example, functions like factory methods for sparse matrices should not be included in this PR. We want to keep the vector and matrix classes in MLlib simple and let users use breeze for linear algebra operations. If breeze has performance issues, maybe we should contribute the optimization to breeze to centralize the effort on single-machine linear algebra computation.
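
    As an illustration of that direction (a hypothetical usage sketch, not an API in this PR or in MLlib), element-wise work can be done by building Breeze matrices directly; Breeze's DenseMatrix is column-major like MLlib's:

        import breeze.linalg.{DenseMatrix => BDM}

        // Hypothetical sketch: element-wise operations via Breeze rather than
        // new operators on MLlib's Matrix trait.
        val a = new BDM[Double](2, 2, Array(1.0, 2.0, 3.0, 4.0))
        val b = new BDM[Double](2, 2, Array(5.0, 6.0, 7.0, 8.0))

        val sum     = a + b       // element-wise addition
        val product = a :* b      // element-wise multiplication
        val scaled  = a * 2.0     // scale every entry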




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17807973
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -157,3 +157,221 @@ class HingeGradient extends Gradient {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * Class used to compute the gradient for a loss function, given a series of data points.
    + */
    +@DeveloperApi
    +abstract class MultiModelGradient extends Serializable {
    +  /**
    +   * Compute the gradient and loss given the features of all data points.
    +   *
    +   * @param data features for the data points
    +   * @param label labels for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   *
    +   * @return (gradient: DenseMatrix, loss: Matrix)
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix)
    +
    +  /**
    +   * Compute the gradient and loss given the features of a series of data points,
    +   * add the gradient to a provided matrix to avoid creating new objects, and return loss.
    +   *
    +   * @param data features for the data points
    +   * @param label labels for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   * @param cumGradient the computed gradient will be added to this matrix
    +   *
    +   * @return loss
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix, cumGradient: DenseMatrix): Matrix
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a logistic loss function, as used in binary classification.
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLogisticGradient extends MultiModelGradient {
    +
    +  private def sigmoid(p: DenseMatrix): DenseMatrix = {
    +    def takeSigmoid(p: Double): Double = {
    +      1.0 / (math.exp(-p) + 1.0)
    +    }
    +    p.map(takeSigmoid)
    +  }
    +
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +    val margin = data transposeMultiply weights
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      0.0, gradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix,
    +                       label: DenseMatrix,
    +                       weights: DenseMatrix,
    +                       cumGradient: DenseMatrix): Matrix = {
    +    val margin = data transposeMultiply weights
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      1.0, cumGradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    --- End diff --
    
    Is this really worthwhile?  Computation is still linear in the size of the data, and the computation for colSums is pretty light.
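
    A back-of-envelope sketch of the trade-off in question, using the shapes in this diff (dense data of size numFeatures x numPoints, loss of size numPoints x numModels); the counts below are approximate:

        cost of the skip detection:
          data.compare(0.0, _ == _)   ~ numFeatures * numPoints writes, plus an allocation of that size
          zeroEntries.colSums         ~ numFeatures * numPoints additions
        cost of the sum it conditions:
          loss.colSums                ~ numPoints * numModels additions

    So if the goal is saving computation, the detection pass alone is roughly twice the size of the data block, while the column sum it guards is usually much smaller, since numModels is typically far below numFeatures.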




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17807257
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-major sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row index of the entry. They must be in strictly increasing order for each
    + *                   column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    +    if (f(2, 9) != 18) return false
    +    if (f(3, 7) != 21) return false
    +    if (f(8, 9) != 72) return false
    +    true
    +  }
    +
    +  private def isDivision(f: (Double, Double) => Double): Boolean = {
    +    if (f(12, 3) != 4) return false
    +    if (f(72, 4) != 18) return false
    +    if (f(72, 9) != 8) return false
    +    true
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (y.numCols==1 || y.numRows == 1) {
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1) {
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols == 1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows == 1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateOnRows(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix =  {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val dup = this.copy
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      while (i < indEnd){
    +        sums.values(j) += math.pow(values(i),p)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    sums.update(math.pow(_, 1/p))
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: SparseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: SparseMatrix = {
    +    val copy = this.copy
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, this.toArray)
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  def toDense: DenseMatrix = new DenseMatrix(numRows, numCols, this.toArray)
    +}
    +
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  private def genRand(numRows: Int, numCols: Int, raw: Array[Double], nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol){
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    sCols.append(sparseA.length)
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    --- End diff --
    
    OK, sounds fine then.




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17765173
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var rowCounter = 0
    +          val Cstart = colCounterForB * mA
    +          while (rowCounter < mA) {
    +            var i = Arows(rowCounter)
    +            val indEnd = Arows(rowCounter + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B(colCounterForB, Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounter
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounter += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    +      if (beta != 0.0){
    +        f2jBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      // Perform matrix multiplication and add to C. The rows of A are multiplied by the columns of
    +      // B, and added to C.
    +      var colCounterForB = 0 // the column to be updated in C
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Bstart = colCounterForB * kB
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA) {
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B.values(Bstart + colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA){
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B(colCounterForB, colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A and `SparseMatrix` B.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: SparseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Bvals = B.values
    +    val Brows = if (!transB) B.rowIndices else B.colPtrs
    +    val Bcols = if (!transB) B.colPtrs else B.rowIndices
    +
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB){ // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val indEnd = Bcols(colCounterForB + 1)
    +          while (rowCounterForA < mA) {
    +            var i = Bcols(colCounterForB)
    +            val Astart = rowCounterForA * kA
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Bvals(i) * A.values(Astart + Brows(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        var rowCounterForA = 0
    +        while (rowCounterForA < mA) {
    +          var colCounterForA = 0
    +          val Astart = rowCounterForA * kA
    +          while (colCounterForA < kA) {
    +            var i = Brows(colCounterForA)
    +            val indEnd = Brows(colCounterForA + 1)
    +            while (i < indEnd){
    +              val Cindex = Bcols(i) * mA + rowCounterForA
    +              C.values(Cindex) += A.values(Astart + colCounterForA) * Bvals(i) * alpha
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          rowCounterForA += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    +      if (beta != 0.0){
    +        nativeBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      if (!transB) { // Expensive to put the check inside the loop
    +
    +        // Loop over the columns of B, pick non-zero row in B, select corresponding column in A,
    +        // and update the whole column in C by looping over rows in A.
    +        var colCounterForB = 0 // the column to be updated in C
    +        while (colCounterForB < nB) {
    +          var i = Bcols(colCounterForB)
    +          val indEnd = Bcols(colCounterForB + 1)
    +          while (i < indEnd) {
    +            var rowCounterForA = 0
    +            val Bval = Bvals(i)
    +            val Cstart = colCounterForB * mA
    +            val Astart = mA * Brows(i)
    +            while (rowCounterForA < mA){
    +              C.values(Cstart + rowCounterForA) += A.values(Astart + rowCounterForA) * Bval * alpha
    +              rowCounterForA += 1
    +            }
    +            i += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        var colCounterForA = 0
    +        while (colCounterForA < kA) {
    +          var rowCounterForA = 0
    +          val Astart = mA * colCounterForA
    +          val indEnd = Brows(colCounterForA + 1)
    +          while (rowCounterForA < mA) {
    +            var i = Brows(colCounterForA)
    +            while (i < indEnd){
    +              val Cindex = Bcols(i) * mA + rowCounterForA
    +              C.values(Cindex) += A.values(Astart + rowCounterForA) * Bvals(i) * alpha
    +              i += 1
    +            }
    +            rowCounterForA += 1
    +          }
    +          colCounterForA += 1
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * @param trans whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param alpha a scalar to scale the multiplication A * x.
    +   * @param A the matrix A that will be left multiplied to x. Size of m x n.
    +   * @param x the vector x that will be left multiplied by A. Size of n x 1.
    +   * @param beta a scalar that can be used to scale vector y.
    +   * @param y the resulting vector y. Size of m x 1.
    +   */
    +  def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit = {
    +
    +    val mA: Int = if (!trans) A.numRows else A.numCols
    +    val nx: Int = x.size
    +    val nA: Int = if (!trans) A.numCols else A.numRows
    +
    +    require(nA == nx, s"The columns of A don't match the number of elements of x. A: $nA, x: $nx")
    +    require(mA == y.size,
    +      s"The rows of A don't match the number of elements of y. A: $mA, y:${y.size}}")
    +    if (alpha == 0.0) {
    +      logDebug("gemv: alpha is equal to 0. Returning y.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          gemv(trans, alpha, sparse, x, beta, y)
    +        case dense: DenseMatrix =>
    +          gemv(trans, alpha, dense, x, beta, y)
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemv doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   *
    +   * @param alpha a scalar to scale the multiplication A * x.
    +   * @param A the matrix A that will be left multiplied to x. Size of m x n.
    +   * @param x the vector x that will be left multiplied by A. Size of n x 1.
    +   * @param beta a scalar that can be used to scale vector y.
    +   * @param y the resulting vector y. Size of m x 1.
    +   */
    +  def gemv(
    +      alpha: Double,
    +      A: Matrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit = {
    +    gemv(false, alpha, A, x, beta, y)
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit =  {
    +    val tStrA = if (!trans) "N" else "T"
    +    nativeBLAS.dgemv(tStrA, A.numRows, A.numCols, alpha, A.values, A.numRows, x.values, 1, beta,
    +      y.values, 1)
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit =  {
    +
    +    val mA: Int = if(!trans) A.numRows else A.numCols
    +    val nA: Int = if(!trans) A.numCols else A.numRows
    +
    +    val Avals = A.values
    +    val Arows = if (!trans) A.rowIndices else A.colPtrs
    +    val Acols = if (!trans) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    --- End diff --
    
    This comment belongs inside the "if (trans)" branch.
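
    For readers following along, a minimal, self-contained sketch of the branch the comment describes may help (raw CSC arrays and illustrative names, not the PR's SparseMatrix type). Slicing a CSC matrix by column is cheap, which is what makes the A^T * x case the favourable one:

        // Hypothetical sketch, not the PR code: y := alpha * A^T * x + beta * y, A stored in CSC.
        def sparseGemvTrans(
            alpha: Double,
            colPtrs: Array[Int],     // length numCols + 1
            rowIndices: Array[Int],  // row index of each stored value
            values: Array[Double],   // stored values, column major
            x: Array[Double],
            beta: Double,
            y: Array[Double]): Unit = {
          val numCols = colPtrs.length - 1
          var col = 0
          while (col < numCols) {
            // Slicing is easy here: column `col` of A is values(colPtrs(col) until colPtrs(col + 1)),
            // and that column is exactly row `col` of A^T.
            var i = colPtrs(col)
            val indEnd = colPtrs(col + 1)
            var sum = 0.0
            while (i < indEnd) {
              sum += values(i) * x(rowIndices(i))
              i += 1
            }
            y(col) = alpha * sum + beta * y(col)
            col += 1
          }
        }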




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56216573
  
    Could the methods be ordered in the file (grouped by public, private[mllib], private, etc.)?
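
    A tiny illustration of one possible grouping (placeholder names only, not the PR's methods):

        package org.apache.spark.mllib.linalg

        class ExampleOps {
          // 1. public API first
          def multiply(x: Double): Double = x * 2.0

          // 2. then private[mllib] helpers
          private[mllib] def index(i: Int, j: Int, numRows: Int): Int = i + numRows * j

          // 3. plain private helpers last
          private def check(n: Int): Unit = require(n >= 0)
        }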




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17810823
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    +    this.stepSize = step
    +    this
    +  }
    +
    +  /**
    +   * :: Experimental ::
    +   * Set fraction of data to be used for each SGD iteration.
    +   * Default 1.0 (corresponding to deterministic/classical gradient descent)
    +   */
    +  @Experimental
    +  def setMiniBatchFraction(fraction: Double): this.type = {
    +    this.miniBatchFraction = fraction
    +    this
    +  }
    +
    +  /**
    +   * Set the number of iterations for SGD. Default 100.
    +   */
    +  def setNumIterations(iters: Array[Int]): this.type = {
    +    this.numIterations = iters
    +    this
    +  }
    +
    +  /**
    +   * Set the regularization parameter. Default (0.0, 0.1, 1.0).
    +   */
    +  def setRegParam(regParam: Array[Double]): this.type = {
    +    this.regParam = regParam
    +    this
    +  }
    +
    +  /**
    +   * Set the gradient function (of the loss function of one single data example)
    +   * to be used for SGD.
    +   */
    +  def setGradient(gradient: MultiModelGradient): this.type = {
    +    this.gradient = gradient
    +    this
    +  }
    +
    +
    +  /**
    +   * Set the updater function to actually perform a gradient step in a given direction.
    +   * The updater is responsible to perform the update from the regularization term as well,
    +   * and therefore determines what kind or regularization is used, if any.
    +   */
    +  def setUpdater(updater: Array[MultiModelUpdater]): this.type = {
    +    this.updater = updater
    +    this
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Runs gradient descent on the given training data.
    +   * @param data training data
    +   * @param initialWeights initial weights
    +   * @return solution vector
    +   */
    +  @DeveloperApi
    +  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = {
    +    val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      data,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFraction,
    +      initialWeights)
    +    weights
    +  }
    +
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Top-level method to run gradient descent.
    + */
    +@DeveloperApi
    +object MultiModelGradientDescent extends Logging {
    +  /**
    +   * Run stochastic gradient descent (SGD) in parallel using mini batches.
    +   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
    +   * in order to compute a gradient estimate.
    +   * Sampling, and averaging the subgradients over this subset is performed using one standard
    +   * spark map-reduce in each iteration.
    +   *
    +   * @param data - Input data for SGD. RDD of the set of data examples, each of
    +   *               the form (label, [feature values]).
    +   * @param gradient - Gradient object (used to compute the gradient of the loss function of
    +   *                   one single data example)
    +   * @param updater - Updater function to actually perform a gradient step in a given direction.
    +   * @param stepSize - initial step size for the first step
    +   * @param numIterations - number of iterations that SGD should be run.
    +   * @param regParam - regularization parameter
    +   * @param miniBatchFraction - fraction of the input data set that should be used for
    +   *                            one iteration of SGD. Default value 1.0.
    +   *
    +   * @return A tuple containing two elements. The first element is a column matrix containing
    +   *         weights for every feature, and the second element is an array containing the
    +   *         stochastic loss computed for every iteration.
    +   */
    +  def runMiniBatchMMSGD(
    +      data: RDD[(Double, Vector)],
    +      gradient: MultiModelGradient,
    +      updater: Array[MultiModelUpdater],
    +      stepSize: Array[Double],
    +      numIterations: Array[Int],
    +      regParam: Array[Double],
    +      miniBatchFraction: Double,
    +      initialWeights: Vector,
    +      batchSize: Int = 64,
    +      useSparse: Boolean = true,
    +      buildSparseThreshold: Double = 0.2): (Matrix, Array[Vector]) = {
    +
    +    val maxNumIter = numIterations.max
    +    val stochasticLossHistory = new ArrayBuffer[Vector](maxNumIter)
    +
    +    val numExamples = data.count()
    +    val miniBatchSize = numExamples * miniBatchFraction
    +    val numModels = stepSize.length * regParam.length
    +    val numFeatures = initialWeights.size
    +    val numRegularizers = updater.length
    +    val updaterCounter = 0 until numRegularizers
    +    // Initialize weights as a column vector
    +    var weights = updaterCounter.map { i =>
    +      new DenseMatrix(numFeatures, 1, initialWeights.toArray).
    +        multiply(DenseMatrix.ones(1, numModels))
    +    }
    +
    +    var finalWeights: Matrix = new DenseMatrix(numFeatures, 0, Array.empty[Double])
    --- End diff --
    
    Add doc: "We will concatenate results (weights) to finalWeights as we iterate."  (or something like that)
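
    For context, a rough sketch of the behaviour such a doc comment would describe (a standalone illustration using a flat column-major array, not the PR's Matrix types):

        object FinalWeightsSketch {
          // finalWeights starts as a numFeatures x 0 matrix and grows by whole columns:
          // we concatenate results (weights) to finalWeights as we iterate.
          var finalValues: Array[Double] = Array.empty[Double]
          var finalCols: Int = 0

          def appendConverged(columnMajorWeights: Array[Double], numCols: Int): Unit = {
            finalValues = finalValues ++ columnMajorWeights
            finalCols += numCols
          }
        }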




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17811037
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    +    this.stepSize = step
    +    this
    +  }
    +
    +  /**
    +   * :: Experimental ::
    +   * Set fraction of data to be used for each SGD iteration.
    +   * Default 1.0 (corresponding to deterministic/classical gradient descent)
    +   */
    +  @Experimental
    +  def setMiniBatchFraction(fraction: Double): this.type = {
    +    this.miniBatchFraction = fraction
    +    this
    +  }
    +
    +  /**
    +   * Set the number of iterations for SGD. Default 100.
    +   */
    +  def setNumIterations(iters: Array[Int]): this.type = {
    +    this.numIterations = iters
    +    this
    +  }
    +
    +  /**
    +   * Set the regularization parameter. Default (0.0, 0.1, 1.0).
    +   */
    +  def setRegParam(regParam: Array[Double]): this.type = {
    +    this.regParam = regParam
    +    this
    +  }
    +
    +  /**
    +   * Set the gradient function (of the loss function of one single data example)
    +   * to be used for SGD.
    +   */
    +  def setGradient(gradient: MultiModelGradient): this.type = {
    +    this.gradient = gradient
    +    this
    +  }
    +
    +
    +  /**
    +   * Set the updater function to actually perform a gradient step in a given direction.
    +   * The updater is responsible to perform the update from the regularization term as well,
    +   * and therefore determines what kind or regularization is used, if any.
    +   */
    +  def setUpdater(updater: Array[MultiModelUpdater]): this.type = {
    +    this.updater = updater
    +    this
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Runs gradient descent on the given training data.
    +   * @param data training data
    +   * @param initialWeights initial weights
    +   * @return solution vector
    +   */
    +  @DeveloperApi
    +  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = {
    +    val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      data,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFraction,
    +      initialWeights)
    +    weights
    +  }
    +
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Top-level method to run gradient descent.
    + */
    +@DeveloperApi
    +object MultiModelGradientDescent extends Logging {
    +  /**
    +   * Run stochastic gradient descent (SGD) in parallel using mini batches.
    +   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
    +   * in order to compute a gradient estimate.
    +   * Sampling, and averaging the subgradients over this subset is performed using one standard
    +   * spark map-reduce in each iteration.
    +   *
    +   * @param data - Input data for SGD. RDD of the set of data examples, each of
    +   *               the form (label, [feature values]).
    +   * @param gradient - Gradient object (used to compute the gradient of the loss function of
    +   *                   one single data example)
    +   * @param updater - Updater function to actually perform a gradient step in a given direction.
    +   * @param stepSize - initial step size for the first step
    +   * @param numIterations - number of iterations that SGD should be run.
    +   * @param regParam - regularization parameter
    +   * @param miniBatchFraction - fraction of the input data set that should be used for
    +   *                            one iteration of SGD. Default value 1.0.
    +   *
    +   * @return A tuple containing two elements. The first element is a column matrix containing
    +   *         weights for every feature, and the second element is an array containing the
    +   *         stochastic loss computed for every iteration.
    +   */
    +  def runMiniBatchMMSGD(
    +      data: RDD[(Double, Vector)],
    +      gradient: MultiModelGradient,
    +      updater: Array[MultiModelUpdater],
    +      stepSize: Array[Double],
    +      numIterations: Array[Int],
    +      regParam: Array[Double],
    +      miniBatchFraction: Double,
    +      initialWeights: Vector,
    +      batchSize: Int = 64,
    +      useSparse: Boolean = true,
    +      buildSparseThreshold: Double = 0.2): (Matrix, Array[Vector]) = {
    +
    +    val maxNumIter = numIterations.max
    +    val stochasticLossHistory = new ArrayBuffer[Vector](maxNumIter)
    +
    +    val numExamples = data.count()
    +    val miniBatchSize = numExamples * miniBatchFraction
    +    val numModels = stepSize.length * regParam.length
    +    val numFeatures = initialWeights.size
    +    val numRegularizers = updater.length
    +    val updaterCounter = 0 until numRegularizers
    +    // Initialize weights as a column vector
    +    var weights = updaterCounter.map { i =>
    +      new DenseMatrix(numFeatures, 1, initialWeights.toArray).
    +        multiply(DenseMatrix.ones(1, numModels))
    +    }
    +
    +    var finalWeights: Matrix = new DenseMatrix(numFeatures, 0, Array.empty[Double])
    +
    +    // if no data, return initial weights to avoid NaNs
    +    if (numExamples == 0) {
    +
    +      logInfo("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
    +      return (Matrices.horzCat(weights), stochasticLossHistory.toArray)
    +
    +    }
    +    val stepSizeMatrix = new DenseMatrix(1, numModels,
    +      stepSize.flatMap{ ss =>
    +        for (i <- 1 to regParam.length) yield ss
    +      }
    +    )
    +    val regMatrix = SparseMatrix.diag(Vectors.dense(stepSize.flatMap{ ss =>
    +      for (reg <- regParam) yield reg
    +    }))
    +
    +    val bcMetaData =
    +      if (useSparse) {
    +        data.context.broadcast(Matrices.getSparsityData(data, batchSize))
    +      } else {
    +        val emptyData: Array[(Int, Int)] = (0 until data.partitions.length).map { i =>
    +          (i, -1)}.toArray
    +        data.context.broadcast(emptyData)
    +      }
    +    val points = Matrices.fromRDD(data, bcMetaData.value, batchSize, buildSparseThreshold)
    +
    +    /**
    +     * For the first iteration, the regVal will be initialized as sum of weight squares
    +     * if it's L2 updater; for L1 updater, the same logic is followed.
    +     */
    +    val updaterWithIndex = updater.zipWithIndex
    +
    +    var regVal = updaterWithIndex.map { case (u, ind) =>
    +      u.compute(weights(ind), DenseMatrix.zeros(numFeatures, numModels),
    +        DenseMatrix.zeros(1, numModels), 1, regMatrix)._2
    +    }
    +    val orderedIters = numIterations.sorted
    +    var iterIndexCounter = 0
    +    for (i <- 1 to maxNumIter) {
    +      val bcWeights = data.context.broadcast(weights)
    +      // Sample a subset (fraction miniBatchFraction) of the total data
    +      // compute and sum up the subgradients on this subset (this is one map-reduce)
    +      val (gradientSum, lossSum) = points.sample(false, miniBatchFraction, 42 + i)
    +        .treeAggregate(updaterCounter.map(j => Matrices.zeros(numFeatures, numModels)),
    +          updaterCounter.map(j => Matrices.zeros(1, numModels)))(
    +          seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
    +            val l = updaterCounter.map { j =>
    +              gradient.compute(features, label, bcWeights.value(j),
    +                grad(j).asInstanceOf[DenseMatrix])
    +            }
    +            (grad, loss.zip(l).map(r => r._1.elementWiseOperateInPlace(_ + _, r._2)))
    +          },
    +          combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) =>
    +            (grad1.zip(grad2).map(r => r._1.elementWiseOperateInPlace(_ + _, r._2)),
    +              loss1.zip(loss2).map(r => r._1.elementWiseOperateInPlace(_ + _, r._2)))
    +          })
    +      stochasticLossHistory.append(Vectors.dense(Matrices.horzCat(updaterCounter.map { i =>
    +        lossSum(i).elementWiseOperateScalarInPlace(_ / _, miniBatchSize).
    --- End diff --
    
    miniBatchSize is inexact.  We could avoid the initial count() and instead aggregate the minibatch size during the treeAggregate.
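
    A sketch of that suggestion, with the gradient and loss simplified to Doubles so it stays short (the PR would carry its matrices instead); the third slot of the aggregate is the exact minibatch size, so the up-front count() is no longer needed:

        import org.apache.spark.mllib.linalg.Vector
        import org.apache.spark.mllib.rdd.RDDFunctions._
        import org.apache.spark.rdd.RDD

        def aggregateWithExactBatchSize(
            points: RDD[(Double, Vector)],
            miniBatchFraction: Double,
            seed: Long): (Double, Double, Long) = {
          points.sample(false, miniBatchFraction, seed)
            .treeAggregate((0.0, 0.0, 0L))(
              seqOp = (acc, point) => (acc, point) match {
                case ((grad, loss, count), (label, features)) =>
                  // placeholder per-example gradient/loss; the real code would call gradient.compute
                  (grad + 0.0, loss + 0.0, count + 1L)
              },
              combOp = (a, b) => (a, b) match {
                case ((g1, l1, c1), (g2, l2, c2)) => (g1 + g2, l1 + l2, c1 + c2)
              })
        }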




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802293
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    --- End diff --
    
    style: Maybe search for "}else" too.
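
    For reference, the brace formatting being asked for, as a standalone snippet:

        def describe(numRows: Int, numCols: Int): String = {
          if (numCols == 1) {
            "column vector"
          } else if (numRows == 1) {
            "row vector"
          } else {
            "matrix"
          }
        }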




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803754
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The number of rows of the matrices in this array, don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      val allColPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val ptr = mat.asInstanceOf[SparseMatrix].colPtrs
    +        ptr.slice(1, ptr.length)
    +      }
    +      var counter = 0
    +      val adjustedPtrs = allColPtrs.map { p =>
    +        counter += p
    +        counter
    +      }
    +      new SparseMatrix(numRows, numCols, adjustedPtrs,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].rowIndices).toArray,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].values).toArray)
    +    } else if (!isSparse && !isDense) {
    +      throw new IllegalArgumentException("The supplied matrices are neither in SparseMatrix or" +
    +        " DenseMatrix format!")
    +    }else {
    +      new DenseMatrix(numRows, numCols, matrices.flatMap(_.toArray).toArray)
    +    }
    +  }
    +  // partitionMetaData correspond to the index of the partition and the max number of non-zeros
    --- End diff --
    
    space between methods




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802344
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    --- End diff --
    
    spaces around operator==




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17807514
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -157,3 +157,221 @@ class HingeGradient extends Gradient {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * Class used to compute the gradient for a loss function, given a series of data points.
    + */
    +@DeveloperApi
    +abstract class MultiModelGradient extends Serializable {
    +  /**
    +   * Compute the gradient and loss given the features of all data points.
    +   *
    +   * @param data features for one data point
    +   * @param label label for this data point
    +   * @param weights weights/coefficients corresponding to features
    +   *
    +   * @return (gradient: DenseMatrix, loss: Double)
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix)
    --- End diff --
    
    spacing (here and in methods below)




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17764905
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    --- End diff --
    
    These are related to https://github.com/apache/spark/pull/2294.
    I realize the math is hard to follow; I can add explanations there.
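
    In the meantime, a self-contained sketch of the case the diff stops at may make the traversal easier to follow (raw CSC arrays and illustrative names, not the PR's classes). It computes C := alpha * A^T * B + beta * C with A sparse: because A is stored by column, row i of A^T is exactly column i of A, so every entry of C is a dot product between one cheap column slice of A and one column of B:

        def gemmTransASparse(
            alpha: Double,
            aColPtrs: Array[Int],     // A is k x m in CSC, so this has length m + 1
            aRowIndices: Array[Int],
            aValues: Array[Double],
            b: Array[Double],         // B is k x n, column major
            kB: Int,
            nB: Int,
            beta: Double,
            c: Array[Double],         // C is m x n, column major
            mC: Int): Unit = {
          var colB = 0
          while (colB < nB) {
            val bStart = colB * kB
            val cStart = colB * mC
            var colA = 0              // column colA of A == row colA of A^T
            while (colA < mC) {
              var i = aColPtrs(colA)
              val indEnd = aColPtrs(colA + 1)
              var sum = 0.0
              while (i < indEnd) {
                sum += aValues(i) * b(bStart + aRowIndices(i))
                i += 1
              }
              c(cStart + colA) = alpha * sum + beta * c(cStart + colA)
              colA += 1
            }
            colB += 1
          }
        }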




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17807001
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The number of rows of the matrices in this array, don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      val allColPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val ptr = mat.asInstanceOf[SparseMatrix].colPtrs
    +        ptr.slice(1, ptr.length)
    +      }
    +      var counter = 0
    +      val adjustedPtrs = allColPtrs.map { p =>
    +        counter += p
    +        counter
    +      }
    +      new SparseMatrix(numRows, numCols, adjustedPtrs,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].rowIndices).toArray,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].values).toArray)
    +    } else if (!isSparse && !isDense) {
    +      throw new IllegalArgumentException("The supplied matrices are neither in SparseMatrix or" +
    +        " DenseMatrix format!")
    +    }else {
    +      new DenseMatrix(numRows, numCols, matrices.flatMap(_.toArray).toArray)
    +    }
    +  }
    +  // partitionMetaData correspond to the index of the partition and the max number of non-zeros
    +  // in that partition so that we can preallocate a memory efficient buffer
    +  private[mllib] def fromRDD(
    +      rows: RDD[(Double, Vector)],
    +      partitionMetaData: Array[(Int, Int)],
    +      batchSize : Int,
    +      buildSparseThreshold: Double,
    +      generateOnTheFly: Boolean = true): RDD[(DenseMatrix, Matrix)] = {
    +
    +    if (!generateOnTheFly){
    +      rows.mapPartitions { iter =>
    +        iter.grouped(batchSize)
    +      }.map(fromSeq(_, batchSize))
    +    }else {
    +      val numFeatures = rows.first()._2.size
    +
    +      rows.mapPartitionsWithIndex{ case (ind, iter) =>
    +        val findPartition = partitionMetaData.find(_._1 == ind)
    +        val matrixBuffer =
    +          if (findPartition.get._2 != -1) {
    +            val nnz = findPartition.get._2
    +            val density = nnz * 1.0 / (numFeatures * batchSize)
    +            if (density <= buildSparseThreshold) {
    +              (DenseMatrix.zeros(batchSize, 1), new SparseMatrix(numFeatures, batchSize,
    +                Array.fill(batchSize + 1)(0), Array.fill(nnz)(0), Array.fill(nnz)(0.0)))
    +            } else {
    +              (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +            }
    +          } else {
    +            (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +          }
    +        iter.grouped(batchSize).map(fromSeqIntoBuffer(_, matrixBuffer, batchSize)._2)
    +      }
    +    }
    +  }
    +
    +  // Collects data on the maximum number of non-zero elements in a partition for each
    +  // batch of matrices
    +  private[mllib] def getSparsityData(
    +      rows: RDD[(Double, Vector)],
    +      batchSize : Int = 64): Array[(Int, Int)] = {
    +    val numFeatures = rows.first()._2.size
    +
    +    val partitionMetaData = rows.mapPartitionsWithIndex { case (ind, iter) =>
    +      val matrixBuffer =
    +        (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +      var partitionMaxNNZ = -1
    +
    +      iter.grouped(batchSize).foreach { r =>
    +        val (metaData, _) = fromSeqIntoBuffer(r, matrixBuffer, batchSize)
    +        val maxNNZ =
    +          if (metaData > partitionMaxNNZ) metaData else partitionMaxNNZ
    +
    +        partitionMaxNNZ = maxNNZ
    +      }
    +
    +      Iterator((ind, partitionMaxNNZ))
    +    }
    +    partitionMetaData.collect()
    +  }
    +
    +  private def fromSeq(rows: Seq[(Double, Vector)], batchSize: Int) : (DenseMatrix, Matrix) = {
    +    val numExamples = rows.length
    +    val numFeatures = rows(0)._2.size
    +    val matrixBuffer = DenseMatrix.zeros(numExamples, numFeatures)
    +    val labelBuffer = DenseMatrix.zeros(numExamples, 1)
    +    flattenMatrix(rows, matrixBuffer, labelBuffer, batchSize)
    +
    +    (matrixBuffer, labelBuffer)
    +  }
    +
    +  private def fromSeqIntoBuffer(
    +      rows: Seq[(Double, Vector)],
    +      buffer: (DenseMatrix, Matrix),
    +      batchSize: Int) : (Int, (DenseMatrix, Matrix)) = {
    +    val labelBuffer = buffer._1
    +    val matrixBuffer = buffer._2
    +    val metadata = flattenMatrix(rows, matrixBuffer, labelBuffer, batchSize)
    +
    +    (metadata, buffer)
    +  }
    +
    +  private def flattenMatrix(
    +      vals: Seq[(Double, Vector)],
    +      matrixInto: Matrix,
    +      labelsInto: DenseMatrix,
    +      batchSize: Int): Int = {
    +    val numExamples = vals.length
    +    val numFeatures = vals(0)._2.size
    +    var i = 0
    +    var nnz = 0
    +    matrixInto match {
    +      case intoSparse: SparseMatrix =>
    +        for (r <- vals) {
    +          labelsInto.values(i) = r._1
    +          r._2 match {
    +            case sVec: SparseVector =>
    +              val len = sVec.indices.length
    +              var j = 0
    +              intoSparse.colPtrs(i) = nnz
    +              while (j < len) {
    +                intoSparse.rowIndices(nnz) = sVec.indices(j)
    +                intoSparse.values(nnz) = sVec.values(j)
    +                nnz += 1
    +                j += 1
    +              }
    +            case dVec: DenseVector =>
    +              var j = 0
    +              intoSparse.colPtrs(i) = nnz
    +              while (j < numFeatures) {
    +                val value = dVec.values(j)
    +                if (value != 0.0) {
    +                  intoSparse.rowIndices(nnz) = j
    +                  intoSparse.values(nnz) = dVec.values(j)
    +                  nnz += 1
    +                }
    +                j += 1
    +              }
    +          }
    +          i += 1
    +        }
    +        while (i < batchSize) {
    +          intoSparse.colPtrs(i) = nnz
    +          i += 1
    +        }
    +      case intoDense: DenseMatrix =>
    +        for (r <- vals) {
    +          labelsInto.values(i) = r._1
    +          val startIndex = numFeatures * i
    +          r._2 match {
    +            case sVec: SparseVector =>
    +              val len = sVec.indices.length
    +              var j = 0
    +              var sVecCounter = 0
    +              while (j < numFeatures) {
    +                intoDense.values(startIndex + j) = 0.0
    +                if (sVecCounter < len) {
    +                  if (j == sVec.indices(sVecCounter)) {
    +                    intoDense.values(startIndex + j) = sVec.values(sVecCounter)
    +                    nnz += 1
    --- End diff --
    
    efficiency: nnz could be updated using sVecCounter outside of the loop.
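
    For illustration, a standalone sketch of that change (a hypothetical helper, not the actual patch): zero the column, copy the non-zeros, and let the caller do `nnz += result` once, instead of incrementing nnz per element.

        import org.apache.spark.mllib.linalg.SparseVector

        // Copies one SparseVector into a dense, column-major buffer starting at startIndex
        // and returns the number of non-zeros written, so nnz is updated once by the caller.
        def copySparseColumn(
            sVec: SparseVector,
            intoDense: Array[Double],
            startIndex: Int,
            numFeatures: Int): Int = {
          val len = sVec.indices.length
          var j = 0
          var sVecCounter = 0
          while (j < numFeatures) {
            intoDense(startIndex + j) = 0.0
            if (sVecCounter < len && j == sVec.indices(sVecCounter)) {
              intoDense(startIndex + j) = sVec.values(sVecCounter)
              sVecCounter += 1
            }
            j += 1
          }
          sVecCounter   // e.g. nnz += copySparseColumn(...) at the call site
        }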




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17765187
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var rowCounter = 0
    +          val Cstart = colCounterForB * mA
    +          while (rowCounter < mA) {
    +            var i = Arows(rowCounter)
    +            val indEnd = Arows(rowCounter + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B(colCounterForB, Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounter
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounter += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    --- End diff --
    
    The matrix multiplication algorithms are different. If A is transposed, the multiplication is easy: A is stored in CSC format, so when you multiply something by A', you get the rows of A' (the columns of A) for free. It then reduces to ordinary row-times-column multiplication, using only the non-zero elements. Notice that each value of C is updated exactly once, so the beta scaling can be done inside the loop.
    
    When A is not transposed, you multiply each column of A with the corresponding row of B and add the result to C. Every such column-times-row product has the same size as C, so the values of C are updated multiple times inside the loop. Therefore, if you want an initial scaling of C, it has to happen before the loop.
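
    To make the two cases concrete, here is a small self-contained sketch over plain column-major arrays (the object and the names are illustrative only, not the code in this patch); A is p x q and stored in CSC form as (colPtrs, rowIndices, values).

        object SparseGemmSketch {

          // C := alpha * A' * B + beta * C, where A is p x q (CSC), B is p x n, C is q x n.
          // Every entry of C is a single dot product, so the beta scaling happens inline.
          def gemmATransposed(
              p: Int, q: Int, n: Int,
              colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double],
              alpha: Double, b: Array[Double], beta: Double, c: Array[Double]): Unit = {
            var col = 0
            while (col < n) {
              val bStart = col * p
              val cStart = col * q
              var r = 0
              while (r < q) {                        // row r of A' == column r of A
                var i = colPtrs(r)
                var sum = 0.0
                while (i < colPtrs(r + 1)) {
                  sum += values(i) * b(bStart + rowIndices(i))
                  i += 1
                }
                c(cStart + r) = beta * c(cStart + r) + alpha * sum
                r += 1
              }
              col += 1
            }
          }

          // C := alpha * A * B + beta * C, where A is p x q (CSC), B is q x n, C is p x n.
          // Each column of A touches many entries of C, so C is scaled by beta once, up front.
          def gemmANotTransposed(
              p: Int, q: Int, n: Int,
              colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double],
              alpha: Double, b: Array[Double], beta: Double, c: Array[Double]): Unit = {
            var s = 0
            while (s < c.length) { c(s) *= beta; s += 1 }
            var col = 0
            while (col < n) {
              val bStart = col * q
              val cStart = col * p
              var k = 0
              while (k < q) {                        // column k of A pairs with row k of B
                val scaledB = alpha * b(bStart + k)
                var i = colPtrs(k)
                while (i < colPtrs(k + 1)) {
                  c(cStart + rowIndices(i)) += values(i) * scaledB
                  i += 1
                }
                k += 1
              }
              col += 1
            }
          }
        }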




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17765442
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BLASSuite.scala ---
    @@ -126,4 +126,142 @@ class BLASSuite extends FunSuite {
           }
         }
       }
    +
    +  test("gemm") {
    --- End diff --
    
    Shouldn't this test all 4 combinations of transA and transB?
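
    For instance, something along these lines would cover all four cases in one shot (a sketch only, assuming the suite's DenseMatrix, BLAS.gemm and the matrix ~== / absTol helpers are in scope; the fixture values are made up):

        val A   = new DenseMatrix(2, 2, Array(1.0, 2.0, 3.0, 4.0))   // column-major
        val At  = new DenseMatrix(2, 2, Array(1.0, 3.0, 2.0, 4.0))   // A transposed
        val eye = new DenseMatrix(2, 2, Array(1.0, 0.0, 0.0, 1.0))   // equal to its own transpose
        for (transA <- Seq(false, true); transB <- Seq(false, true)) {
          val C = DenseMatrix.zeros(2, 2)
          // multiplying by the identity (or its transpose) should return A in all four cases
          gemm(transA, transB, 1.0, if (transA) At else A, eye, 0.0, C)
          assert(C ~== A absTol 1e-15)
        }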




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17769270
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var rowCounter = 0
    +          val Cstart = colCounterForB * mA
    +          while (rowCounter < mA) {
    +            var i = Arows(rowCounter)
    +            val indEnd = Arows(rowCounter + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B(colCounterForB, Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounter
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounter += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    --- End diff --
    
    I wasn't really asking; I was saying there should be doc in the code for these things.  (But the description you put here sounds fine for the doc.)




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17811169
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    +    this.stepSize = step
    +    this
    +  }
    +
    +  /**
    +   * :: Experimental ::
    +   * Set fraction of data to be used for each SGD iteration.
    +   * Default 1.0 (corresponding to deterministic/classical gradient descent)
    +   */
    +  @Experimental
    +  def setMiniBatchFraction(fraction: Double): this.type = {
    +    this.miniBatchFraction = fraction
    +    this
    +  }
    +
    +  /**
    +   * Set the number of iterations for SGD. Default 100.
    +   */
    +  def setNumIterations(iters: Array[Int]): this.type = {
    +    this.numIterations = iters
    +    this
    +  }
    +
    +  /**
    +   * Set the regularization parameter. Default (0.0, 0.1, 1.0).
    +   */
    +  def setRegParam(regParam: Array[Double]): this.type = {
    +    this.regParam = regParam
    +    this
    +  }
    +
    +  /**
    +   * Set the gradient function (of the loss function of one single data example)
    +   * to be used for SGD.
    +   */
    +  def setGradient(gradient: MultiModelGradient): this.type = {
    +    this.gradient = gradient
    +    this
    +  }
    +
    +
    +  /**
    +   * Set the updater function to actually perform a gradient step in a given direction.
    +   * The updater is responsible to perform the update from the regularization term as well,
    +   * and therefore determines what kind of regularization is used, if any.
    +   */
    +  def setUpdater(updater: Array[MultiModelUpdater]): this.type = {
    +    this.updater = updater
    +    this
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Runs gradient descent on the given training data.
    +   * @param data training data
    +   * @param initialWeights initial weights
    +   * @return solution vector
    +   */
    +  @DeveloperApi
    +  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = {
    +    val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      data,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFraction,
    +      initialWeights)
    +    weights
    +  }
    +
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Top-level method to run gradient descent.
    + */
    +@DeveloperApi
    +object MultiModelGradientDescent extends Logging {
    +  /**
    +   * Run stochastic gradient descent (SGD) in parallel using mini batches.
    +   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
    +   * in order to compute a gradient estimate.
    +   * Sampling, and averaging the subgradients over this subset is performed using one standard
    +   * spark map-reduce in each iteration.
    +   *
    +   * @param data - Input data for SGD. RDD of the set of data examples, each of
    +   *               the form (label, [feature values]).
    +   * @param gradient - Gradient object (used to compute the gradient of the loss function of
    +   *                   one single data example)
    +   * @param updater - Updater function to actually perform a gradient step in a given direction.
    +   * @param stepSize - initial step size for the first step
    +   * @param numIterations - number of iterations that SGD should be run.
    +   * @param regParam - regularization parameter
    +   * @param miniBatchFraction - fraction of the input data set that should be used for
    +   *                            one iteration of SGD. Default value 1.0.
    +   *
    +   * @return A tuple containing two elements. The first element is a column matrix containing
    +   *         weights for every feature, and the second element is an array containing the
    +   *         stochastic loss computed for every iteration.
    +   */
    +  def runMiniBatchMMSGD(
    +      data: RDD[(Double, Vector)],
    +      gradient: MultiModelGradient,
    +      updater: Array[MultiModelUpdater],
    +      stepSize: Array[Double],
    +      numIterations: Array[Int],
    +      regParam: Array[Double],
    +      miniBatchFraction: Double,
    +      initialWeights: Vector,
    +      batchSize: Int = 64,
    +      useSparse: Boolean = true,
    +      buildSparseThreshold: Double = 0.2): (Matrix, Array[Vector]) = {
    +
    +    val maxNumIter = numIterations.max
    +    val stochasticLossHistory = new ArrayBuffer[Vector](maxNumIter)
    +
    +    val numExamples = data.count()
    +    val miniBatchSize = numExamples * miniBatchFraction
    +    val numModels = stepSize.length * regParam.length
    +    val numFeatures = initialWeights.size
    +    val numRegularizers = updater.length
    +    val updaterCounter = 0 until numRegularizers
    +    // Initialize weights as a column vector
    +    var weights = updaterCounter.map { i =>
    +      new DenseMatrix(numFeatures, 1, initialWeights.toArray).
    +        multiply(DenseMatrix.ones(1, numModels))
    +    }
    +
    +    var finalWeights: Matrix = new DenseMatrix(numFeatures, 0, Array.empty[Double])
    +
    +    // if no data, return initial weights to avoid NaNs
    +    if (numExamples == 0) {
    +
    +      logInfo("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
    +      return (Matrices.horzCat(weights), stochasticLossHistory.toArray)
    +
    +    }
    +    val stepSizeMatrix = new DenseMatrix(1, numModels,
    +      stepSize.flatMap{ ss =>
    +        for (i <- 1 to regParam.length) yield ss
    +      }
    +    )
    +    val regMatrix = SparseMatrix.diag(Vectors.dense(stepSize.flatMap{ ss =>
    +      for (reg <- regParam) yield reg
    +    }))
    +
    +    val bcMetaData =
    +      if (useSparse) {
    +        data.context.broadcast(Matrices.getSparsityData(data, batchSize))
    +      } else {
    +        val emptyData: Array[(Int, Int)] = (0 until data.partitions.length).map { i =>
    +          (i, -1)}.toArray
    +        data.context.broadcast(emptyData)
    +      }
    +    val points = Matrices.fromRDD(data, bcMetaData.value, batchSize, buildSparseThreshold)
    +
    +    /**
    +     * For the first iteration, the regVal will be initialized as sum of weight squares
    +     * if it's L2 updater; for L1 updater, the same logic is followed.
    +     */
    +    val updaterWithIndex = updater.zipWithIndex
    +
    +    var regVal = updaterWithIndex.map { case (u, ind) =>
    +      u.compute(weights(ind), DenseMatrix.zeros(numFeatures, numModels),
    +        DenseMatrix.zeros(1, numModels), 1, regMatrix)._2
    +    }
    +    val orderedIters = numIterations.sorted
    +    var iterIndexCounter = 0
    +    for (i <- 1 to maxNumIter) {
    +      val bcWeights = data.context.broadcast(weights)
    +      // Sample a subset (fraction miniBatchFraction) of the total data
    +      // compute and sum up the subgradients on this subset (this is one map-reduce)
    +      val (gradientSum, lossSum) = points.sample(false, miniBatchFraction, 42 + i)
    +        .treeAggregate(updaterCounter.map(j => Matrices.zeros(numFeatures, numModels)),
    --- End diff --
    
    It might be better to compute updaterCounter on the worker nodes so that we broadcast a scalar instead of an array. (Not that the array is that big, but still.)




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17813573
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala ---
    @@ -169,4 +169,67 @@ object TestingUtils {
         override def toString = x.toString
       }
     
    +  case class CompareMatrixRightSide(
    +     fun: (Matrix, Matrix, Double) => Boolean, y: Matrix, eps: Double, method: String)
    +
    +  /**
    +   * Implicit class for comparing two matrices using relative tolerance or absolute tolerance.
    +   */
    +  implicit class MatrixWithAlmostEquals(val x: Matrix) {
    +
    +    /**
    +     * When the difference of two vectors are within eps, returns true; otherwise, returns false.
    +     */
    +    def ~=(r: CompareMatrixRightSide): Boolean = r.fun(x, r.y, r.eps)
    +
    +    /**
    +     * When the difference of two vectors are within eps, returns false; otherwise, returns true.
    +     */
    +    def !~=(r: CompareMatrixRightSide): Boolean = !r.fun(x, r.y, r.eps)
    +
    +    /**
    +     * Throws exception when the difference of two vectors are NOT within eps;
    +     * otherwise, returns true.
    +     */
    +    def ~==(r: CompareMatrixRightSide): Boolean = {
    +      if (!r.fun(x, r.y, r.eps)) {
    +        throw new TestFailedException(
    +          s"Expected \n$x\n and \n${r.y}\n to be within ${r.eps}${r.method} for all elements.", 0)
    +      }
    +      true
    +    }
    +
    +    /**
    +     * Throws exception when the difference of two matrices are within eps; otherwise, returns true.
    +     */
    +    def !~==(r: CompareMatrixRightSide): Boolean = {
    +      if (r.fun(x, r.y, r.eps)) {
    +        throw new TestFailedException(
    +          s"Did not expect \n$x\n and \n${r.y}\n to be within " +
    +            "${r.eps}${r.method} for all elements.", 0)
    +      }
    +      true
    +    }
    +
    +    /**
    +     * Comparison using absolute tolerance.
    +     */
    +    def absTol(eps: Double): CompareMatrixRightSide = CompareMatrixRightSide(
    +      (x: Matrix, y: Matrix, eps: Double) => {
    +        x.toArray.zip(y.toArray).forall(x => x._1 ~= x._2 absTol eps)
    +      }, x, eps, ABS_TOL_MSG)
    +
    +    /**
    +     * Comparison using relative tolerance. Note that comparing against a sparse matrix
    +     * with elements having a value of zero will raise an exception, because it involves
    +     * comparing against zero.
    +     */
    +    def relTol(eps: Double): CompareMatrixRightSide = CompareMatrixRightSide(
    +      (x: Matrix, y: Matrix, eps: Double) => {
    +        x.toArray.zip(y.toArray).forall(x => x._1 ~= x._2 relTol eps)
    --- End diff --
    
    Confusing to have two different things called "x" here: the outer Matrix of the implicit class and the lambda parameter.
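
    For example, renaming the lambda parameters keeps the outer matrix x distinct from the per-element pair (a sketch, assuming the same CompareMatrixRightSide and a REL_TOL_MSG constant as in the absTol case above):

        def relTol(eps: Double): CompareMatrixRightSide = CompareMatrixRightSide(
          (a: Matrix, b: Matrix, eps: Double) => {
            a.toArray.zip(b.toArray).forall { case (u, v) => u ~= v relTol eps }
          }, x, eps, REL_TOL_MSG)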




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803390
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    --- End diff --
    
    I wonder if it's worth checking for p = 2 (which should be a very common case) since multiplication and sqrt are (I believe) much faster than pow.
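
    As a standalone illustration of that special case (plain arrays and a hypothetical helper, not the method in this patch):

        // Column norms of a column-major numRows x numCols matrix, with a fast path for p = 2.
        def colNorms(values: Array[Double], numRows: Int, numCols: Int, p: Double): Array[Double] = {
          val norms = new Array[Double](numCols)
          var j = 0
          while (j < numCols) {
            var sum = 0.0
            var i = 0
            while (i < numRows) {
              val v = values(j * numRows + i)
              sum += (if (p == 2.0) v * v else math.pow(math.abs(v), p))
              i += 1
            }
            norms(j) = if (p == 2.0) math.sqrt(sum) else math.pow(sum, 1.0 / p)
            j += 1
          }
          norms
        }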




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17813038
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeMatrixConversionSuite.scala ---
    @@ -37,4 +37,26 @@ class BreezeMatrixConversionSuite extends FunSuite {
         assert(mat.numCols === breeze.cols)
         assert(mat.values.eq(breeze.data), "should not copy data")
       }
    +
    +  test("sparse matrix to breeze") {
    +    val values = Array(1.0, 2.0, 4.0, 5.0)
    +    val colPtrs = Array(0, 2, 4)
    +    val rowIndices = Array(1, 2, 1, 2)
    +    val mat = Matrices.sparse(3, 2, colPtrs, rowIndices, values)
    +    val breeze = mat.toBreeze.asInstanceOf[BSM[Double]]
    +    assert(breeze.rows === mat.numRows)
    +    assert(breeze.cols === mat.numCols)
    +    assert(breeze.data.eq(mat.asInstanceOf[SparseMatrix].values), "should not copy data")
    +  }
    +
    +  test("sparse breeze matrix to sparse matrix") {
    --- End diff --
    
    Ditto




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17764833
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    --- End diff --
    
    I think this segment merits a one-line explanation.
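
    For example, the branch could be prefaced with a short comment along these lines (wording is only a suggestion):

        // A' * B: for each column of B, take the dot product of every column of A
        // (i.e. every row of A') with that column; each entry of C is written exactly
        // once, so the beta scaling is applied inline.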




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17764836
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    --- End diff --
    
    Ditto: I think this segment merits a one-line explanation.
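
    Likewise for this branch, e.g.:

        // Same as above, except B is also transposed: B'(k, j) is read through the
        // (row, column) indexer as B(colCounterForB, Acols(i)) instead of directly
        // from the column-major values array.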




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17808193
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -157,3 +157,221 @@ class HingeGradient extends Gradient {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * Class used to compute the gradient for a loss function, given a series of data points.
    + */
    +@DeveloperApi
    +abstract class MultiModelGradient extends Serializable {
    +  /**
    +   * Compute the gradient and loss given the features of all data points.
    +   *
    +   * @param data features for the data points
    +   * @param label labels for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   *
    +   * @return (gradient: DenseMatrix, loss: Matrix)
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix)
    +
    +  /**
    +   * Compute the gradient and loss given the features of a series of data points,
    +   * add the gradient to a provided matrix to avoid creating new objects, and return loss.
    +   *
    +   * @param data features for the data points
    +   * @param label label for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   * @param cumGradient the computed gradient will be added to this matrix
    +   *
    +   * @return loss
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix, cumGradient: DenseMatrix): Matrix
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a logistic loss function, as used in binary classification.
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLogisticGradient extends MultiModelGradient {
    +
    +  private def sigmoid(p: DenseMatrix): DenseMatrix = {
    +    def takeSigmoid(p: Double): Double = {
    +      1.0 / (math.exp(-p) + 1.0)
    +    }
    +    p.map(takeSigmoid)
    +  }
    +
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +    val margin = data transposeMultiply weights
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      0.0, gradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix,
    +                       label: DenseMatrix,
    +                       weights: DenseMatrix,
    +                       cumGradient: DenseMatrix): Matrix = {
    +    val margin = data transposeMultiply weights
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      1.0, cumGradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +      loss.colSums(false, shouldSkip)
    +    } else {
    +      loss.colSums
    +    }
    +  }
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a least-squares loss function, as used in linear regression.
    + * This is correct for the averaged least squares loss function (mean squared error)
    + *              L = 1/n ||A weights-y||^2
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLeastSquaresGradient extends MultiModelGradient {
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +
    +    val diff = (data transposeMultiply weights).elementWiseOperateOnColumnsInPlace(_ - _, label)
    +
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +
    +    gemm(false, false, 2.0, data, diff, 0.0, gradient)
    +
    +    val loss = diff.update(v => v * v)
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix,
    +                       label: DenseMatrix,
    +                       weights: DenseMatrix,
    +                       cumGradient: DenseMatrix): Matrix = {
    +    val diff = (data transposeMultiply weights).elementWiseOperateOnColumnsInPlace(_ - _, label)
    +
    +    gemm(false, false, 2.0, data, diff, 1.0, cumGradient)
    +    val loss = diff.update(v => v * v)
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +      loss.colSums(false, shouldSkip)
    +    } else {
    +      loss.colSums
    +    }
    +  }
    +}
    +
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a Hinge loss function, as used in SVM binary classification.
    + * See also the documentation for the precise formulation.
    + * NOTE: This assumes that the labels are {0,1}
    + */
    +@DeveloperApi
    +class MultiModelHingeGradient extends MultiModelGradient {
    +  override def compute(data: Matrix, label: DenseMatrix,
    --- End diff --
    
    Ditto about implementing this in terms of the below compute() method.
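
    One way to do that (a sketch reusing the signatures quoted above, intended as the method body
    inside e.g. MultiModelLogisticGradient; `DenseMatrix.zeros` is the factory added in this PR)
    would be to allocate a zero gradient and delegate to the accumulating variant:

        override def compute(data: Matrix, label: DenseMatrix,
                             weights: DenseMatrix): (DenseMatrix, Matrix) = {
          // cumGradient starts at zero, so adding into it equals computing the gradient from scratch
          val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
          val loss = compute(data, label, weights, gradient)
          (gradient, loss)
        }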




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17800735
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-major sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row indices of the entries. They must be in strictly increasing order
    + *                   within each column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    --- End diff --
    
    (unless this is common practice in BLAS libraries?)
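
    For context, `isMultiplication` (its body appears in a later quote of this diff) works by probing
    `f` at a few sample points. A condensed standalone version of the same idea, with an example of
    where the probing can misfire:

        def looksLikeMultiplication(f: (Double, Double) => Double): Boolean =
          f(2, 9) == 18 && f(3, 7) == 21 && f(8, 9) == 72

        println(looksLikeMultiplication(_ * _))  // true
        println(looksLikeMultiplication(_ + _))  // false
        // A function that merely agrees with * on the probe points also passes:
        println(looksLikeMultiplication((x, y) => if (x < 100) x * y else x + y))  // true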




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17803546
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-major sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row indices of the entries. They must be in strictly increasing order
    + *                   within each column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    +    if (f(2, 9) != 18) return false
    +    if (f(3, 7) != 21) return false
    +    if (f(8, 9) != 72) return false
    +    true
    +  }
    +
    +  private def isDivision(f: (Double, Double) => Double): Boolean = {
    +    if (f(12, 3) != 4) return false
    +    if (f(72, 4) != 18) return false
    +    if (f(72, 9) != 8) return false
    +    true
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (y.numCols==1 || y.numRows == 1) {
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseMultiplyRows " +
    +        "or elementWiseMultiplyColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1) {
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols == 1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows == 1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateOnRows(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix =  {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val dup = this.copy
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      while (i < indEnd){
    +        sums.values(j) += math.pow(values(i),p)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    sums.update(math.pow(_, 1/p))
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: SparseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: SparseMatrix = {
    +    val copy = this.copy
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, this.toArray)
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  def toDense: DenseMatrix = new DenseMatrix(numRows, numCols, this.toArray)
    +}
    +
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  private def genRand(numRows: Int, numCols: Int, raw: Array[Double], nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol){
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    sCols.append(sparseA.length)
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    --- End diff --
    
    Sorry, I did not see the "density" argument.  Sounds OK to me (but is there a use case?)




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17801264
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
    --- End diff --
    
    long line




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17800699
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-majored sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row index of the entry. They must be in strictly increasing order for each
    + *                   column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    +    if (f(2, 9) != 18) return false
    +    if (f(3, 7) != 21) return false
    +    if (f(8, 9) != 72) return false
    +    true
    +  }
    +
    +  private def isDivision(f: (Double, Double) => Double): Boolean = {
    +    if (f(12, 3) != 4) return false
    +    if (f(72, 4) != 18) return false
    +    if (f(72, 9) != 8) return false
    +    true
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (y.numCols==1 || y.numRows == 1) {
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseMultiplyRows " +
    +        "or elementWiseMultiplyColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1) {
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols == 1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows == 1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateOnRows(
    --- End diff --
    
    spacing between methods (here and below)
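
    Separately, for reference, the CSC layout documented earlier in this diff can be
    illustrated by constructing the 3 x 3 example matrix directly. This is a standalone
    sketch, not part of the patch:

        import org.apache.spark.mllib.linalg.SparseMatrix

        // 1.0 0.0 4.0
        // 0.0 3.0 5.0
        // 2.0 0.0 6.0
        // values are listed column by column; colPtrs(j) to colPtrs(j + 1) delimits column j
        val m = new SparseMatrix(
          numRows = 3,
          numCols = 3,
          colPtrs = Array(0, 2, 3, 6),
          rowIndices = Array(0, 2, 1, 0, 1, 2),
          values = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

        m.toArray  // column-major dense copy: Array(1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0, 5.0, 6.0)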




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17808151
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -157,3 +157,221 @@ class HingeGradient extends Gradient {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * Class used to compute the gradient for a loss function, given a series of data points.
    + */
    +@DeveloperApi
    +abstract class MultiModelGradient extends Serializable {
    +  /**
    +   * Compute the gradient and loss given the features of all data points.
    +   *
    +   * @param data features for one data point
    +   * @param label label for this data point
    +   * @param weights weights/coefficients corresponding to features
    +   *
    +   * @return (gradient: DenseMatrix, loss: Double)
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix)
    +
    +  /**
    +   * Compute the gradient and loss given the features of a series of data point,
    +   * add the gradient to a provided matrix to avoid creating new objects, and return loss.
    +   *
    +   * @param data features for the data points
    +   * @param label label for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   * @param cumGradient the computed gradient will be added to this matrix
    +   *
    +   * @return loss
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix, cumGradient: DenseMatrix): Matrix
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a logistic loss function, as used in binary classification.
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLogisticGradient extends MultiModelGradient {
    +
    +  private def sigmoid(p: DenseMatrix): DenseMatrix = {
    +    def takeSigmoid(p: Double): Double = {
    +      1.0 / (math.exp(-p) + 1.0)
    +    }
    +    p.map(takeSigmoid)
    +  }
    +
    +  override def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix) = {
    +    val margin = data transposeMultiply weights
    +    val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
    +
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      0.0, gradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    val lossVector =
    +      if (data.isInstanceOf[DenseMatrix]) {
    +        val numFeatures = data.numRows
    +        val zeroEntries = data.compare(0.0, _ == _)
    +        val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    +        loss.colSums(false, shouldSkip)
    +      } else {
    +        loss.colSums
    +      }
    +    (gradient, lossVector)
    +  }
    +
    +  override def compute(data: Matrix,
    +                       label: DenseMatrix,
    +                       weights: DenseMatrix,
    +                       cumGradient: DenseMatrix): Matrix = {
    +    val margin = data transposeMultiply weights
    +    gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label),
    +      1.0, cumGradient)
    +
    +    val negativeLabels = label.compare(0.0, _ == _)
    +    val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels)
    +
    +    val loss = margin.update(v => math.log1p(math.exp(-v))).
    +      elementWiseOperateInPlace(_ + _, addMargin)
    +
    +    if (data.isInstanceOf[DenseMatrix]) {
    +      val numFeatures = data.numRows
    +      val zeroEntries = data.compare(0.0, _ == _)
    +      val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _)
    --- End diff --
    
    This applies elsewhere too, but I won't repeat the comment.
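
    For context, the quantity both compute methods above evaluate per example is the standard
    logistic loss. A minimal standalone sketch, assuming labels in {0, 1} and margin = w^T x
    (logisticLoss is a hypothetical helper, not part of the patch):

        // log(1 + e^(-margin)) is the loss for label == 1; adding margin back converts it to
        // log(1 + e^(margin)) for label == 0, which is what the addMargin term above does.
        def logisticLoss(margin: Double, label: Double): Double =
          math.log1p(math.exp(-margin)) + (if (label == 0.0) margin else 0.0)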




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by anantasty <gi...@git.apache.org>.
Github user anantasty commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802195
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -17,12 +17,19 @@
     
     package org.apache.spark.mllib.linalg
     
    -import breeze.linalg.{Matrix => BM, DenseMatrix => BDM}
    +import breeze.linalg.{Matrix => BM, DenseMatrix => BDM, CSCMatrix => BSM}
    +
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.util.random.XORShiftRandom
    +import org.apache.spark.util.Utils
    +
    +import scala.collection.mutable.ArrayBuffer
    +import java.util.Arrays
     
     /**
      * Trait for a local matrix.
      */
    -trait Matrix extends Serializable {
    +sealed trait Matrix extends Serializable {
    --- End diff --
    
    Good use of sealed.
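
    For readers unfamiliar with the benefit: sealing the trait lets the compiler check that
    pattern matches over Matrix are exhaustive. A standalone sketch, assuming DenseMatrix and
    SparseMatrix from this diff are the only subclasses in scope (nnz is a hypothetical helper,
    not part of the patch):

        // With `sealed trait Matrix`, the compiler warns if a match forgets a subclass.
        def nnz(m: Matrix): Int = m match {
          case dm: DenseMatrix  => dm.values.count(_ != 0.0)
          case sm: SparseMatrix => sm.values.count(_ != 0.0)
        }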




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802108
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    --- End diff --
    
    Please add error messages (here and in the other require statements).
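
    For illustration, the message style already used for the constructor requires in this diff
    could be applied here as well. A sketch of the suggestion only, not the actual follow-up
    commit:

        require(y_vals.length == numRows, "The number of elements of the operand doesn't match " +
          s"the number of rows. y.length: ${y_vals.length}, numRows: $numRows")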




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802128
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    --- End diff --
    
    "){"




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17800664
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-majored sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row index of the entry. They must be in strictly increasing order for each
    + *                   column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    +    if (f(2, 9) != 18) return false
    +    if (f(3, 7) != 21) return false
    +    if (f(8, 9) != 72) return false
    +    true
    +  }
    +
    +  private def isDivision(f: (Double, Double) => Double): Boolean = {
    +    if (f(12, 3) != 4) return false
    +    if (f(72, 4) != 18) return false
    +    if (f(72, 9) != 8) return false
    +    true
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (y.numCols==1 || y.numRows == 1) {
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseMultiplyRows " +
    +        "or elementWiseMultiplyColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1) {
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols == 1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows == 1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateOnRows(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix =  {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val dup = this.copy
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      while (i < indEnd){
    +        sums.values(j) += math.pow(values(i),p)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    sums.update(math.pow(_, 1/p))
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: SparseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: SparseMatrix = {
    +    val copy = this.copy
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, this.toArray)
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  def toDense: DenseMatrix = new DenseMatrix(numRows, numCols, this.toArray)
    +}
    +
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  private def genRand(numRows: Int, numCols: Int, raw: Array[Double], nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol){
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    sCols.append(sparseA.length)
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    --- End diff --
    
    Do we really need sparse versions of rand and randn? It should not be much more expensive to use the dense versions and then convert to a sparse matrix (I figure < 2x the cost). I cannot think of use cases for these either, except unit testing.
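
    A rough sketch of the suggested alternative: generate the full column-major array of uniform
    values (as the dense rand does), drop entries to the target density, and compress the result
    into CSC arrays. sprandViaDense is a hypothetical name, not part of the patch:

        import scala.collection.mutable.ArrayBuffer
        import scala.util.Random

        def sprandViaDense(
            numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
          // Dense generation: keep each entry with probability `density`, zero it otherwise.
          val raw = Array.fill(numRows * numCols) {
            if (rng.nextDouble() < density) rng.nextDouble() else 0.0
          }
          // Compression: walk the column-major array and record the non-zeros in CSC form.
          val colPtrs = new Array[Int](numCols + 1)
          val rowIndices = ArrayBuffer.empty[Int]
          val values = ArrayBuffer.empty[Double]
          var j = 0
          while (j < numCols) {
            var i = 0
            while (i < numRows) {
              val v = raw(j * numRows + i)
              if (v != 0.0) {
                rowIndices += i
                values += v
              }
              i += 1
            }
            colPtrs(j + 1) = values.length
            j += 1
          }
          new SparseMatrix(numRows, numCols, colPtrs, rowIndices.toArray, values.toArray)
        }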




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802211
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    --- End diff --
    
    long line (run dev/scalastyle to check all)
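
    For illustration, the flagged signature could be wrapped the same way the other
    multi-parameter signatures in this diff already are. A sketch of how it could look, not the
    actual follow-up commit:

        private[mllib] def elementWiseOperateInPlace(
            f: (Double, Double) => Double,
            y: Matrix): DenseMatrix = {
          val y_val = y.toArray
          val len = values.length
          require(y_val.length == values.length)
          var j = 0
          while (j < len) {
            values(j) = f(values(j), y_val(j))
            j += 1
          }
          this
        }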




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56245433
  
    @brkyvz I've made a rough pass and listed all of my comments. I can make future passes as needed. This is a lot of work, and it will be great to have!




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17813577
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala ---
    @@ -169,4 +169,67 @@ object TestingUtils {
         override def toString = x.toString
       }
     
    +  case class CompareMatrixRightSide(
    +     fun: (Matrix, Matrix, Double) => Boolean, y: Matrix, eps: Double, method: String)
    +
    +  /**
    +   * Implicit class for comparing two matrices using relative tolerance or absolute tolerance.
    +   */
    +  implicit class MatrixWithAlmostEquals(val x: Matrix) {
    +
    +    /**
    +     * When the difference of two vectors are within eps, returns true; otherwise, returns false.
    +     */
    +    def ~=(r: CompareMatrixRightSide): Boolean = r.fun(x, r.y, r.eps)
    +
    +    /**
    +     * When the difference of two vectors are within eps, returns false; otherwise, returns true.
    +     */
    +    def !~=(r: CompareMatrixRightSide): Boolean = !r.fun(x, r.y, r.eps)
    +
    +    /**
    +     * Throws exception when the difference of two vectors are NOT within eps;
    +     * otherwise, returns true.
    +     */
    +    def ~==(r: CompareMatrixRightSide): Boolean = {
    +      if (!r.fun(x, r.y, r.eps)) {
    +        throw new TestFailedException(
    +          s"Expected \n$x\n and \n${r.y}\n to be within ${r.eps}${r.method} for all elements.", 0)
    +      }
    +      true
    +    }
    +
    +    /**
    +     * Throws exception when the difference of two matrices are within eps; otherwise, returns true.
    +     */
    +    def !~==(r: CompareMatrixRightSide): Boolean = {
    +      if (r.fun(x, r.y, r.eps)) {
    +        throw new TestFailedException(
    +          s"Did not expect \n$x\n and \n${r.y}\n to be within " +
    +            "${r.eps}${r.method} for all elements.", 0)
    +      }
    +      true
    +    }
    +
    +    /**
    +     * Comparison using absolute tolerance.
    +     */
    +    def absTol(eps: Double): CompareMatrixRightSide = CompareMatrixRightSide(
    +      (x: Matrix, y: Matrix, eps: Double) => {
    +        x.toArray.zip(y.toArray).forall(x => x._1 ~= x._2 absTol eps)
    --- End diff --
    
    It is confusing to have two different things called x here (the matrix parameter and the forall lambda parameter).
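
    For illustration, renaming the tuple parameter avoids the shadowing. A sketch of the forall
    expression only:

        x.toArray.zip(y.toArray).forall { case (a, b) => a ~= b absTol eps }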




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17765001
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var rowCounter = 0
    +          val Cstart = colCounterForB * mA
    +          while (rowCounter < mA) {
    +            var i = Arows(rowCounter)
    +            val indEnd = Arows(rowCounter + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B(colCounterForB, Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounter
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounter += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    --- End diff --
    
    Note why this is outside the loop: Values of C are updated multiple times since A is not transposed.
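
    For context, the point is roughly the following. In this branch A is not transposed, so each
    entry of C is accumulated into several times (once per contributing non-zero of A), unlike
    the transposed branch above where each entry is written exactly once. The beta scaling
    therefore has to happen a single time, before the accumulation loops. A sketch of the shape
    of that code (the exact handling of special beta values is cut off in the diff above):

        // Scale C by beta once, up front; the loops below then only add alpha * A * B terms.
        var k = 0
        while (k < C.values.length) {
          C.values(k) *= beta
          k += 1
        }
        // ... followed by the CSC accumulation over the columns of B and the non-zeros of A.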




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17806667
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The number of rows of the matrices in this array, don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      val allColPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val ptr = mat.asInstanceOf[SparseMatrix].colPtrs
    +        ptr.slice(1, ptr.length)
    +      }
    +      var counter = 0
    +      val adjustedPtrs = allColPtrs.map { p =>
    +        counter += p
    +        counter
    +      }
    +      new SparseMatrix(numRows, numCols, adjustedPtrs,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].rowIndices).toArray,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].values).toArray)
    +    } else if (!isSparse && !isDense) {
    +      throw new IllegalArgumentException("The supplied matrices are neither in SparseMatrix or" +
    +        " DenseMatrix format!")
    +    }else {
    +      new DenseMatrix(numRows, numCols, matrices.flatMap(_.toArray).toArray)
    +    }
    +  }
    +  // partitionMetaData correspond to the index of the partition and the max number of non-zeros
    +  // in that partition so that we can preallocate a memory efficient buffer
    +  private[mllib] def fromRDD(
    +      rows: RDD[(Double, Vector)],
    +      partitionMetaData: Array[(Int, Int)],
    +      batchSize : Int,
    +      buildSparseThreshold: Double,
    +      generateOnTheFly: Boolean = true): RDD[(DenseMatrix, Matrix)] = {
    +
    +    if (!generateOnTheFly){
    +      rows.mapPartitions { iter =>
    +        iter.grouped(batchSize)
    +      }.map(fromSeq(_, batchSize))
    +    }else {
    +      val numFeatures = rows.first()._2.size
    +
    +      rows.mapPartitionsWithIndex{ case (ind, iter) =>
    +        val findPartition = partitionMetaData.find(_._1 == ind)
    +        val matrixBuffer =
    +          if (findPartition.get._2 != -1) {
    +            val nnz = findPartition.get._2
    +            val density = nnz * 1.0 / (numFeatures * batchSize)
    +            if (density <= buildSparseThreshold) {
    +              (DenseMatrix.zeros(batchSize, 1), new SparseMatrix(numFeatures, batchSize,
    +                Array.fill(batchSize + 1)(0), Array.fill(nnz)(0), Array.fill(nnz)(0.0)))
    +            } else {
    +              (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +            }
    +          } else {
    +            (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +          }
    +        iter.grouped(batchSize).map(fromSeqIntoBuffer(_, matrixBuffer, batchSize)._2)
    +      }
    +    }
    +  }
    +
    +  // Collects data on the maximum number of non-zero elements in a partition for each
    +  // batch of matrices
    +  private[mllib] def getSparsityData(
    +      rows: RDD[(Double, Vector)],
    +      batchSize : Int = 64): Array[(Int, Int)] = {
    +    val numFeatures = rows.first()._2.size
    +
    +    val partitionMetaData = rows.mapPartitionsWithIndex { case (ind, iter) =>
    +      val matrixBuffer =
    +        (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +      var partitionMaxNNZ = -1
    +
    +      iter.grouped(batchSize).foreach { r =>
    +        val (metaData, _) = fromSeqIntoBuffer(r, matrixBuffer, batchSize)
    +        val maxNNZ =
    +          if (metaData > partitionMaxNNZ) metaData else partitionMaxNNZ
    +
    +        partitionMaxNNZ = maxNNZ
    +      }
    +
    +      Iterator((ind, partitionMaxNNZ))
    +    }
    +    partitionMetaData.collect()
    +  }
    +
    +  private def fromSeq(rows: Seq[(Double, Vector)], batchSize: Int) : (DenseMatrix, Matrix) = {
    --- End diff --
    
    More descriptive name: "fromSeq" --> "seqToMatrix"?
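
    A hypothetical sketch of the renamed helper, dense path only (the body below is an illustration built from the signatures visible in this diff, not the PR's actual implementation):

        private def seqToMatrix(rows: Seq[(Double, Vector)], batchSize: Int): (DenseMatrix, Matrix) = {
          val numFeatures = rows.head._2.size
          val labels = DenseMatrix.zeros(batchSize, 1)
          val features = DenseMatrix.zeros(numFeatures, batchSize)
          rows.zipWithIndex.foreach { case ((label, vec), j) =>
            labels.update(j, 0, label)        // one label per column of the batch
            val arr = vec.toArray
            var i = 0
            while (i < numFeatures) {
              features.update(i, j, arr(i))   // column j holds example j's features
              i += 1
            }
          }
          (labels, features)
        }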




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17807758
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -157,3 +157,221 @@ class HingeGradient extends Gradient {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * Class used to compute the gradient for a loss function, given a series of data points.
    + */
    +@DeveloperApi
    +abstract class MultiModelGradient extends Serializable {
    +  /**
    +   * Compute the gradient and loss given the features of all data points.
    +   *
    +   * @param data features for the data points
    +   * @param label labels for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   *
    +   * @return (gradient: DenseMatrix, loss: Matrix)
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix): (DenseMatrix, Matrix)
    +
    +  /**
    +   * Compute the gradient and loss given the features of a series of data point,
    +   * add the gradient to a provided matrix to avoid creating new objects, and return loss.
    +   *
    +   * @param data features for the data points
    +   * @param label label for the data points
    +   * @param weights weights/coefficients corresponding to features
    +   * @param cumGradient the computed gradient will be added to this matrix
    +   *
    +   * @return loss
    +   */
    +  def compute(data: Matrix, label: DenseMatrix,
    +                       weights: DenseMatrix, cumGradient: DenseMatrix): Matrix
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Compute gradient and loss for a logistic loss function, as used in binary classification.
    + * See also the documentation for the precise formulation.
    + */
    +@DeveloperApi
    +class MultiModelLogisticGradient extends MultiModelGradient {
    +
    +  private def sigmoid(p: DenseMatrix): DenseMatrix = {
    +    def takeSigmoid(p: Double): Double = {
    +      1.0 / (math.exp(-p) + 1.0)
    +    }
    +    p.map(takeSigmoid)
    +  }
    +
    +  override def compute(data: Matrix, label: DenseMatrix,
    --- End diff --
    
    Can this be implemented using the below compute method to avoid code duplication?
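
    One possible shape for that delegation, assuming the gradient buffer shares the dimensions of `weights` (an assumption, since the full class body is not quoted here):

        override def compute(
            data: Matrix,
            label: DenseMatrix,
            weights: DenseMatrix): (DenseMatrix, Matrix) = {
          // allocate a zero gradient buffer and defer to the in-place overload
          val cumGradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
          val loss = compute(data, label, weights, cumGradient)
          (cumGradient, loss)
        }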




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz closed the pull request at:

    https://github.com/apache/spark/pull/2451




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17812518
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Updater.scala ---
    @@ -145,12 +150,151 @@ class SquaredL2Updater extends Updater {
         // w' = w - thisIterStepSize * (gradient + regParam * w)
         // w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
         val thisIterStepSize = stepSize / math.sqrt(iter)
    -    val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
    -    brzWeights :*= (1.0 - thisIterStepSize * regParam)
    -    brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
    -    val norm = brzNorm(brzWeights, 2.0)
    +    scal(1.0 - thisIterStepSize * regParam, weightsOld)
    +    axpy(-thisIterStepSize, gradient, weightsOld)
    +    val norm = brzNorm(weightsOld.toBreeze, 2.0)
     
    -    (Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
    +    (weightsOld, 0.5 * regParam * norm * norm)
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Class used to perform steps (weight update) using Gradient Descent methods.
    + *
    + * For general minimization problems, or for regularized problems of the form
    + *         min  L(w) + regParam * R(w),
    + * the compute function performs the actual update step, when given some
    + * (e.g. stochastic) gradient direction for the loss L(w),
    + * and a desired step-size (learning rate).
    + *
    + * The updater is responsible to also perform the update coming from the
    + * regularization term R(w) (if any regularization is used).
    + */
    +@DeveloperApi
    +abstract class MultiModelUpdater extends Serializable {
    +  /**
    +   * Compute an updated value for weights given the gradient, stepSize, iteration number and
    +   * regularization parameter. Also returns the regularization value regParam * R(w)
    +   * computed using the *updated* weights.
    +   *
    +   * @param weightsOld - Column matrix of size dx1 where d is the number of features.
    --- End diff --
    
    update doc (matrix size)
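
    A hedged guess at the updated wording (the `k` column count is an assumption based on the multi-model setting, not taken from the PR):

        /**
         * @param weightsOld - Matrix of size d x k, where d is the number of features and
         *                     k is the number of models being trained simultaneously.
         */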




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17806894
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The number of rows of the matrices in this array, don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      val allColPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val ptr = mat.asInstanceOf[SparseMatrix].colPtrs
    +        ptr.slice(1, ptr.length)
    +      }
    +      var counter = 0
    +      val adjustedPtrs = allColPtrs.map { p =>
    +        counter += p
    +        counter
    +      }
    +      new SparseMatrix(numRows, numCols, adjustedPtrs,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].rowIndices).toArray,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].values).toArray)
    +    } else if (!isSparse && !isDense) {
    +      throw new IllegalArgumentException("The supplied matrices are neither in SparseMatrix or" +
    +        " DenseMatrix format!")
    +    }else {
    +      new DenseMatrix(numRows, numCols, matrices.flatMap(_.toArray).toArray)
    +    }
    +  }
    +  // partitionMetaData correspond to the index of the partition and the max number of non-zeros
    +  // in that partition so that we can preallocate a memory efficient buffer
    +  private[mllib] def fromRDD(
    +      rows: RDD[(Double, Vector)],
    +      partitionMetaData: Array[(Int, Int)],
    +      batchSize : Int,
    +      buildSparseThreshold: Double,
    +      generateOnTheFly: Boolean = true): RDD[(DenseMatrix, Matrix)] = {
    +
    +    if (!generateOnTheFly){
    +      rows.mapPartitions { iter =>
    +        iter.grouped(batchSize)
    +      }.map(fromSeq(_, batchSize))
    +    }else {
    +      val numFeatures = rows.first()._2.size
    +
    +      rows.mapPartitionsWithIndex{ case (ind, iter) =>
    +        val findPartition = partitionMetaData.find(_._1 == ind)
    +        val matrixBuffer =
    +          if (findPartition.get._2 != -1) {
    +            val nnz = findPartition.get._2
    +            val density = nnz * 1.0 / (numFeatures * batchSize)
    +            if (density <= buildSparseThreshold) {
    +              (DenseMatrix.zeros(batchSize, 1), new SparseMatrix(numFeatures, batchSize,
    +                Array.fill(batchSize + 1)(0), Array.fill(nnz)(0), Array.fill(nnz)(0.0)))
    +            } else {
    +              (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +            }
    +          } else {
    +            (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +          }
    +        iter.grouped(batchSize).map(fromSeqIntoBuffer(_, matrixBuffer, batchSize)._2)
    +      }
    +    }
    +  }
    +
    +  // Collects data on the maximum number of non-zero elements in a partition for each
    +  // batch of matrices
    +  private[mllib] def getSparsityData(
    --- End diff --
    
    Should this and other methods which operate on labeled data be in a separate object from Matrices?  E.g., LabeledMatrices?
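
    A rough sketch of that separation (the object name and placement are hypothetical; bodies elided):

        private[mllib] object LabeledMatrices {
          // label-aware batching helpers moved out of Matrices, which stays purely
          // about matrix construction
          def fromRDD(
              rows: RDD[(Double, Vector)],
              partitionMetaData: Array[(Int, Int)],
              batchSize: Int,
              buildSparseThreshold: Double,
              generateOnTheFly: Boolean = true): RDD[(DenseMatrix, Matrix)] = ???

          def getSparsityData(rows: RDD[(Double, Vector)], batchSize: Int = 64): Array[(Int, Int)] = ???
        }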




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17809626
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    --- End diff --
    
    Here and in the other methods, maybe append an "s" if it takes multiple parameter settings: "setStepSizes"
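
    For example, the renamed setter could read (the positivity check is an added illustration, not in the PR):

        def setStepSizes(steps: Array[Double]): this.type = {
          require(steps.nonEmpty && steps.forall(_ > 0), "Step sizes must be positive.")
          this.stepSize = steps
          this
        }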




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17801649
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols)
    +    BLAS.gemm(true, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */
    +  def transposeMultiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numCols))
    +    BLAS.gemv(true, 1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** A human readable representation of the matrix */
       override def toString: String = toBreeze.toString()
    +
    +  private[mllib] def map(f: Double => Double): Matrix
    +
    +  private[mllib] def update(f: Double => Double): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double,
    +                                                        y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double,
    +                                                     y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double,
    +                                                     y: Double): Matrix
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix
    +
    +  private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y)
    +
    +  private[mllib] def *(y: Matrix) = operate(_ * _, y)
    +
    +  private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y)
    +
    +  private[mllib] def +(y: Matrix) = operate(_ + _, y)
    +
    +  private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y)
    +
    +  private[mllib] def -(y: Matrix) = operate(_ - _, y)
    +
    +  private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y)
    +
    +  private[mllib] def /(y: Matrix) = operate(_ / _, y)
    +
    +  private[mllib] def *=(y: Double) = elementWiseOperateScalarInPlace(_ * _, y)
    +
    +  private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y)
    +
    +  private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y)
    +
    +  private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y)
    +
    +  private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y)
    +
    +  private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y)
    +
    +  private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y)
    +
    +  private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y)
    +
    +  private[mllib] def neg: Matrix
    +
    +  private[mllib] def negInPlace: Matrix
    +
    +  /** Less-than-or-equal-to check. Outputs binary `DenseMatrix` */
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix
    +
    +  /** Returns the p-th norm for each column */
    +  private[mllib] def colNorms(p: Double): Matrix
    +
    +  private[mllib] def colSums: DenseMatrix = colSums(false)
    +
    +  private[mllib] def colSums(absolute: Boolean, skipRows: DenseMatrix = null): DenseMatrix = {
    --- End diff --
    
    Also, why not make this an abstract method implemented in Dense/SparseMatrix?
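
    A sketch of that abstract-method layout; the dense body below is illustrative only and ignores `skipRows` for brevity:

        // in the Matrix trait: declare the contract only
        private[mllib] def colSums(absolute: Boolean, skipRows: DenseMatrix = null): DenseMatrix

        // in DenseMatrix: one possible implementation (simplified)
        private[mllib] override def colSums(absolute: Boolean, skipRows: DenseMatrix): DenseMatrix = {
          val sums = DenseMatrix.zeros(1, numCols)
          var j = 0
          while (j < numCols) {
            var i = 0
            while (i < numRows) {
              val v = values(index(i, j))
              sums.update(0, j, sums(0, j) + (if (absolute) math.abs(v) else v))
              i += 1
            }
            j += 1
          }
          sums
        }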




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17806323
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The number of rows of the matrices in this array, don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      val allColPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val ptr = mat.asInstanceOf[SparseMatrix].colPtrs
    +        ptr.slice(1, ptr.length)
    +      }
    +      var counter = 0
    +      val adjustedPtrs = allColPtrs.map { p =>
    +        counter += p
    +        counter
    +      }
    +      new SparseMatrix(numRows, numCols, adjustedPtrs,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].rowIndices).toArray,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].values).toArray)
    +    } else if (!isSparse && !isDense) {
    +      throw new IllegalArgumentException("The supplied matrices are neither in SparseMatrix or" +
    +        " DenseMatrix format!")
    +    }else {
    +      new DenseMatrix(numRows, numCols, matrices.flatMap(_.toArray).toArray)
    +    }
    +  }
    +  // partitionMetaData correspond to the index of the partition and the max number of non-zeros
    +  // in that partition so that we can preallocate a memory efficient buffer
    +  private[mllib] def fromRDD(
    +      rows: RDD[(Double, Vector)],
    +      partitionMetaData: Array[(Int, Int)],
    +      batchSize : Int,
    +      buildSparseThreshold: Double,
    +      generateOnTheFly: Boolean = true): RDD[(DenseMatrix, Matrix)] = {
    +
    +    if (!generateOnTheFly){
    +      rows.mapPartitions { iter =>
    +        iter.grouped(batchSize)
    +      }.map(fromSeq(_, batchSize))
    +    }else {
    +      val numFeatures = rows.first()._2.size
    +
    +      rows.mapPartitionsWithIndex{ case (ind, iter) =>
    +        val findPartition = partitionMetaData.find(_._1 == ind)
    --- End diff --
    
    (This is the inefficiency---having to search for an index.)
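
    A minimal sketch of how the linear search could become an O(1) lookup, assuming partition indices are unique keys in `partitionMetaData`:

        // built once on the driver and captured by the closure
        val nnzByPartition: Map[Int, Int] = partitionMetaData.toMap
        // inside mapPartitionsWithIndex, this replaces partitionMetaData.find(_._1 == ind)
        val partitionMaxNNZ = nnzByPartition.getOrElse(ind, -1)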




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17806620
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The number of rows of the matrices in this array, don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      val allColPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val ptr = mat.asInstanceOf[SparseMatrix].colPtrs
    +        ptr.slice(1, ptr.length)
    +      }
    +      var counter = 0
    +      val adjustedPtrs = allColPtrs.map { p =>
    +        counter += p
    +        counter
    +      }
    +      new SparseMatrix(numRows, numCols, adjustedPtrs,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].rowIndices).toArray,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].values).toArray)
    +    } else if (!isSparse && !isDense) {
    +      throw new IllegalArgumentException("The supplied matrices are neither in SparseMatrix or" +
    +        " DenseMatrix format!")
    +    }else {
    +      new DenseMatrix(numRows, numCols, matrices.flatMap(_.toArray).toArray)
    +    }
    +  }
    +  // partitionMetaData correspond to the index of the partition and the max number of non-zeros
    +  // in that partition so that we can preallocate a memory efficient buffer
    +  private[mllib] def fromRDD(
    +      rows: RDD[(Double, Vector)],
    +      partitionMetaData: Array[(Int, Int)],
    +      batchSize : Int,
    +      buildSparseThreshold: Double,
    +      generateOnTheFly: Boolean = true): RDD[(DenseMatrix, Matrix)] = {
    +
    +    if (!generateOnTheFly){
    +      rows.mapPartitions { iter =>
    +        iter.grouped(batchSize)
    +      }.map(fromSeq(_, batchSize))
    +    }else {
    +      val numFeatures = rows.first()._2.size
    +
    +      rows.mapPartitionsWithIndex{ case (ind, iter) =>
    +        val findPartition = partitionMetaData.find(_._1 == ind)
    +        val matrixBuffer =
    +          if (findPartition.get._2 != -1) {
    +            val nnz = findPartition.get._2
    +            val density = nnz * 1.0 / (numFeatures * batchSize)
    +            if (density <= buildSparseThreshold) {
    +              (DenseMatrix.zeros(batchSize, 1), new SparseMatrix(numFeatures, batchSize,
    +                Array.fill(batchSize + 1)(0), Array.fill(nnz)(0), Array.fill(nnz)(0.0)))
    +            } else {
    +              (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +            }
    +          } else {
    +            (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +          }
    +        iter.grouped(batchSize).map(fromSeqIntoBuffer(_, matrixBuffer, batchSize)._2)
    +      }
    +    }
    +  }
    +
    +  // Collects data on the maximum number of non-zero elements in a partition for each
    +  // batch of matrices
    +  private[mllib] def getSparsityData(
    +      rows: RDD[(Double, Vector)],
    +      batchSize : Int = 64): Array[(Int, Int)] = {
    +    val numFeatures = rows.first()._2.size
    +
    +    val partitionMetaData = rows.mapPartitionsWithIndex { case (ind, iter) =>
    +      val matrixBuffer =
    +        (DenseMatrix.zeros(batchSize, 1), DenseMatrix.zeros(numFeatures, batchSize))
    +      var partitionMaxNNZ = -1
    +
    +      iter.grouped(batchSize).foreach { r =>
    +        val (metaData, _) = fromSeqIntoBuffer(r, matrixBuffer, batchSize)
    --- End diff --
    
    "metaData" is vague; use some name with "nnz"




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802391
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    --- End diff --
    
    spaces between args
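
    For reference, the flagged call with spaces between the arguments would read:

        sums.update(0, j, sums(j) + math.pow(values(idx), p))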




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17811267
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.optimization
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import breeze.linalg.{DenseVector => BDV}
    +
    +import org.apache.spark.annotation.{Experimental, DeveloperApi}
    +import org.apache.spark.Logging
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.mllib.rdd.RDDFunctions._
    +
    +class MultiModelGradientDescent private[mllib] (
    +    private var gradient: MultiModelGradient,
    +    private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging {
    +
    +  private var stepSize: Array[Double] = Array(1.0, 0.1)
    +  private var numIterations: Array[Int] = Array(100)
    +  private var regParam: Array[Double] = Array(0.0, 0.1, 1.0)
    +  private var miniBatchFraction: Double = 1.0
    +
    +  /**
    +   * Set the initial step size of SGD for the first step. Default (1.0, 0.1).
    +   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
    +   */
    +  def setStepSize(step: Array[Double]): this.type = {
    +    this.stepSize = step
    +    this
    +  }
    +
    +  /**
    +   * :: Experimental ::
    +   * Set fraction of data to be used for each SGD iteration.
    +   * Default 1.0 (corresponding to deterministic/classical gradient descent)
    +   */
    +  @Experimental
    +  def setMiniBatchFraction(fraction: Double): this.type = {
    +    this.miniBatchFraction = fraction
    +    this
    +  }
    +
    +  /**
    +   * Set the number of iterations for SGD. Default 100.
    +   */
    +  def setNumIterations(iters: Array[Int]): this.type = {
    +    this.numIterations = iters
    +    this
    +  }
    +
    +  /**
    +   * Set the regularization parameter. Default (0.0, 0.1, 1.0).
    +   */
    +  def setRegParam(regParam: Array[Double]): this.type = {
    +    this.regParam = regParam
    +    this
    +  }
    +
    +  /**
    +   * Set the gradient function (of the loss function of one single data example)
    +   * to be used for SGD.
    +   */
    +  def setGradient(gradient: MultiModelGradient): this.type = {
    +    this.gradient = gradient
    +    this
    +  }
    +
    +
    +  /**
    +   * Set the updater function to actually perform a gradient step in a given direction.
    +   * The updater is responsible to perform the update from the regularization term as well,
    +   * and therefore determines what kind or regularization is used, if any.
    +   */
    +  def setUpdater(updater: Array[MultiModelUpdater]): this.type = {
    +    this.updater = updater
    +    this
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Runs gradient descent on the given training data.
    +   * @param data training data
    +   * @param initialWeights initial weights
    +   * @return solution vector
    +   */
    +  @DeveloperApi
    +  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = {
    +    val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD(
    +      data,
    +      gradient,
    +      updater,
    +      stepSize,
    +      numIterations,
    +      regParam,
    +      miniBatchFraction,
    +      initialWeights)
    +    weights
    +  }
    +
    +}
    +
    +/**
    + * :: DeveloperApi ::
    + * Top-level method to run gradient descent.
    + */
    +@DeveloperApi
    +object MultiModelGradientDescent extends Logging {
    +  /**
    +   * Run stochastic gradient descent (SGD) in parallel using mini batches.
    +   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
    +   * in order to compute a gradient estimate.
    +   * Sampling, and averaging the subgradients over this subset is performed using one standard
    +   * spark map-reduce in each iteration.
    +   *
    +   * @param data - Input data for SGD. RDD of the set of data examples, each of
    +   *               the form (label, [feature values]).
    +   * @param gradient - Gradient object (used to compute the gradient of the loss function of
    +   *                   one single data example)
    +   * @param updater - Updater function to actually perform a gradient step in a given direction.
    +   * @param stepSize - initial step size for the first step
    +   * @param numIterations - number of iterations that SGD should be run.
    +   * @param regParam - regularization parameter
    +   * @param miniBatchFraction - fraction of the input data set that should be used for
    +   *                            one iteration of SGD. Default value 1.0.
    +   *
    +   * @return A tuple containing two elements. The first element is a column matrix containing
    +   *         weights for every feature, and the second element is an array containing the
    +   *         stochastic loss computed for every iteration.
    +   */
    +  def runMiniBatchMMSGD(
    +      data: RDD[(Double, Vector)],
    +      gradient: MultiModelGradient,
    +      updater: Array[MultiModelUpdater],
    +      stepSize: Array[Double],
    +      numIterations: Array[Int],
    +      regParam: Array[Double],
    +      miniBatchFraction: Double,
    +      initialWeights: Vector,
    +      batchSize: Int = 64,
    +      useSparse: Boolean = true,
    +      buildSparseThreshold: Double = 0.2): (Matrix, Array[Vector]) = {
    +
    +    val maxNumIter = numIterations.max
    +    val stochasticLossHistory = new ArrayBuffer[Vector](maxNumIter)
    +
    +    val numExamples = data.count()
    +    val miniBatchSize = numExamples * miniBatchFraction
    +    val numModels = stepSize.length * regParam.length
    +    val numFeatures = initialWeights.size
    +    val numRegularizers = updater.length
    +    val updaterCounter = 0 until numRegularizers
    +    // Initialize weights as a column vector
    +    var weights = updaterCounter.map { i =>
    +      new DenseMatrix(numFeatures, 1, initialWeights.toArray).
    +        multiply(DenseMatrix.ones(1, numModels))
    +    }
    +
    +    var finalWeights: Matrix = new DenseMatrix(numFeatures, 0, Array.empty[Double])
    +
    +    // if no data, return initial weights to avoid NaNs
    +    if (numExamples == 0) {
    +
    +      logInfo("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
    +      return (Matrices.horzCat(weights), stochasticLossHistory.toArray)
    +
    +    }
    +    val stepSizeMatrix = new DenseMatrix(1, numModels,
    +      stepSize.flatMap{ ss =>
    +        for (i <- 1 to regParam.length) yield ss
    +      }
    +    )
    +    val regMatrix = SparseMatrix.diag(Vectors.dense(stepSize.flatMap{ ss =>
    +      for (reg <- regParam) yield reg
    +    }))
    +
    +    val bcMetaData =
    +      if (useSparse) {
    +        data.context.broadcast(Matrices.getSparsityData(data, batchSize))
    +      } else {
    +        val emptyData: Array[(Int, Int)] = (0 until data.partitions.length).map { i =>
    +          (i, -1)}.toArray
    +        data.context.broadcast(emptyData)
    +      }
    +    val points = Matrices.fromRDD(data, bcMetaData.value, batchSize, buildSparseThreshold)
    +
    +    /**
    +     * For the first iteration, the regVal will be initialized as sum of weight squares
    +     * if it's L2 updater; for L1 updater, the same logic is followed.
    +     */
    +    val updaterWithIndex = updater.zipWithIndex
    +
    +    var regVal = updaterWithIndex.map { case (u, ind) =>
    +      u.compute(weights(ind), DenseMatrix.zeros(numFeatures, numModels),
    +        DenseMatrix.zeros(1, numModels), 1, regMatrix)._2
    +    }
    +    val orderedIters = numIterations.sorted
    +    var iterIndexCounter = 0
    +    for (i <- 1 to maxNumIter) {
    +      val bcWeights = data.context.broadcast(weights)
    +      // Sample a subset (fraction miniBatchFraction) of the total data
    +      // compute and sum up the subgradients on this subset (this is one map-reduce)
    +      val (gradientSum, lossSum) = points.sample(false, miniBatchFraction, 42 + i)
    +        .treeAggregate(updaterCounter.map(j => Matrices.zeros(numFeatures, numModels)),
    --- End diff --
    
    In this treeAggregate, it might be nice to include some explicit types for clarity.
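
    For illustration, a minimal self-contained sketch of explicit types on a treeAggregate accumulator. It uses a generic (sum, count) example rather than the PR's gradient/loss accumulator; the RDDFunctions import matches the one already used in this file:

        import org.apache.spark.rdd.RDD
        import org.apache.spark.mllib.rdd.RDDFunctions._

        // Explicit types make the accumulator's shape obvious at the call site.
        def meanWithExplicitTypes(data: RDD[Double]): Double = {
          type Acc = (Double, Long)  // running (sum, count)
          val zero: Acc = (0.0, 0L)
          val (sum: Double, count: Long) = data.treeAggregate(zero)(
            (acc: Acc, x: Double) => (acc._1 + x, acc._2 + 1L),  // fold one element into the partial sums
            (a: Acc, b: Acc) => (a._1 + b._1, a._2 + b._2)       // merge two partial sums
          )
          sum / count
        }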




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17765077
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var rowCounter = 0
    +          val Cstart = colCounterForB * mA
    +          while (rowCounter < mA) {
    +            var i = Arows(rowCounter)
    +            val indEnd = Arows(rowCounter + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B(colCounterForB, Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounter
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounter += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    +      if (beta != 0.0){
    +        f2jBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      // Perform matrix multiplication and add to C. The rows of A are multiplied by the columns of
    +      // B, and added to C.
    +      var colCounterForB = 0 // the column to be updated in C
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Bstart = colCounterForB * kB
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA) {
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B.values(Bstart + colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA){
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B(colCounterForB, colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A and `SparseMatrix` B.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: SparseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Bvals = B.values
    +    val Brows = if (!transB) B.rowIndices else B.colPtrs
    +    val Bcols = if (!transB) B.colPtrs else B.rowIndices
    +
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB){ // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val indEnd = Bcols(colCounterForB + 1)
    +          while (rowCounterForA < mA) {
    +            var i = Bcols(colCounterForB)
    +            val Astart = rowCounterForA * kA
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Bvals(i) * A.values(Astart + Brows(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        var rowCounterForA = 0
    +        while (rowCounterForA < mA) {
    +          var colCounterForA = 0
    +          val Astart = rowCounterForA * kA
    +          while (colCounterForA < kA) {
    +            var i = Brows(colCounterForA)
    +            val indEnd = Brows(colCounterForA + 1)
    +            while (i < indEnd){
    +              val Cindex = Bcols(i) * mA + rowCounterForA
    +              C.values(Cindex) += A.values(Astart + colCounterForA) * Bvals(i) * alpha
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          rowCounterForA += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 0.0
    +      if (beta != 0.0){
    +        nativeBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      if (!transB) { // Expensive to put the check inside the loop
    +
    --- End diff --
    
    extra space




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17800687
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -37,11 +44,197 @@ trait Matrix extends Serializable {
       private[mllib] def toBreeze: BM[Double]
     
       /** Gets the (i, j)-th element. */
    -  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
    +  private[mllib] def apply(i: Int, j: Int): Double
    +
    +  /** Return the index for the (i, j)-th element in the backing array. */
    +  private[mllib] def index(i: Int, j: Int): Int
    +
    +  /** Update element at (i, j) */
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit
    +
    +  /** Get a deep copy of the matrix. */
    +  def copy: Matrix
     
    +  /** Convenience method for `Matrix`-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def multiply(y: Matrix): DenseMatrix = {
    +    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
    +    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
    +    C
    +  }
    +
    +  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
    +  def multiply(y: DenseVector): DenseVector = {
    +    val output = new DenseVector(new Array[Double](numRows))
    +    BLAS.gemv(1.0, this, y, 0.0, output)
    +    output
    +  }
    +
    +  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
    +    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
    +  def transposeMultiply(y: Matrix): DenseMatrix = {
    --- End diff --
    
    How hard would it be to have matrices store a transpose bit indicating whether they are transposed (without the data being moved)? I envision:
    * transpose() function which sets this bit (so transpose is a lazy operation)
    * eliminate transposeMultiply
    * perhaps include a transposePhysical or transpose(physical: Boolean) method which forces data movement
    I'm also OK with adding that support later on.
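
    A rough sketch (not this PR's code) of what a lazy transpose flag could look like on a column-major dense matrix; the names isTransposed and transposePhysical follow the suggestion above and are assumptions. BLAS calls could read the flag to decide transA/transB instead of moving data:

        // Column-major values; isTransposed flips how a logical (i, j) maps into the backing array.
        class DenseMatrixSketch(
            val numRows: Int,
            val numCols: Int,
            val values: Array[Double],
            val isTransposed: Boolean = false) {

          private def index(i: Int, j: Int): Int =
            if (!isTransposed) i + numRows * j else j + numCols * i

          def apply(i: Int, j: Int): Double = values(index(i, j))

          // Lazy transpose: swap the dimensions and flip the flag, no data movement.
          def transpose: DenseMatrixSketch =
            new DenseMatrixSketch(numCols, numRows, values, !isTransposed)

          // Physical transpose: materialize a plain column-major copy of the transpose.
          def transposePhysical: DenseMatrixSketch = {
            val out = new Array[Double](values.length)
            var j = 0
            while (j < numCols) {
              var i = 0
              while (i < numRows) {
                out(j + numCols * i) = this(i, j)  // (i, j) of this becomes (j, i) of the result
                i += 1
              }
              j += 1
            }
            new DenseMatrixSketch(numCols, numRows, out)
          }
        }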




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by anantasty <gi...@git.apache.org>.
Github user anantasty commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56108815
  
    With some guidance I could help you with the docs




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by anantasty <gi...@git.apache.org>.
Github user anantasty commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56217049
  
    @brkyvz  I will get on it




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17802982
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -57,13 +250,709 @@ trait Matrix extends Serializable {
      * @param numCols number of columns
      * @param values matrix entries in column major
      */
    -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
    +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
     
    -  require(values.length == numRows * numCols)
    +  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
    +    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")
     
       override def toArray: Array[Double] = values
     
    -  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
    +
    +  private[mllib] def apply(i: Int): Double = values(i)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
    +
    +  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    values(index(i, j)) = v
    +  }
    +
    +  override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    val len = y_vals.length
    +    require(y_vals.length == numRows)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < len){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(i))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +     f: (Double, Double) => Double,
    +     y: Matrix): DenseMatrix = {
    +    val y_vals = y.toArray
    +    require(y_vals.length == numCols)
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        values(idx) = f(values(idx), y_vals(j))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val y_val = y.toArray
    +    val len = values.length
    +    require(y_val.length == values.length)
    +    var j = 0
    +    while (j < len){
    +      values(j) = f(values(j), y_val(j))
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = f(values(j), y)
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    if (y.numCols==1 || y.numRows == 1){
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
    +        "or elementWiseOperateOnColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1){
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols==1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows==1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix =  {
    +    val dup = this.copy
    +    dup.elementWiseOperateScalarInPlace(f, y)
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
    +    val dup = this.copy
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = 0
    +      while (i < numRows){
    +        val idx = index(i, j)
    +        sums.update(0,j, sums(j) + math.pow(values(idx),p))
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    j = 0
    +    while (j < numCols){
    +      sums.update(0, j, math.pow(sums(j), 1/p))
    +      j += 1
    +    }
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compareInPlace(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) = if (f(values(j), v)) 1.0 else 0.0
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, values.clone())
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  private[mllib] def multiplyInPlace(y: Matrix): DenseMatrix = {
    +    val copy = this multiply y
    +    BLAS.copy(Vectors.dense(copy.values), Vectors.dense(values))
    +    this
    +  }
    +}
    +
    +object DenseMatrix {
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(0.0))
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): DenseMatrix =
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): DenseMatrix = {
    +    val identity = DenseMatrix.zeros(n, n)
    +    var i = 0
    +    while (i < n){
    +      identity.update(i, i, 1.0)
    +      i += 1
    +    }
    +    identity
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextDouble()))
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): DenseMatrix = {
    +    val rand = new XORShiftRandom
    +    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rand.nextGaussian()))
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): DenseMatrix = {
    +    val n = vector.size
    +    val matrix = DenseMatrix.eye(n)
    +    val values = vector.toArray
    +    var i = 0
    +    while (i < n) {
    +      matrix.update(i, i, values(i))
    +      i += 1
    +    }
    +    matrix
    +  }
    +}
    +
    +/**
    + * Column-majored sparse matrix.
    + * The entry values are stored in Compressed Sparse Column (CSC) format.
    + * For example, the following matrix
    + * {{{
    + *   1.0 0.0 4.0
    + *   0.0 3.0 5.0
    + *   2.0 0.0 6.0
    + * }}}
    + * is stored as `values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`,
    + * `rowIndices=[0, 2, 1, 0, 1, 2]`, `colPointers=[0, 2, 3, 6]`.
    + *
    + * @param numRows number of rows
    + * @param numCols number of columns
    + * @param colPtrs the index corresponding to the start of a new column
    + * @param rowIndices the row index of the entry. They must be in strictly increasing order for each
    + *                   column
    + * @param values non-zero matrix entries in column major
    + */
    +class SparseMatrix(
    +    val numRows: Int,
    +    val numCols: Int,
    +    val colPtrs: Array[Int],
    +    val rowIndices: Array[Int],
    +    val values: Array[Double]) extends Matrix with Serializable {
    +
    +  require(values.length == rowIndices.length, "The number of row indices and values don't match! " +
    +    s"values.length: ${values.length}, rowIndices.length: ${rowIndices.length}")
    +  require(colPtrs.length == numCols + 1, "The length of the column indices should be the " +
    +    s"number of columns + 1. Currently, colPointers.length: ${colPtrs.length}, " +
    +    s"numCols: $numCols")
    +
    +  override def toArray: Array[Double] = {
    +    val arr = new Array[Double](numRows * numCols)
    +    var j = 0
    +    while (j < numCols) {
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      val offset = j * numRows
    +      while (i < indEnd) {
    +        val rowIndex = rowIndices(i)
    +        arr(offset + rowIndex) = values(i)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    arr
    +  }
    +
    +  private[mllib] def toBreeze: BM[Double] =
    +    new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    +
    +  private[mllib] def apply(i: Int, j: Int): Double = {
    +    val ind = index(i, j)
    +    if (ind < 0) 0.0 else values(ind)
    +  }
    +
    +  private[mllib] def index(i: Int, j: Int): Int = {
    +    Arrays.binarySearch(rowIndices, colPtrs(j), colPtrs(j + 1), i)
    +  }
    +
    +  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
    +    val ind = index(i, j)
    +    if (ind == -1){
    +      throw new NoSuchElementException("The given row and column indices correspond to a zero " +
    +        "value. Only non-zero elements in Sparse Matrices can be updated.")
    +    } else {
    +      values(index(i, j)) = v
    +    }
    +  }
    +
    +  override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numRows)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(rowIndices(i)))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnColumnsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnRowsInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val y_vals = y.toArray
    +      require(y_vals.length == numCols)
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd){
    +          values(i) = f(values(i), y_vals(j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateOnRowsInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateInPlace(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix =  {
    +    require(y.numCols == numCols)
    +    require(y.numRows == numRows)
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      while (j < numCols){
    +        var i = colPtrs(j)
    +        val indEnd = colPtrs(j + 1)
    +        while (i < indEnd) {
    +          values(i) = f(values(i), y(rowIndices(i), j))
    +          i += 1
    +        }
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateScalarInPlace(
    +      f: (Double, Double) => Double,
    +      y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      var j = 0
    +      val len = values.length
    +      while (j < len){
    +        values(j) = f(values(j), y)
    +        j += 1
    +      }
    +      this
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private def isMultiplication(f: (Double, Double) => Double): Boolean = {
    +    if (f(2, 9) != 18) return false
    +    if (f(3, 7) != 21) return false
    +    if (f(8, 9) != 72) return false
    +    true
    +  }
    +
    +  private def isDivision(f: (Double, Double) => Double): Boolean = {
    +    if (f(12, 3) != 4) return false
    +    if (f(72, 4) != 18) return false
    +    if (f(72, 9) != 8) return false
    +    true
    +  }
    +
    +  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    if (y.numCols==1 || y.numRows == 1) {
    +      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseMultiplyRows " +
    +        "or elementWiseMultiplyColumns instead")
    +    }
    +    if (y.numCols == 1 && y.numRows == 1) {
    +      elementWiseOperateScalarInPlace(f, y.toArray(0))
    +    } else {
    +      if (y.numCols == 1) {
    +        elementWiseOperateOnColumnsInPlace(f, y)
    +      }else if (y.numRows == 1){
    +        elementWiseOperateOnRowsInPlace(f, y)
    +      }else{
    +        elementWiseOperateInPlace(f, y)
    +      }
    +    }
    +  }
    +
    +  private[mllib] def elementWiseOperateOnColumns(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnColumnsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateOnRows(
    +      f: (Double, Double) => Double,
    +      y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateOnRowsInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix =  {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.elementWiseOperateInPlace(f, y)
    +  }
    +  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix =  {
    +    if (isMultiplication(f) || isDivision(f)) {
    +      val dup = this.copy
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    } else {
    +      val dup = this.toDense
    +      dup.elementWiseOperateScalarInPlace(f, y)
    +    }
    +  }
    +
    +  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix = {
    +    val dup = y match {
    +      case sy: SparseMatrix => this.copy
    +      case dy: DenseMatrix => this.toDense
    +    }
    +    dup.operateInPlace(f, y)
    +  }
    +
    +  def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  def colNorms(p: Double): DenseMatrix = {
    +    if (p==1.0) return colSums(true)
    +    val sums = new DenseMatrix(1, numCols, Array.fill(numCols)(0.0))
    +    var j = 0
    +    while (j < numCols){
    +      var i = colPtrs(j)
    +      val indEnd = colPtrs(j + 1)
    +      while (i < indEnd){
    +        sums.values(j) += math.pow(values(i),p)
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    sums.update(math.pow(_, 1/p))
    +    sums
    +  }
    +
    +  private[mllib] def negInPlace: SparseMatrix = {
    +    var j = 0
    +    val len = values.length
    +    while (j < len){
    +      values(j) *= -1
    +      j += 1
    +    }
    +    this
    +  }
    +
    +  private[mllib] def neg: SparseMatrix = {
    +    val copy = this.copy
    +    copy.negInPlace
    +  }
    +
    +  private[mllib] def compare(v: Double, f: (Double, Double) => Boolean): DenseMatrix = {
    +    val copy = new DenseMatrix(numRows, numCols, this.toArray)
    +    copy.compareInPlace(v, f)
    +  }
    +
    +  def toDense: DenseMatrix = new DenseMatrix(numRows, numCols, this.toArray)
    +}
    +
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  private def genRand(numRows: Int, numCols: Int, raw: Array[Double], nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol){
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    sCols.append(sparseA.length)
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    --- End diff --
    
    They're nice functions to have. They will be helpful for people who want to do random projections.
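
    For context, a hedged sketch of that random-projection use, assuming sprand(numRows, numCols, density, seed) as documented in this diff (the seed type is a guess) together with the dense-times-sparse multiply added in this PR:

        import org.apache.spark.mllib.linalg.{DenseMatrix, SparseMatrix}

        val n = 1000  // number of examples
        val d = 5000  // original dimensionality
        val k = 50    // target dimensionality

        val data: DenseMatrix = DenseMatrix.randn(n, d)                     // n x d data matrix
        val projection: SparseMatrix = SparseMatrix.sprand(d, k, 0.1, 42L)  // d x k sparse random matrix

        // gemm supports DenseMatrix * SparseMatrix, so the projected data stays dense.
        val projected: DenseMatrix = data.multiply(projection)              // n x k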




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17765175
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
    @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
             throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
         }
       }
    +
    +  // For level-3 routines, we use the native BLAS.
    +  private def nativeBLAS: NetlibBLAS = {
    +    if (_nativeBLAS == null) {
    +      _nativeBLAS = NativeBLAS
    +    }
    +    _nativeBLAS
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    if (alpha == 0.0) {
    +      logDebug("gemm: alpha is equal to 0. Returning C.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
    +            case sB: SparseMatrix =>
    +              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
    +                s"multiplication")
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case dense: DenseMatrix =>
    +          B match {
    +            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
    +            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
    +            case _ =>
    +              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
    +          }
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   *
    +   * @param alpha a scalar to scale the multiplication A * B.
    +   * @param A the matrix A that will be left multiplied to B. Size of m x k.
    +   * @param B the matrix B that will be left multiplied by A. Size of k x n.
    +   * @param beta a scalar that can be used to scale matrix C.
    +   * @param C the resulting matrix C. Size of m x n.
    +   */
    +  def gemm(
    +      alpha: Double,
    +      A: Matrix,
    +      B: Matrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    gemm(false, false, alpha, A, B, beta, C)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +    val tAstr = if (!transA) "N" else "T"
    +    val tBstr = if (!transB) "N" else "T"
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
    +      beta, C.values, C.numRows)
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      B: DenseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Avals = A.values
    +    val Arows = if (!transA) A.rowIndices else A.colPtrs
    +    val Acols = if (!transA) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val Bstart = colCounterForB * kA
    +          while (rowCounterForA < mA) {
    +            var i = Arows(rowCounterForA)
    +            val indEnd = Arows(rowCounterForA + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B.values(Bstart + Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var rowCounter = 0
    +          val Cstart = colCounterForB * mA
    +          while (rowCounter < mA) {
    +            var i = Arows(rowCounter)
    +            val indEnd = Arows(rowCounter + 1)
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Avals(i) * B(colCounterForB, Acols(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounter
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounter += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 1.0 (a zero beta zeroes C out via dscal)
    +      if (beta != 1.0){
    +        f2jBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      // Perform matrix multiplication and add to C. The rows of A are multiplied by the columns of
    +      // B, and added to C.
    +      var colCounterForB = 0 // the column to be updated in C
    +      if (!transB) { // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Bstart = colCounterForB * kB
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA) {
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B.values(Bstart + colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        while (colCounterForB < nB) {
    +          var colCounterForA = 0 // The column of A to multiply with the row of B
    +          val Cstart = colCounterForB * mA
    +          while (colCounterForA < kA){
    +            var i = Acols(colCounterForA)
    +            val indEnd = Acols(colCounterForA + 1)
    +            val Bval = B(colCounterForB, colCounterForA) * alpha
    +            while (i < indEnd){
    +              C.values(Cstart + Arows(i)) += Avals(i) * Bval
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * C := alpha * A * B + beta * C
    +   * For `DenseMatrix` A and `SparseMatrix` B.
    +   */
    +  private def gemm(
    +      transA: Boolean,
    +      transB: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      B: SparseMatrix,
    +      beta: Double,
    +      C: DenseMatrix): Unit = {
    +    val mA: Int = if (!transA) A.numRows else A.numCols
    +    val nB: Int = if (!transB) B.numCols else B.numRows
    +    val kA: Int = if (!transA) A.numCols else A.numRows
    +    val kB: Int = if (!transB) B.numRows else B.numCols
    +
    +    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
    +    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
    +    require(nB == C.numCols,
    +      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
    +
    +    val Bvals = B.values
    +    val Brows = if (!transB) B.rowIndices else B.colPtrs
    +    val Bcols = if (!transB) B.colPtrs else B.rowIndices
    +
    +    if (transA){
    +      var colCounterForB = 0
    +      if (!transB){ // Expensive to put the check inside the loop
    +        while (colCounterForB < nB) {
    +          var rowCounterForA = 0
    +          val Cstart = colCounterForB * mA
    +          val indEnd = Bcols(colCounterForB + 1)
    +          while (rowCounterForA < mA) {
    +            var i = Bcols(colCounterForB)
    +            val Astart = rowCounterForA * kA
    +            var sum = 0.0
    +            while (i < indEnd) {
    +              sum += Bvals(i) * A.values(Astart + Brows(i))
    +              i += 1
    +            }
    +            val Cindex = Cstart + rowCounterForA
    +            C.values(Cindex) = beta * C.values(Cindex) + sum * alpha
    +            rowCounterForA += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        var rowCounterForA = 0
    +        while (rowCounterForA < mA) {
    +          var colCounterForA = 0
    +          val Astart = rowCounterForA * kA
    +          while (colCounterForA < kA) {
    +            var i = Brows(colCounterForA)
    +            val indEnd = Brows(colCounterForA + 1)
    +            while (i < indEnd){
    +              val Cindex = Bcols(i) * mA + rowCounterForA
    +              C.values(Cindex) += A.values(Astart + colCounterForA) * Bvals(i) * alpha
    +              i += 1
    +            }
    +            colCounterForA += 1
    +          }
    +          rowCounterForA += 1
    +        }
    +      }
    +    } else {
    +      // Scale matrix first if `beta` is not equal to 1.0 (a zero beta zeroes C out via dscal)
    +      if (beta != 1.0){
    +        nativeBLAS.dscal(C.values.length, beta, C.values, 1)
    +      }
    +      if (!transB) { // Expensive to put the check inside the loop
    +
    +        // Loop over the columns of B, pick non-zero row in B, select corresponding column in A,
    +        // and update the whole column in C by looping over rows in A.
    +        var colCounterForB = 0 // the column to be updated in C
    +        while (colCounterForB < nB) {
    +          var i = Bcols(colCounterForB)
    +          val indEnd = Bcols(colCounterForB + 1)
    +          while (i < indEnd) {
    +            var rowCounterForA = 0
    +            val Bval = Bvals(i)
    +            val Cstart = colCounterForB * mA
    +            val Astart = mA * Brows(i)
    +            while (rowCounterForA < mA){
    +              C.values(Cstart + rowCounterForA) += A.values(Astart + rowCounterForA) * Bval * alpha
    +              rowCounterForA += 1
    +            }
    +            i += 1
    +          }
    +          colCounterForB += 1
    +        }
    +      } else {
    +        var colCounterForA = 0
    +        while (colCounterForA < kA) {
    +          var rowCounterForA = 0
    +          val Astart = mA * colCounterForA
    +          val indEnd = Brows(colCounterForA + 1)
    +          while (rowCounterForA < mA) {
    +            var i = Brows(colCounterForA)
    +            while (i < indEnd){
    +              val Cindex = Bcols(i) * mA + rowCounterForA
    +              C.values(Cindex) += A.values(Astart + rowCounterForA) * Bvals(i) * alpha
    +              i += 1
    +            }
    +            rowCounterForA += 1
    +          }
    +          colCounterForA += 1
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * @param trans whether to use the transpose of matrix A (true), or A itself (false).
    +   * @param alpha a scalar to scale the multiplication A * x.
    +   * @param A the matrix A that will be left multiplied to x. Size of m x n.
    +   * @param x the vector x that will be left multiplied by A. Size of n x 1.
    +   * @param beta a scalar that can be used to scale vector y.
    +   * @param y the resulting vector y. Size of m x 1.
    +   */
    +  def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: Matrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit = {
    +
    +    val mA: Int = if (!trans) A.numRows else A.numCols
    +    val nx: Int = x.size
    +    val nA: Int = if (!trans) A.numCols else A.numRows
    +
    +    require(nA == nx, s"The columns of A don't match the number of elements of x. A: $nA, x: $nx")
    +    require(mA == y.size,
    +      s"The rows of A don't match the number of elements of y. A: $mA, y:${y.size}}")
    +    if (alpha == 0.0) {
    +      logDebug("gemv: alpha is equal to 0. Returning y.")
    +    } else {
    +      A match {
    +        case sparse: SparseMatrix =>
    +          gemv(trans, alpha, sparse, x, beta, y)
    +        case dense: DenseMatrix =>
    +          gemv(trans, alpha, dense, x, beta, y)
    +        case _ =>
    +          throw new IllegalArgumentException(s"gemv doesn't support matrix type ${A.getClass}.")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   *
    +   * @param alpha a scalar to scale the multiplication A * x.
    +   * @param A the matrix A that will be left multiplied to x. Size of m x n.
    +   * @param x the vector x that will be left multiplied by A. Size of n x 1.
    +   * @param beta a scalar that can be used to scale vector y.
    +   * @param y the resulting vector y. Size of m x 1.
    +   */
    +  def gemv(
    +      alpha: Double,
    +      A: Matrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit = {
    +    gemv(false, alpha, A, x, beta, y)
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * For `DenseMatrix` A.
    +   */
    +  private def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: DenseMatrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit =  {
    +    val tStrA = if (!trans) "N" else "T"
    +    nativeBLAS.dgemv(tStrA, A.numRows, A.numCols, alpha, A.values, A.numRows, x.values, 1, beta,
    +      y.values, 1)
    +  }
    +
    +  /**
    +   * y := alpha * A * x + beta * y
    +   * For `SparseMatrix` A.
    +   */
    +  private def gemv(
    +      trans: Boolean,
    +      alpha: Double,
    +      A: SparseMatrix,
    +      x: DenseVector,
    +      beta: Double,
    +      y: DenseVector): Unit =  {
    +
    +    val mA: Int = if(!trans) A.numRows else A.numCols
    +    val nA: Int = if(!trans) A.numCols else A.numRows
    +
    +    val Avals = A.values
    +    val Arows = if (!trans) A.rowIndices else A.colPtrs
    +    val Acols = if (!trans) A.colPtrs else A.rowIndices
    +
    +    // Slicing is easy in this case. This is the optimal multiplication setting for sparse matrices
    +    if (trans){
    +      var rowCounter = 0
    +      while (rowCounter < mA){
    --- End diff --
    
    space between ) and {
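
For readers following the quoted BLAS routines, here is a minimal usage sketch of the gemm/gemv entry points, relying only on the signatures shown in the diff above. `BLAS` is the (presumably package-private) helper object this PR adds, so a snippet like this would likely need to live inside Spark's mllib sources; the sizes, density, and seed are arbitrary illustration values.

    import org.apache.spark.mllib.linalg.{BLAS, DenseMatrix, DenseVector, Matrices}

    // C := 2.0 * A * B + 1.0 * C with sparse A (3 x 4), dense B (4 x 2), dense C (3 x 2)
    val A = Matrices.sprand(3, 4, 0.5, seed = 42L)
    val B = new DenseMatrix(4, 2, Array.fill(8)(1.0))
    val C = new DenseMatrix(3, 2, new Array[Double](6))
    BLAS.gemm(2.0, A, B, 1.0, C)

    // y := 1.0 * A * x + 0.0 * y, dispatching to the sparse gemv path
    val x = new DenseVector(Array(1.0, 2.0, 3.0, 4.0))
    val y = new DenseVector(new Array[Double](3))
    BLAS.gemv(1.0, A, x, 0.0, y)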




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56122908
  
    @brkyvz  Just wondering: Which reference library are you using to determine the order of arguments for BLAS routines?  E.g., it's different from [Netlib LAPACK](http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html).
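
For context while that question is open: the convenience wrapper in this PR takes (alpha, A, B, beta, C), while the raw netlib-java call it delegates to in the dense path uses the standard Fortran BLAS argument order. A small sketch of the raw call, with placeholder 2 x 3 and 3 x 2 values chosen only for illustration:

    import com.github.fommil.netlib.{BLAS => NetlibBLAS}

    val nativeBLAS = NetlibBLAS.getInstance()
    // dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
    val m = 2; val k = 3; val n = 2
    val a = Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0) // 2 x 3, column-major
    val b = Array(1.0, 0.0, 0.0, 0.0, 1.0, 0.0) // 3 x 2, column-major
    val c = new Array[Double](m * n)            // 2 x 2 result buffer
    nativeBLAS.dgemm("N", "N", m, n, k, 1.0, a, m, b, k, 0.0, c, m)
    // The PR's wrapper expresses the same product as gemm(1.0, A, B, 0.0, C).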




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17808588
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala ---
    @@ -181,6 +181,7 @@ object GradientDescent extends Logging {
         var regVal = updater.compute(
           weights, Vectors.dense(new Array[Double](weights.size)), 0, 1, regParam)._2
     
    +    //println(s"initial:\n$weights\n\n")
    --- End diff --
    
    remove




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17806514
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The numbers of rows of the matrices in this array don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      // Shift each matrix's column pointers by the number of non-zeros seen so far, so that
    +      // the concatenated colPtrs index into the concatenated rowIndices and values arrays.
    +      var nnzSoFar = 0
    +      val adjustedPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val sm = mat.asInstanceOf[SparseMatrix]
    +        val shiftedPtrs = sm.colPtrs.slice(1, sm.colPtrs.length).map(_ + nnzSoFar)
    +        nnzSoFar += sm.values.length
    +        shiftedPtrs
    +      }
    +      new SparseMatrix(numRows, numCols, adjustedPtrs,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].rowIndices).toArray,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].values).toArray)
    +    } else if (!isSparse && !isDense) {
    +      throw new IllegalArgumentException("The supplied matrices are neither in SparseMatrix nor" +
    +        " DenseMatrix format!")
    +    }else {
    +      new DenseMatrix(numRows, numCols, matrices.flatMap(_.toArray).toArray)
    +    }
    +  }
    +  // partitionMetaData corresponds to the index of each partition and the max number of
    +  // non-zeros in that partition, so that we can preallocate a memory-efficient buffer
    +  private[mllib] def fromRDD(
    +      rows: RDD[(Double, Vector)],
    +      partitionMetaData: Array[(Int, Int)],
    +      batchSize : Int,
    +      buildSparseThreshold: Double,
    +      generateOnTheFly: Boolean = true): RDD[(DenseMatrix, Matrix)] = {
    +
    +    if (!generateOnTheFly){
    +      rows.mapPartitions { iter =>
    +        iter.grouped(batchSize)
    +      }.map(fromSeq(_, batchSize))
    +    }else {
    +      val numFeatures = rows.first()._2.size
    +
    +      rows.mapPartitionsWithIndex{ case (ind, iter) =>
    +        val findPartition = partitionMetaData.find(_._1 == ind)
    --- End diff --
    
    (Maybe this is trivial since it is per-worker; feel free to ignore.)
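
Separately from the per-partition lookup question above, here is a minimal usage sketch of the new `Matrices` factory methods quoted in this hunk, relying only on the signatures shown in the diff; the sizes, density, and seed are arbitrary.

    import org.apache.spark.mllib.linalg.{Matrices, Vectors}

    val zeroMat = Matrices.zeros(2, 3)                        // 2 x 3 dense matrix of zeros
    val eyeMat  = Matrices.speye(4)                           // 4 x 4 sparse identity matrix
    val randMat = Matrices.sprandn(100, 10, 0.05, seed = 17L) // ~5% non-zeros drawn from N(0, 1)
    val diagMat = Matrices.diag(Vectors.dense(1.0, 2.0, 3.0)) // 3 x 3 dense diagonal matrix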




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2451#issuecomment-56216806
  
    Also, is it odd that the user can't access the matrix data, except via toArray (or maybe side effects of the function given to map)?




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17812287
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Updater.scala ---
    @@ -111,18 +112,22 @@ class L1Updater extends Updater {
           regParam: Double): (Vector, Double) = {
         val thisIterStepSize = stepSize / math.sqrt(iter)
         // Take gradient step
    -    val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
    -    brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
    +    //println(s"\n$iter:")
    --- End diff --
    
    old comments




[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2451#discussion_r17806308
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -93,9 +1000,310 @@ object Matrices {
             require(dm.majorStride == dm.rows,
               "Do not support stride size different from the number of rows.")
             new DenseMatrix(dm.rows, dm.cols, dm.data)
    +      case sm: BSM[Double] =>
    +        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
           case _ =>
             throw new UnsupportedOperationException(
               s"Do not support conversion from type ${breeze.getClass.getName}.")
         }
       }
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of zeros.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    +   */
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of ones.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    +   */
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)
    +
    +  /**
    +   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols)
    +
    +  /**
    +   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param seed the seed for the random generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      seed: Long = Utils.random.nextLong()): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, seed)
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use
    +   * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in
    +   * `SparseMatrix` format.
    +   * @param vector a `Vector` that will form the values on the diagonal of the matrix
    +   * @return Square `Matrix` with size `values.length` x `values.length` and `values`
    +   *         on the diagonal
    +   */
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported.
    +   * @param matrices sequence of matrices
    +   * @return a single `Matrix` composed of the matrices that were horizontally concatenated
    +   */
    +  private[mllib] def horzCat(matrices: Seq[Matrix]): Matrix = {
    +    if (matrices.size == 1) {
    +      return matrices(0)
    +    }
    +    val numRows = matrices(0).numRows
    +    var rowsMatch = true
    +    var isDense = false
    +    var isSparse = false
    +    for (mat <- matrices) {
    +      if (numRows != mat.numRows) rowsMatch = false
    +      mat match {
    +        case sparse: SparseMatrix => isSparse = true
    +        case dense: DenseMatrix => isDense = true
    +      }
    +    }
    +    require(rowsMatch, "The numbers of rows of the matrices in this array don't match!")
    +    var numCols = 0
    +    matrices.foreach(numCols += _.numCols)
    +    if (isSparse && !isDense) {
    +      // Shift each matrix's column pointers by the number of non-zeros seen so far, so that
    +      // the concatenated colPtrs index into the concatenated rowIndices and values arrays.
    +      var nnzSoFar = 0
    +      val adjustedPtrs: Array[Int] = Array(0) ++ matrices.flatMap { mat =>
    +        val sm = mat.asInstanceOf[SparseMatrix]
    +        val shiftedPtrs = sm.colPtrs.slice(1, sm.colPtrs.length).map(_ + nnzSoFar)
    +        nnzSoFar += sm.values.length
    +        shiftedPtrs
    +      }
    +      new SparseMatrix(numRows, numCols, adjustedPtrs,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].rowIndices).toArray,
    +        matrices.flatMap(_.asInstanceOf[SparseMatrix].values).toArray)
    +    } else if (!isSparse && !isDense) {
    +      throw new IllegalArgumentException("The supplied matrices are neither in SparseMatrix nor" +
    +        " DenseMatrix format!")
    +    }else {
    +      new DenseMatrix(numRows, numCols, matrices.flatMap(_.toArray).toArray)
    +    }
    +  }
    +  // partitionMetaData corresponds to the index of each partition and the max number of
    +  // non-zeros in that partition, so that we can preallocate a memory-efficient buffer
    +  private[mllib] def fromRDD(
    +      rows: RDD[(Double, Vector)],
    +      partitionMetaData: Array[(Int, Int)],
    --- End diff --
    
    Could this be a Map[Int, Int] instead (for efficiency)?  It looks like so, based on the 1 place it is used.
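
A minimal sketch of that suggestion, with hypothetical sample values; the point is replacing the linear `find` over the array with a constant-time lookup:

    // Sample (partition index, max number of non-zeros in that partition) metadata
    val partitionMetaData: Array[(Int, Int)] = Array((0, 128), (1, 96), (2, 210))
    val metaByPartition: Map[Int, Int] = partitionMetaData.toMap

    // Inside mapPartitionsWithIndex { case (ind, iter) => ... }:
    val ind = 1
    val maxNnz = metaByPartition.getOrElse(ind, 0) // O(1) lookup instead of a per-partition scan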

