You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by feynmanliang <gi...@git.apache.org> on 2015/09/07 23:43:21 UTC

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

GitHub user feynmanliang opened a pull request:

    https://github.com/apache/spark/pull/8648

    [SPARK-10478][ML] Performance, organization, and style improvements for multi-layer perceptron

    * Changes manual iterations into higher-performance  `UFunc`s, vectorized operations, and broadcasted operations
    * Refactors multiple `while` and `for` loops into `foreach` and `map`s
    * Fixes various style issues
    * Adds comments and improves scaladocs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/feynmanliang/spark ann-improvements

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8648
    
----
commit 611c76f723405fe40ed1ba5c871d5e88b617aed3
Author: Feynman Liang <fl...@databricks.com>
Date:   2015-09-07T20:00:32Z

    Documentation and indentation fixes

commit 7b192db010ff660da39404009ce883f903f9980a
Author: Feynman Liang <fl...@databricks.com>
Date:   2015-09-07T20:02:36Z

    Refactors unneeded helpers

commit 12169d7236caaad0db57d941cb9999f47a947b61
Author: Feynman Liang <fl...@databricks.com>
Date:   2015-09-07T20:08:05Z

    More doc and style fixes

commit bc52b652541f2d6c4508662af937e0a26e207651
Author: Feynman Liang <fl...@databricks.com>
Date:   2015-09-07T20:39:21Z

    Cleans up documentation and uses functional code

commit 84f8bea0f4a3b24f8aae6eadd98d7166311f951d
Author: Feynman Liang <fl...@databricks.com>
Date:   2015-09-07T21:27:53Z

    Vectorizes linalg using ufuncs and vector ops

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138415826
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138383216
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8648#discussion_r40146842
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -260,127 +263,126 @@ private[ann] trait ActivationFunction extends Serializable {
     }
     
     /**
    - * Implements in-place application of functions
    + * Implements in-place application of functions.
      */
     private[ann] object ActivationFunction {
     
    -  def apply(x: BDM[Double], y: BDM[Double], func: Double => Double): Unit = {
    -    var i = 0
    -    while (i < x.rows) {
    -      var j = 0
    -      while (j < x.cols) {
    -        y(i, j) = func(x(i, j))
    -        j += 1
    -      }
    -      i += 1
    -    }
    +  def apply(x: BDM[Double], y: BDM[Double], func: UFunc with MappingUFunc)(
    +      implicit impl: func.Impl[BDM[Double], BDM[Double]]): Unit = {
    +    y := func(x)
       }
     
       def apply(
    -    x1: BDM[Double],
    -    x2: BDM[Double],
    -    y: BDM[Double],
    -    func: (Double, Double) => Double): Unit = {
    -    var i = 0
    -    while (i < x1.rows) {
    -      var j = 0
    -      while (j < x1.cols) {
    -        y(i, j) = func(x1(i, j), x2(i, j))
    -        j += 1
    -      }
    -      i += 1
    -    }
    +      x1: BDM[Double],
    +      x2: BDM[Double],
    +      y: BDM[Double],
    +      func: UFunc with MappingUFunc)(
    +      implicit impl: func.Impl2[BDM[Double], BDM[Double], BDM[Double]]): Unit = {
    +    y := func(x1, x2)
       }
     }
     
     /**
    - * Implements SoftMax activation function
    + * Implements Softmax activation function.
      */
     private[ann] class SoftmaxFunction extends ActivationFunction {
       override def eval(x: BDM[Double], y: BDM[Double]): Unit = {
    -    var j = 0
    -    // find max value to make sure later that exponent is computable
    -    while (j < x.cols) {
    -      var i = 0
    -      var max = Double.MinValue
    -      while (i < x.rows) {
    -        if (x(i, j) > max) {
    -          max = x(i, j)
    -        }
    -        i += 1
    -      }
    -      var sum = 0.0
    -      i = 0
    -      while (i < x.rows) {
    -        val res = Math.exp(x(i, j) - max)
    -        y(i, j) = res
    -        sum += res
    -        i += 1
    -      }
    -      i = 0
    -      while (i < x.rows) {
    -        y(i, j) /= sum
    -        i += 1
    -      }
    -      j += 1
    +    (0 until x.cols).foreach { j =>
    +      // subtract max value to prevent overflow during exp
    +      // does not affect correctness since we normalize right after
    +      val maxVal = Bmax(x(::, j))
    +      y(::, j) := breeze.numerics.exp(x(::, j) - maxVal)
    +      y(::, j) :/= Bsum(y(::, j))
    --- End diff --
    
    @feynmanliang Could you run some micro-benchmark on this function? I think this is the only place that might cause performance issues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by brkyvz <gi...@git.apache.org>.

Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8648#discussion_r39191027
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -573,47 +582,27 @@ private[ml] class FeedForwardModel private(
           case _ =>
             throw new UnsupportedOperationException("Non-functional layer not supported at the top")
         }
    +    // backward pass (back-propagate deltas given errors)
         deltas(L) = new BDM[Double](0, 0)
         deltas(L - 1) = newE
         for (i <- (L - 2) to (0, -1)) {
           deltas(i) = layerModels(i + 1).prevDelta(deltas(i + 1), outputs(i + 1))
         }
    -    val grads = new Array[Array[Double]](layerModels.length)
    -    for (i <- 0 until layerModels.length) {
    -      val input = if (i==0) data else outputs(i - 1)
    -      grads(i) = layerModels(i).grad(deltas(i), input)
    +    // forward pass (forward-propagate gradients given inputs)
    +    val grads = layerModels.zipWithIndex.map { case (layer, i) =>
    +      val input = if (i == 0) data else outputs(i - 1)
    +      layer.grad(deltas(i), input)
         }
    -    // update cumGradient
    +    // update cumulative gradients
         val cumGradientArray = cumGradient.toArray
    -    var offset = 0
    -    // TODO: extract roll
    -    for (i <- 0 until grads.length) {
    -      val gradArray = grads(i)
    -      var k = 0
    -      while (k < gradArray.length) {
    -        cumGradientArray(offset + k) += gradArray(k)
    -        k += 1
    -      }
    -      offset += gradArray.length
    +    grads.flatten.zipWithIndex.foreach { case (newGrad, i) =>
    +      cumGradientArray(i) += newGrad
         }
         newError
       }
     
    -  // TODO: do we really need to copy the weights? they should be read-only
       override def weights(): Vector = {
    -    // TODO: extract roll
    -    var size = 0
    -    for (i <- 0 until layerModels.length) {
    -      size += layerModels(i).size
    -    }
    -    val array = new Array[Double](size)
    -    var offset = 0
    -    for (i <- 0 until layerModels.length) {
    -      val layerWeights = layerModels(i).weights().toArray
    -      System.arraycopy(layerWeights, 0, array, offset, layerWeights.length)
    -      offset += layerWeights.length
    -    }
    -    Vectors.dense(array)
    +    Vectors.dense(layerModels.flatMap(_.weights().toArray))
    --- End diff --
    
    I imagine System.arraycopy would be much faster and efficient here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by avulanov <gi...@git.apache.org>.

Github user avulanov commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138679510
  
    @feynmanliang Thank you for reviewing the code! I made one pass. It seems that UFunc simplifies it a lot. However I am not sure about `.flatten` and `.flatMap` on array of large arrays. We need to perform performance comparison. Could you run the benchmark from https://github.com/avulanov/ann-benchmark before and after refactoring to see the difference?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by avulanov <gi...@git.apache.org>.

Github user avulanov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8648#discussion_r38969255
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -573,47 +582,27 @@ private[ml] class FeedForwardModel private(
           case _ =>
             throw new UnsupportedOperationException("Non-functional layer not supported at the top")
         }
    +    // backward pass (back-propagate deltas given errors)
         deltas(L) = new BDM[Double](0, 0)
         deltas(L - 1) = newE
         for (i <- (L - 2) to (0, -1)) {
           deltas(i) = layerModels(i + 1).prevDelta(deltas(i + 1), outputs(i + 1))
         }
    -    val grads = new Array[Array[Double]](layerModels.length)
    -    for (i <- 0 until layerModels.length) {
    -      val input = if (i==0) data else outputs(i - 1)
    -      grads(i) = layerModels(i).grad(deltas(i), input)
    +    // forward pass (forward-propagate gradients given inputs)
    +    val grads = layerModels.zipWithIndex.map { case (layer, i) =>
    +      val input = if (i == 0) data else outputs(i - 1)
    +      layer.grad(deltas(i), input)
         }
    -    // update cumGradient
    +    // update cumulative gradients
         val cumGradientArray = cumGradient.toArray
    -    var offset = 0
    -    // TODO: extract roll
    -    for (i <- 0 until grads.length) {
    -      val gradArray = grads(i)
    -      var k = 0
    -      while (k < gradArray.length) {
    -        cumGradientArray(offset + k) += gradArray(k)
    -        k += 1
    -      }
    -      offset += gradArray.length
    +    grads.flatten.zipWithIndex.foreach { case (newGrad, i) =>
    --- End diff --
    
    Flatten might be expensive for array of large arrays, is not it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by avulanov <gi...@git.apache.org>.

Github user avulanov commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-140234035
  
    @feynmanliang I suggest using native BLAS for testing. It worth checking the impact of using UFunc as well. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138383081
  
      [Test build #42107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42107/consoleFull) for   PR 8648 at commit [`84f8bea`](https://github.com/apache/spark/commit/84f8bea0f4a3b24f8aae6eadd98d7166311f951d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138415827
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42113/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138415825
  
      [Test build #42113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42113/console) for   PR 8648 at commit [`22ba174`](https://github.com/apache/spark/commit/22ba174f1880fb99fe778bd2fafc54f0afe3ba01).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138414447
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138648745
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-139649708
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42348/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138637439
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-139649707
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138383214
  
      [Test build #42107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42107/console) for   PR 8648 at commit [`84f8bea`](https://github.com/apache/spark/commit/84f8bea0f4a3b24f8aae6eadd98d7166311f951d).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138415609
  
      [Test build #42113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42113/consoleFull) for   PR 8648 at commit [`22ba174`](https://github.com/apache/spark/commit/22ba174f1880fb99fe778bd2fafc54f0afe3ba01).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by feynmanliang <gi...@git.apache.org>.

Github user feynmanliang commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-148893887
  
    @mengxr added benchmarks, can you make another pass when you have a chance


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by feynmanliang <gi...@git.apache.org>.

Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8648#discussion_r41700924
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -260,127 +263,126 @@ private[ann] trait ActivationFunction extends Serializable {
     }
     
     /**
    - * Implements in-place application of functions
    + * Implements in-place application of functions.
      */
     private[ann] object ActivationFunction {
     
    -  def apply(x: BDM[Double], y: BDM[Double], func: Double => Double): Unit = {
    -    var i = 0
    -    while (i < x.rows) {
    -      var j = 0
    -      while (j < x.cols) {
    -        y(i, j) = func(x(i, j))
    -        j += 1
    -      }
    -      i += 1
    -    }
    +  def apply(x: BDM[Double], y: BDM[Double], func: UFunc with MappingUFunc)(
    +      implicit impl: func.Impl[BDM[Double], BDM[Double]]): Unit = {
    +    y := func(x)
       }
     
       def apply(
    -    x1: BDM[Double],
    -    x2: BDM[Double],
    -    y: BDM[Double],
    -    func: (Double, Double) => Double): Unit = {
    -    var i = 0
    -    while (i < x1.rows) {
    -      var j = 0
    -      while (j < x1.cols) {
    -        y(i, j) = func(x1(i, j), x2(i, j))
    -        j += 1
    -      }
    -      i += 1
    -    }
    +      x1: BDM[Double],
    +      x2: BDM[Double],
    +      y: BDM[Double],
    +      func: UFunc with MappingUFunc)(
    +      implicit impl: func.Impl2[BDM[Double], BDM[Double], BDM[Double]]): Unit = {
    +    y := func(x1, x2)
       }
     }
     
     /**
    - * Implements SoftMax activation function
    + * Implements Softmax activation function.
      */
     private[ann] class SoftmaxFunction extends ActivationFunction {
       override def eval(x: BDM[Double], y: BDM[Double]): Unit = {
    -    var j = 0
    -    // find max value to make sure later that exponent is computable
    -    while (j < x.cols) {
    -      var i = 0
    -      var max = Double.MinValue
    -      while (i < x.rows) {
    -        if (x(i, j) > max) {
    -          max = x(i, j)
    -        }
    -        i += 1
    -      }
    -      var sum = 0.0
    -      i = 0
    -      while (i < x.rows) {
    -        val res = Math.exp(x(i, j) - max)
    -        y(i, j) = res
    -        sum += res
    -        i += 1
    -      }
    -      i = 0
    -      while (i < x.rows) {
    -        y(i, j) /= sum
    -        i += 1
    -      }
    -      j += 1
    +    (0 until x.cols).foreach { j =>
    +      // subtract max value to prevent overflow during exp
    +      // does not affect correctness since we normalize right after
    +      val maxVal = Bmax(x(::, j))
    +      y(::, j) := breeze.numerics.exp(x(::, j) - maxVal)
    +      y(::, j) :/= Bsum(y(::, j))
    --- End diff --
    
    @mengxr [Local benchmarks here](https://gist.github.com/feynmanliang/bc64b82a1258c4e86b9a). Performance is more or less equivalent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8648


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138414336
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138438082
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138447486
  
      [Test build #42121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42121/console) for   PR 8648 at commit [`abdba81`](https://github.com/apache/spark/commit/abdba81135154aab3065fe8316ca51b1cd03885f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138648747
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42137/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138438023
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138439943
  
      [Test build #42121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42121/consoleFull) for   PR 8648 at commit [`abdba81`](https://github.com/apache/spark/commit/abdba81135154aab3065fe8316ca51b1cd03885f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138447506
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138382940
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138382936
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-139635124
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by feynmanliang <gi...@git.apache.org>.

Github user feynmanliang commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-139633496
  
    @avulanov The benchmarking code is written against a WIP implementation; I sent you a PR for bringing it up to date.
    
    LBFGS is taking significantly long time on my machine:
    ![image](https://cloud.githubusercontent.com/assets/990069/9823779/512a0e28-587d-11e5-8c09-205fb4c0caa9.png)
    
    I've removed the `flatten/flatMap` changes from this PR and will save them for when I have more time to properly perf test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138383217
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42107/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138447507
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42121/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-139636610
  
      [Test build #42348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42348/consoleFull) for   PR 8648 at commit [`f56e2d5`](https://github.com/apache/spark/commit/f56e2d5301f10f23c985defa63a4461a9e8d0f1b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138648577
  
      [Test build #42137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42137/console) for   PR 8648 at commit [`f6731ff`](https://github.com/apache/spark/commit/f6731ff10a5bf671d1d727cf590bccd7fb6e13c1).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class BlockFetchException(messages: String, throwable: Throwable)`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-139635181
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138638322
  
      [Test build #42137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42137/consoleFull) for   PR 8648 at commit [`f6731ff`](https://github.com/apache/spark/commit/f6731ff10a5bf671d1d727cf590bccd7fb6e13c1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-139649524
  
      [Test build #42348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42348/console) for   PR 8648 at commit [`f56e2d5`](https://github.com/apache/spark/commit/f56e2d5301f10f23c985defa63a4461a9e8d0f1b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8648#issuecomment-138637405
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10478][ML] Performance, organization, a...

Posted by avulanov <gi...@git.apache.org>.

Github user avulanov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8648#discussion_r38969473
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
    @@ -573,47 +582,27 @@ private[ml] class FeedForwardModel private(
           case _ =>
             throw new UnsupportedOperationException("Non-functional layer not supported at the top")
         }
    +    // backward pass (back-propagate deltas given errors)
         deltas(L) = new BDM[Double](0, 0)
         deltas(L - 1) = newE
         for (i <- (L - 2) to (0, -1)) {
           deltas(i) = layerModels(i + 1).prevDelta(deltas(i + 1), outputs(i + 1))
         }
    -    val grads = new Array[Array[Double]](layerModels.length)
    -    for (i <- 0 until layerModels.length) {
    -      val input = if (i==0) data else outputs(i - 1)
    -      grads(i) = layerModels(i).grad(deltas(i), input)
    +    // forward pass (forward-propagate gradients given inputs)
    +    val grads = layerModels.zipWithIndex.map { case (layer, i) =>
    +      val input = if (i == 0) data else outputs(i - 1)
    +      layer.grad(deltas(i), input)
         }
    -    // update cumGradient
    +    // update cumulative gradients
         val cumGradientArray = cumGradient.toArray
    -    var offset = 0
    -    // TODO: extract roll
    -    for (i <- 0 until grads.length) {
    -      val gradArray = grads(i)
    -      var k = 0
    -      while (k < gradArray.length) {
    -        cumGradientArray(offset + k) += gradArray(k)
    -        k += 1
    -      }
    -      offset += gradArray.length
    +    grads.flatten.zipWithIndex.foreach { case (newGrad, i) =>
    +      cumGradientArray(i) += newGrad
         }
         newError
       }
     
    -  // TODO: do we really need to copy the weights? they should be read-only
       override def weights(): Vector = {
    -    // TODO: extract roll
    -    var size = 0
    -    for (i <- 0 until layerModels.length) {
    -      size += layerModels(i).size
    -    }
    -    val array = new Array[Double](size)
    -    var offset = 0
    -    for (i <- 0 until layerModels.length) {
    -      val layerWeights = layerModels(i).weights().toArray
    -      System.arraycopy(layerWeights, 0, array, offset, layerWeights.length)
    -      offset += layerWeights.length
    -    }
    -    Vectors.dense(array)
    +    Vectors.dense(layerModels.flatMap(_.weights().toArray))
    --- End diff --
    
    Ditto


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org