Posted to reviews@spark.apache.org by numbnut <gi...@git.apache.org> on 2014/10/23 12:08:56 UTC

[GitHub] spark pull request: MLlib, exposing special rdd functions to the p...

GitHub user numbnut opened a pull request:

    https://github.com/apache/spark/pull/2907

    MLlib, exposing special rdd functions to the public

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/numbnut/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2907
    
----
commit b3d8945d6fa0bc28b90a8409ced29fd78b34e752
Author: Niklas Wilcke <1w...@informatik.uni-hamburg.de>
Date:   2014-10-23T09:43:27Z

    expose mllib specific rdd functions to the public

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61431908
  
    test this please




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by numbnut <gi...@git.apache.org>.
Github user numbnut commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2907#discussion_r19663363
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/rdd/RDDFunctionsSuite.scala ---
    @@ -42,9 +42,14 @@ class RDDFunctionsSuite extends FunSuite with LocalSparkContext {
         val data = Seq(Seq(1, 2, 3), Seq.empty[Int], Seq(4), Seq.empty[Int], Seq(5, 6, 7))
         val rdd = sc.parallelize(data, data.length).flatMap(s => s)
         assert(rdd.partitions.size === data.length)
    -    val sliding = rdd.sliding(3)
    -    val expected = data.flatMap(x => x).sliding(3).toList
    -    assert(sliding.collect().toList === expected)
    +    val sliding = rdd.sliding(3).collect().toList 
    +    val expected = data.flatMap(x => x).sliding(3).map(_.toArray).toList
    +    // scalatest does not support multi dimensional array comparison
    --- End diff --
    
    Thanks for the much better solution. That works just fine.




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61431907
  
    ok to test




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-60459079
  
    Sounds good. @numbnut Could you update the PR and change the following?
    
    1) add @DeveloperApi to RDDFunctions
    2) change the return type of `sliding` to `RDD[Array[T]]` and update the code in other places
    
    Thanks!
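
The requested return-type change can be sketched without Spark. The helper below is a hypothetical stand-in that works on a plain `Seq` rather than an RDD; the name `slidingArrays` and the choice to drop incomplete windows are assumptions of this sketch, not details confirmed in the thread.

```scala
import scala.reflect.ClassTag

object SlidingSketch {
  // Hypothetical local stand-in for the `sliding` function discussed above:
  // each full window is returned as an Array[T] rather than a Seq[T].
  // The ClassTag context bound is needed to build an Array[T] for a generic T.
  def slidingArrays[T: ClassTag](xs: Seq[T], windowSize: Int): Seq[Array[T]] = {
    require(windowSize > 1, s"Window size must be greater than 1, but got $windowSize.")
    xs.sliding(windowSize)
      .filter(_.length == windowSize) // assumption: incomplete windows are dropped
      .map(_.toArray)
      .toList
  }
}
```

An `Array[T]` surfaces in Java as an ordinary array, whereas a Scala `Seq[T]` needs conversion before Java callers can use it comfortably, which is presumably the motivation for the change.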




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-60454077
  
    Cool, you would know best if it's ready for external use. Looks good on unit tests.
    
    What if it returned `RDD[Array[T]]`? I experimented briefly with making that change, and it looked like it would work out fairly simply. Would that be more Java-friendly?




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61138697
  
    Agreed that choosing between aggregate and treeAggregate depends on the computation and the reduction function, but I don't think it's specific to MLlib per se. Any Spark application with CPU-intensive code can benefit from treeAggregate. My view is that we shouldn't replace `aggregate` with this; we should just let users choose the right aggregation strategy for their needs.
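
To make the two aggregation strategies concrete, here is a minimal sketch over plain Scala collections. It is not Spark's implementation: `flatCombine`, `treeCombine`, and the `fanIn` parameter are hypothetical names, and the "partials" stand in for per-partition results.

```scala
object TreeAggregateSketch {
  // Flat strategy: fold every partial result into one accumulator in a single pass.
  def flatCombine[U](partials: Seq[U], zero: U)(combOp: (U, U) => U): U =
    partials.foldLeft(zero)(combOp)

  // Tree strategy: merge partials in groups of `fanIn`, level by level,
  // so that no single step combines more than `fanIn` values.
  def treeCombine[U](partials: Seq[U], zero: U, fanIn: Int = 2)(combOp: (U, U) => U): U = {
    require(fanIn >= 2, s"fanIn must be at least 2, but got $fanIn.")
    var level = partials
    while (level.length > 1) {
      // Each pass shrinks the number of partials by roughly a factor of fanIn.
      level = level.grouped(fanIn).map(_.reduce(combOp)).toList
    }
    level.headOption.getOrElse(zero)
  }
}
```

The two functions agree only when `combOp` is associative and `zero` is its identity. The tree version spreads the merge work across levels at the cost of extra intermediate steps, which mirrors the trade-off debated in this thread.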




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61684665
  
    LGTM. There is a minor issue with the @DeveloperApi annotation: we also need `:: DeveloperApi ::` at the beginning of the doc. I will fix that later. I've merged this into master and branch-1.2. Thanks!
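
As a rough illustration of the convention mentioned above, the annotation is paired with a `:: DeveloperApi ::` marker on the first line of the Scaladoc comment. The sketch below compiles without Spark by declaring a local stand-in annotation; `SeqFunctions` and its `count` method are hypothetical and only give the annotation something to attach to.

```scala
import scala.annotation.StaticAnnotation

object DeveloperApiSketch {
  // Hypothetical stand-in so the snippet compiles without Spark on the classpath;
  // in Spark the real annotation is org.apache.spark.annotation.DeveloperApi.
  class DeveloperApi extends StaticAnnotation

  /**
   * :: DeveloperApi ::
   * Machine learning specific functions (sketched over a plain Seq, not an RDD).
   */
  @DeveloperApi
  class SeqFunctions[T](self: Seq[T]) extends Serializable {
    // Trivial placeholder member; the real class exposes the RDD functions.
    def count: Int = self.length
  }
}
```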




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by numbnut <gi...@git.apache.org>.
Github user numbnut commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61619953
  
    I'm sorry for breaking the tests. I thought they had run cleanly on my machine, which I can't explain, but I found the mistake and corrected it.
    
    I also changed "private[mllib]" to "@DeveloperApi" to make it visible in the docs.
    
    Should I rebase onto branch-1.2, or is there something else I can do to simplify the merge process?




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-60877866
  
    Yes, treeAggregate is very useful. In fact, I was going to suggest moving it to the core RDD API. Is there any reason not to do that?




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61434214
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22784/
    Test FAILed.




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61434211
  
      [Test build #22784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22784/consoleFull) for   PR 2907 at commit [`0840e6e`](https://github.com/apache/spark/commit/0840e6e28d442df778dfd2f02ed9c509258b8237).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class RDDFunctions[T: ClassTag](self: RDD[T]) extends Serializable `





[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61628112
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22875/
    Test PASSed.




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2907#discussion_r19642074
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/rdd/RDDFunctionsSuite.scala ---
    @@ -42,9 +42,14 @@ class RDDFunctionsSuite extends FunSuite with LocalSparkContext {
         val data = Seq(Seq(1, 2, 3), Seq.empty[Int], Seq(4), Seq.empty[Int], Seq(5, 6, 7))
         val rdd = sc.parallelize(data, data.length).flatMap(s => s)
         assert(rdd.partitions.size === data.length)
    -    val sliding = rdd.sliding(3)
    -    val expected = data.flatMap(x => x).sliding(3).toList
    -    assert(sliding.collect().toList === expected)
    +    val sliding = rdd.sliding(3).collect().toList 
    +    val expected = data.flatMap(x => x).sliding(3).map(_.toArray).toList
    +    // scalatest does not support multi dimensional array comparison
    --- End diff --
    
    You can try nested Seq:
    
    ~~~
    val sliding = rdd.sliding(3).collect().toSeq.map(_.toSeq)
    val expected = data.flatMap(x => x).sliding(3).toSeq.map(_.toSeq)
    assert(sliding === expected)
    ~~~
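
Some background on why the conversion to nested `Seq` helps: Scala arrays are JVM arrays, so `==` on them compares references rather than contents, and two structurally identical arrays of arrays are not equal. A minimal sketch with plain Scala collections (no Spark; the values and names are illustrative, not taken from the PR):

```scala
object ArrayEqualitySketch {
  // Plain collections stand in for the collected RDD result.
  val windows: Array[Array[Int]] = Array(1, 2, 3, 4).sliding(3).toArray
  val expected: Array[Array[Int]] = Array(Array(1, 2, 3), Array(2, 3, 4))

  // Arrays compare by reference, so this is false even though the contents match.
  val equalAsArrays: Boolean = windows == expected

  // Converting both levels to Seq gives element-wise (structural) equality.
  val equalAsSeqs: Boolean =
    windows.toSeq.map(_.toSeq) == expected.toSeq.map(_.toSeq)
}
```

Converting only the outer level is not enough, because the inner elements would still be arrays and would still be compared by reference.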




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-60218336
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61619311
  
      [Test build #22875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22875/consoleFull) for   PR 2907 at commit [`7f7c767`](https://github.com/apache/spark/commit/7f7c767f4341b7c043d015196ae46493db4b937a).
     * This patch merges cleanly.




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-60393813
  
    At best, this would become an "Experimental" API, marked as such, and it might need another unit test or two. What's the use case that makes it worth committing to support these externally?




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61628106
  
      [Test build #22875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22875/consoleFull) for   PR 2907 at commit [`7f7c767`](https://github.com/apache/spark/commit/7f7c767f4341b7c043d015196ae46493db4b937a).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class RDDFunctions[T: ClassTag](self: RDD[T]) extends Serializable `





[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by numbnut <gi...@git.apache.org>.
Github user numbnut commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2907#discussion_r19663373
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala ---
    @@ -45,15 +45,16 @@ class SlidingRDDPartition[T](val idx: Int, val prev: Partition, val tail: Seq[T]
      */
     private[mllib]
     class SlidingRDD[T: ClassTag](@transient val parent: RDD[T], val windowSize: Int)
    -  extends RDD[Seq[T]](parent) {
    +  extends RDD[Array[T]](parent) {
     
       require(windowSize > 1, s"Window size must be greater than 1, but got $windowSize.")
     
    -  override def compute(split: Partition, context: TaskContext): Iterator[Seq[T]] = {
    +  override def compute(split: Partition, context: TaskContext): Iterator[Array[T]] = {
         val part = split.asInstanceOf[SlidingRDDPartition[T]]
    -    (firstParent[T].iterator(part.prev, context) ++ part.tail)
    +     (firstParent[T].iterator(part.prev, context) ++ part.tail)
    --- End diff --
    
    Sorry! I fixed that.




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by numbnut <gi...@git.apache.org>.
Github user numbnut commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61110597
  
    I updated the pull request as proposed. Please review it carefully because I'm new to Spark and Scala.




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61137430
  
    @shivaram For common RDD operations in core/sql, each task is small (including the result) and there are more partitions than executors. `treeAggregate` creates a shuffle stage and holds data there, while `aggregate` can start working when partial results are available.
    
    @pwendell Are you comfortable with adding those RDD functions to core? 




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2907#discussion_r19642063
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala ---
    @@ -45,15 +45,16 @@ class SlidingRDDPartition[T](val idx: Int, val prev: Partition, val tail: Seq[T]
      */
     private[mllib]
     class SlidingRDD[T: ClassTag](@transient val parent: RDD[T], val windowSize: Int)
    -  extends RDD[Seq[T]](parent) {
    +  extends RDD[Array[T]](parent) {
     
       require(windowSize > 1, s"Window size must be greater than 1, but got $windowSize.")
     
    -  override def compute(split: Partition, context: TaskContext): Iterator[Seq[T]] = {
    +  override def compute(split: Partition, context: TaskContext): Iterator[Array[T]] = {
         val part = split.asInstanceOf[SlidingRDDPartition[T]]
    -    (firstParent[T].iterator(part.prev, context) ++ part.tail)
    +     (firstParent[T].iterator(part.prev, context) ++ part.tail)
    --- End diff --
    
    2-space indentation




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2907#discussion_r19717031
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala ---
    @@ -29,7 +30,7 @@ import org.apache.spark.util.Utils
      * Machine learning specific RDD functions.
      */
     private[mllib]
    --- End diff --
    
    I think it is necessary to replace `private[mllib]` with `@DeveloperApi`. Otherwise, the methods under `RDDFunctions` won't show up in the generated doc.




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by numbnut <gi...@git.apache.org>.
Github user numbnut commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2907#discussion_r19662046
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala ---
    @@ -29,7 +30,7 @@ import org.apache.spark.util.Utils
      * Machine learning specific RDD functions.
      */
     private[mllib]
    --- End diff --
    
    In the tests it works fine. I just wanted to expose as little as possible.
    Shall I replace "private[mllib]" with "@DeveloperApi"?




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-61432096
  
      [Test build #22784 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22784/consoleFull) for   PR 2907 at commit [`0840e6e`](https://github.com/apache/spark/commit/0840e6e28d442df778dfd2f02ed9c509258b8237).
     * This patch merges cleanly.




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-60445965
  
    @srowen Unit tests are in 
    
    https://github.com/numbnut/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/rdd/RDDFunctionsSuite.scala
    
    I think we can mark it `@DeveloperApi`. I'm a little concerned about the return type of `sliding`, which is not Java-friendly. Any suggestions?




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by javadba <gi...@git.apache.org>.
Github user javadba commented on the pull request:

    https://github.com/apache/spark/pull/2907#issuecomment-60875650
  
    RE: use case. We are considering using the treeAggregate function within the implementation of SpectralClustering. In addition, it was noted that EigenvalueDecomposition.symmetricEigs is private; we would likely want to use that too.




[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2907#discussion_r19642054
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala ---
    @@ -29,7 +30,7 @@ import org.apache.spark.util.Utils
      * Machine learning specific RDD functions.
      */
     private[mllib]
    --- End diff --
    
    Does it work if we still leave the class as `private[mllib]`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2907

