Posted to reviews@spark.apache.org by YPares <gi...@git.apache.org> on 2016/02/05 11:27:03 UTC

[GitHub] spark pull request: PairRDDFunctions.reduceByKey should be stated ...

GitHub user YPares opened a pull request:

    https://github.com/apache/spark/pull/11091

    PairRDDFunctions.reduceByKey should be stated as requiring a commutative binary op

    According to http://stackoverflow.com/questions/35205107/spark-difference-of-semantics-between-reduce-and-reducebykey , PairRDDFunctions.reduceByKey requires, just like RDD.reduce, an associative AND commutative binary operator.
    This wasn't stated in the docs.
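
To illustrate why commutativity matters here, the following is a minimal pure-Python sketch, not Spark's actual implementation. It assumes that each partition is reduced locally per key and that the partial results are then merged in whatever order they happen to arrive. The helper names `reduce_by_key_simulated` and `distinct_results` are hypothetical and exist only for this illustration.

```python
from itertools import permutations


def reduce_by_key_simulated(partitions, op, merge_order):
    """Reduce each partition locally per key, then merge the partial results
    in the given partition order (a stand-in for nondeterministic arrival)."""
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = op(local[key], value) if key in local else value
        partials.append(local)

    merged = {}
    for idx in merge_order:
        for key, value in partials[idx].items():
            merged[key] = op(merged[key], value) if key in merged else value
    return merged


def distinct_results(partitions, op):
    """Collect every distinct outcome over all possible merge orders."""
    orders = permutations(range(len(partitions)))
    return {
        tuple(sorted(reduce_by_key_simulated(partitions, op, order).items()))
        for order in orders
    }


# String concatenation is associative but NOT commutative.
text_partitions = [[("k", "a"), ("k", "b")], [("k", "c")], [("k", "d")]]
concat_results = distinct_results(text_partitions, lambda x, y: x + y)

# Integer addition is associative AND commutative.
number_partitions = [[("k", 1), ("k", 2)], [("k", 3)], [("k", 4)]]
sum_results = distinct_results(number_partitions, lambda x, y: x + y)

print(len(concat_results), "distinct results for concatenation")
print(len(sum_results), "distinct result(s) for addition")
```

Under these assumptions, concatenation yields a different string for each merge order, while addition always yields the same total, which is the behaviour the doc change is meant to warn about.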

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/YPares/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11091
    
----
commit b14f702ac8c3a5b018715b6763cfeda99000b292
Author: Yves Parès (Ywen) <yv...@gmail.com>
Date:   2016-02-05T10:22:59Z

    Small fix in PairRDDFunctions.reduceByKey
    
    Make the doc more coherent wrt RDD.reduce

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11091#issuecomment-184684769
  
    You can close this @YPares ; I created https://github.com/apache/spark/pull/11217




[GitHub] spark pull request: PairRDDFunctions.reduceByKey should be stated ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11091#issuecomment-180286394
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11091#issuecomment-183625276
  
    @YPares are you going to update this or should I continue it?




[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11091#issuecomment-180287413
  
    That's fine, though this is pretty much by definition for reduce.




[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11091#issuecomment-180836001
  
    Yeah, I think there are similar statements about an 'associative' operation that really mean 'associative and commutative'. There are more occurrences in this file. There are some in Accumulator.scala, JavaPairRDD.scala, JavaRDDLike.scala, JavaDStreamLike.scala, JavaPairDStream.scala, DStream.scala, PairDStreamFunctions.scala, rdd.py, dstream.py, pairRDD.R. In each




[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by YPares <gi...@git.apache.org>.
Github user YPares closed the pull request at:

    https://github.com/apache/spark/pull/11091




[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11091#discussion_r52072999
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -300,7 +300,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
       }
     
       /**
    -   * Merge the values for each key using an associative reduce function. This will also perform
    +   * Merge the values for each key using an associative and commutative binary operator. This will also perform
    --- End diff --
    
    This line exceeds 100 chars and will fail the style checker.




[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11091#issuecomment-183921709
  
    See my previous message @YPares -- I think I found all the other ones. You're welcome to address them so we can merge your PR, but I can do it too.




[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by YPares <gi...@git.apache.org>.
Github user YPares commented on the pull request:

    https://github.com/apache/spark/pull/11091#issuecomment-180289035
  
    @srowen I agree but the difference between the documentation of `reduce` and `reduceByKey` seemed to imply a difference in behaviour.




[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11091#issuecomment-180554675
  
    Yeah, we should make them consistent. Did you find any more inconsistencies?





[GitHub] spark pull request: [Core] Doc: PairRDDFunctions.reduceByKey shoul...

Posted by YPares <gi...@git.apache.org>.
Github user YPares commented on the pull request:

    https://github.com/apache/spark/pull/11091#issuecomment-183919690
  
    @srowen Oh, sorry, I was waiting a bit to see if I found other inconsistencies.
    But none came to mind, so yes you may continue it.

