Posted to issues@spark.apache.org by "Jagadeesan A S (JIRA)" <ji...@apache.org> on 2016/08/10 12:30:20 UTC

[jira] [Commented] (SPARK-12844) Spark documentation should be more precise about the algebraic properties of functions in various transformations

    [ https://issues.apache.org/jira/browse/SPARK-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415194#comment-15415194 ] 

Jagadeesan A S commented on SPARK-12844:
----------------------------------------

Started working on this.

> Spark documentation should be more precise about the algebraic properties of functions in various transformations
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12844
>                 URL: https://issues.apache.org/jira/browse/SPARK-12844
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>            Reporter: Jimmy Lin
>            Priority: Minor
>
> Spark documentation should be more precise about the algebraic properties of functions in various transformations. The current wording is potentially confusing. For example, in Spark 1.6, the scaladoc for reduce in RDD says:
> > Reduces the elements of this RDD using the specified commutative and associative binary operator.
> This is precise and accurate. The documentation of reduceByKey in PairRDDFunctions, on the other hand, says:
> > Merge the values for each key using an associative reduce function.
> To be more precise, this function must also be commutative in order for the computation to be correct. Stating commutativity for reduce but not for reduceByKey gives the false impression that the function in the latter does not need to be commutative.
> The same applies to aggregateByKey. To be precise, both seqOp and combOp need to be associative (mentioned) AND commutative (not mentioned) in order for the computation to be correct. It would be desirable to fix these inconsistencies throughout the documentation.
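
The point in the description can be illustrated without Spark. The sketch below is a plain-Python simulation, not Spark code: `reduce_by_key` and `outcomes` are hypothetical helpers, and it assumes, as the issue implies, that per-partition results may be merged in any order. An associative and commutative function (integer addition) gives one answer for every merge order, while an associative but non-commutative one (string concatenation) does not.

```python
from itertools import permutations


def reduce_by_key(partitions, func, merge_order):
    """Simulate a per-key reduce (hypothetical helper, not a Spark API).

    Values are first combined inside each partition, then the per-partition
    results are merged in the order given by merge_order.
    """
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = func(acc[key], value) if key in acc else value
        partials.append(acc)

    result = {}
    for idx in merge_order:
        for key, value in partials[idx].items():
            result[key] = func(result[key], value) if key in result else value
    return result


def outcomes(partitions, func):
    """Collect the distinct results over every possible merge order."""
    seen = []
    for order in permutations(range(len(partitions))):
        merged = reduce_by_key(partitions, func, order)
        if merged not in seen:
            seen.append(merged)
    return seen


# Three partitions holding values for a single key "k".
int_parts = [[("k", 1), ("k", 2)], [("k", 3)], [("k", 4)]]
str_parts = [[("k", "a"), ("k", "b")], [("k", "c")], [("k", "d")]]

# Addition is associative and commutative: one result for all merge orders.
add_results = outcomes(int_parts, lambda x, y: x + y)

# Concatenation is associative but not commutative: the result depends on
# which partition happens to be merged first.
concat_results = outcomes(str_parts, lambda x, y: x + y)

print(len(add_results), add_results)
print(len(concat_results), concat_results)
```

With three partitions there are six merge orders, so the concatenation case yields six different strings for the same input, while addition always yields 10. How Spark actually schedules merges is not modelled here; the simulation only shows why associativity alone is not enough once the merge order is unspecified.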


