You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "François Garillot (JIRA)" <ji...@apache.org> on 2015/08/11 14:57:45 UTC

[jira] [Issue Comment Deleted] (SPARK-9819) reduceBy(KeyAnd)Window should specify which is the accumulator argument in invReduceFunc

     [ https://issues.apache.org/jira/browse/SPARK-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

François Garillot updated SPARK-9819:
-------------------------------------
    Comment: was deleted

(was: https://github.com/apache/spark/pull/8103)

> reduceBy(KeyAnd)Window should specify which is the accumulator argument in invReduceFunc
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-9819
>                 URL: https://issues.apache.org/jira/browse/SPARK-9819
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, Streaming
>    Affects Versions: 1.4.1
>         Environment: All
>            Reporter: François Garillot
>            Priority: Minor
>              Labels: documentation, streaming
>
> {{reduceByWindow}} has an optional {{invReduceFunc}} argument which allows the reduction to be performed incrementally.
> The incremental reduction [performed in {{ReducedWindowedDStream}}|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala#L157] only depends on the reduction and its inverse function being associative (as shown by the reduce applied to {{oldValues}}), but does not require those functions to be commutative.
> In particular, if the inverse reduction is the non-commutative, but associative substraction (e.g. what you're computing is a running sum), it's necessary to know that the intermediate result (to be substracted from) is the first argument of {{invReduceFunc}} and that the second argument is the old value to substract.
> It's only in the commutative case that we don't care which is which.
> The Scaladoc for the various overloads of {{reduceByWindow}} should let the user know which is the accumulator, and which is the old value. A concise, unambiguous way to state this is to write an inversion law in the Scaladoc:
> {{invReduceFunc(reduceFunc(x, y), y) = x}}
> We should also remind the user that he should use associative reduction (& inverse reduction) functions, since the computation makes that assumption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org