Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/08/11 14:38:46 UTC
[jira] [Updated] (SPARK-9819) reduceBy(KeyAnd)Window should specify which is the accumulator argument in invReduceFunc
[ https://issues.apache.org/jira/browse/SPARK-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-9819:
-----------------------------
Affects Version/s: 1.4.1 (was: 1.5.0)
Priority: Minor (was: Major)
Component/s: Documentation
Issue Type: Improvement (was: Bug)
> reduceBy(KeyAnd)Window should specify which is the accumulator argument in invReduceFunc
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-9819
> URL: https://issues.apache.org/jira/browse/SPARK-9819
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, Streaming
> Affects Versions: 1.4.1
> Environment: All
> Reporter: François Garillot
> Priority: Minor
> Labels: documentation, streaming
>
> {{reduceByWindow}} has an optional {{invReduceFunc}} argument which allows the reduction to be performed incrementally.
> The incremental reduction [performed in {{ReducedWindowedDStream}}|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala#L157] only depends on the reduction and its inverse function being associative (as shown by the reduce applied to {{oldValues}}), but does not require those functions to be commutative.
> In particular, if the inverse reduction is subtraction, which is not commutative (e.g. what you're computing is a running sum), it's necessary to know that the intermediate result (to be subtracted from) is the first argument of {{invReduceFunc}} and that the second argument is the old value to subtract.
> It's only in the commutative case that we don't care which is which.
> The Scaladoc for the various overloads of {{reduceByWindow}} should let the user know which is the accumulator, and which is the old value. A concise, unambiguous way to state this is to write an inversion law in the Scaladoc:
> {{invReduceFunc(reduceFunc(x, y), y) = x}}
> We should also remind users to use associative reduction (and inverse reduction) functions, since the computation makes that assumption.
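
To make the argument order concrete, here is a minimal plain-Scala sketch of one window slide, written without any Spark dependency. It follows the ordering described in the issue (accumulator first, old value second) and is not the actual ReducedWindowedDStream code; the names IncrementalWindowSketch, slide and satisfiesInversionLaw are hypothetical and exist only for this illustration.

```scala
// Illustrative sketch only: a simplified model of one incremental window slide.
object IncrementalWindowSketch {

  /** Slide a window: remove the values that left it, then add the values that entered.
    * The accumulator is passed as the FIRST argument of invReduceFunc,
    * and the reduced old values as the SECOND, as described in the issue.
    */
  def slide[T](
      previous: T,
      oldValues: Seq[T],
      newValues: Seq[T],
      reduceFunc: (T, T) => T,
      invReduceFunc: (T, T) => T): T = {
    val afterRemoval =
      if (oldValues.nonEmpty) invReduceFunc(previous, oldValues.reduce(reduceFunc))
      else previous
    if (newValues.nonEmpty) reduceFunc(afterRemoval, newValues.reduce(reduceFunc))
    else afterRemoval
  }

  /** Checks the proposed law invReduceFunc(reduceFunc(x, y), y) == x on sample pairs. */
  def satisfiesInversionLaw[T](
      samples: Seq[(T, T)],
      reduceFunc: (T, T) => T,
      invReduceFunc: (T, T) => T): Boolean =
    samples.forall { case (x, y) => invReduceFunc(reduceFunc(x, y), y) == x }

  def main(args: Array[String]): Unit = {
    val add: (Int, Int) => Int = _ + _
    val sub: (Int, Int) => Int = (acc, old) => acc - old      // accumulator first
    val flipped: (Int, Int) => Int = (old, acc) => acc - old  // arguments swapped

    // The window held 1, 2, 3 (sum 6); 1 leaves and 4 enters.
    println(slide(6, Seq(1), Seq(4), add, sub))      // 6 - 1 + 4 = 9
    println(slide(6, Seq(1), Seq(4), add, flipped))  // 1 - 6 + 4 = -1, not the running sum

    val samples = Seq((6, 1), (0, 5), (-3, 7))
    println(satisfiesInversionLaw(samples, add, sub))      // true
    println(satisfiesInversionLaw(samples, add, flipped))  // false
  }
}
```

With addition as the reduction, only the accumulator-first subtraction yields the expected running sum and passes the inversion-law check; the swapped version silently produces a wrong value, which is the ambiguity the Scaladoc change is meant to remove.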