Posted to user@spark.apache.org by Amit Sela <am...@gmail.com> on 2016/11/29 21:16:52 UTC

Does MapWithState follow with a shuffle?

Hi all,

I've been digging into the MapWithState code (branch 1.6), and I came across
the compute implementation
<https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159>
in *InternalMapWithStateDStream*.

Looking at the partitioner it defines
<https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L112>,
it looks like it could differ from the parent RDD's partitioner (for
instance, if defaultParallelism() changed, or the input partitioning was
smaller to begin with), in which case the downstream pair-RDD operation
will eventually create
<https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L537>
a ShuffledRDD.
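To make the reasoning concrete, here is a simplified, self-contained sketch (these are mock classes, not Spark's actual ones) of why a changed partition count forces a shuffle: Spark's HashPartitioner compares equal only when numPartitions matches, and the pair-RDD machinery inserts a ShuffledRDD whenever the parent's partitioner differs from the target one:

```scala
// Simplified mock of Spark's partitioner contract (not the real classes).
trait Partitioner {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Mirrors org.apache.spark.HashPartitioner's semantics: partition by
// hashCode modulo numPartitions; case-class equality means two instances
// are equal iff their partition counts match.
case class HashPartitioner(numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod // keep the result non-negative
  }
}

// The decision the state DStream effectively delegates to: shuffle only
// if the parent is not already partitioned exactly the same way.
def needsShuffle(parent: Option[Partitioner], state: Partitioner): Boolean =
  !parent.contains(state)

val state = HashPartitioner(8) // e.g. defaultParallelism at that point
println(needsShuffle(Some(HashPartitioner(8)), state)) // false: co-partitioned
println(needsShuffle(Some(HashPartitioner(4)), state)) // true: counts differ
println(needsShuffle(None, state))                     // true: no partitioner
```

So a parent that was hash-partitioned into 4 partitions, fed into state partitioned into 8, cannot be reused as-is and gets reshuffled.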

Am I reading this right?

Thanks,
Amit

Re: Does MapWithState follow with a shuffle?

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
Right. And you can specify the partitioner via
"StateSpec.partitioner(partitioner: Partitioner)".
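In Spark 1.6 code that could look like the sketch below; the mapping function, the partition count, and kvStream are hypothetical placeholders, and pinning the partitioner to match the parent RDD's partitioning is what avoids the extra shuffle:

```scala
// Hedged sketch against the Spark 1.6 streaming API; trackingFunc,
// the partition count and kvStream are illustrative placeholders.
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.{State, StateSpec}

// A mapping function of the shape mapWithState expects:
// (key, new value, state) => emitted record.
def trackingFunc(key: String, value: Option[Int],
                 state: State[Int]): Option[(String, Int)] = {
  val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
  state.update(sum) // carry the running sum forward across batches
  Some((key, sum))
}

// Specify the state partitioner explicitly instead of letting it fall
// back to a HashPartitioner over defaultParallelism.
val spec = StateSpec
  .function(trackingFunc _)
  .partitioner(new HashPartitioner(16))

// kvStream: DStream[(String, Int)], defined elsewhere (hypothetical):
// val stateful = kvStream.mapWithState(spec)
```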
