Posted to user@spark.apache.org by Lin Zhao <li...@exabeam.com> on 2016/01/17 20:00:15 UTC

Spark Streaming: Does mapWithState implicitly partition the DStream?

When the state is passed to the task that handles a mapWithState for a particular key, it seems extremely difficult to coordinate and synchronise that state if records for the key are spread across partitions. Is the stream partitioned by key before a mapWithState? If not, what exactly is the execution model?

Thanks,

Lin


Re: Spark Streaming: Does mapWithState implicitly partition the DStream?

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
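Yes. mapWithState partitions the stream by key, using a HashPartitioner by
default, so the records and the state for a given key are handled in the
same partition. You can use "StateSpec.partitioner" to set your own custom
partitioner.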
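
A minimal sketch of setting the partitioner explicitly, loosely modelled on
Spark's stateful word count example. The socket host and port, the checkpoint
path, the name updateCount, and the partition count of 8 are assumptions made
for illustration; none of them come from this thread.

```scala
import org.apache.spark.{HashPartitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MapWithStatePartitioning {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MapWithStatePartitioning")
      .setMaster("local[2]") // assumption: local run for illustration
    val ssc = new StreamingContext(conf, Seconds(1))

    // mapWithState requires a checkpoint directory (path is an assumption).
    ssc.checkpoint("/tmp/mapwithstate-checkpoint")

    // Build a keyed stream of (word, 1) pairs from a text socket.
    val pairs = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1))

    // Called once per record; the state passed in belongs to that record's key.
    val updateCount = (word: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum)
    }

    // Set the partitioner explicitly; a custom Partitioner subclass could be
    // passed here in place of HashPartitioner.
    val spec = StateSpec
      .function(updateCount)
      .partitioner(new HashPartitioner(8))

    pairs.mapWithState(spec).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because the input is partitioned by key with the same partitioner as the
state, the mapping function never has to coordinate state for one key across
tasks.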

On Sun, Jan 17, 2016 at 11:00 AM, Lin Zhao <li...@exabeam.com> wrote:

> When the state is passed to the task that handles a mapWithState for a
> particular key, if the key is distributed, it seems extremely difficult to
> coordinate and synchronise the state. Is there a partition by key before a
> mapWithState? If not what exactly is the execution model?
>
> Thanks,
>
> Lin
>
>