Posted to user@spark.apache.org by DandyDev <de...@gmail.com> on 2016/10/11 11:28:32 UTC

Can mapWithState state func be called every batchInterval?

Hi there,

I've built a Spark Streaming app that accepts certain events from Kafka, and
I want to keep some state between the events. I've successfully used
mapWithState for that. The problem is that I want the state for every key to
be updated on every batchInterval, because a "lack" of events is also
significant to the use case. This doesn't seem possible with mapWithState,
unless I'm missing something.

Previously I looked at updateStateByKey, which says:
> In every batch, Spark will apply the state update function for all
> existing keys, regardless of whether they have new data in a batch or not.

That is what I want. However, I've seen several tutorials/blog posts where
the advice was to stop using updateStateByKey and use mapWithState
instead.
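
To make the quoted behaviour concrete, here is a minimal plain-Python model of
it. This is not Spark code: update_state_by_key, batches_since_last_event and
the "batches without events" state are hypothetical names chosen only for
illustration. It mimics the documented contract that the update function runs
for every existing key in every batch, receiving an empty list of new values
when a key got no data, which is what makes a "lack" of events observable.

```python
from collections import defaultdict


def update_state_by_key(state, batch, update_func):
    """Model one batch of updateStateByKey-like semantics (not the Spark API).

    state: dict mapping key -> current state
    batch: list of (key, value) pairs that arrived in this batch interval
    update_func(new_values, old_state) -> new state, or None to drop the key
    """
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    new_state = {}
    # Every known key is visited, plus any key that is new in this batch.
    for key in set(state) | set(grouped):
        updated = update_func(grouped.get(key, []), state.get(key))
        if updated is not None:
            new_state[key] = updated
    return new_state


def batches_since_last_event(new_values, old_state):
    """Hypothetical state: number of consecutive batches without events."""
    if new_values:
        return 0
    return (old_state or 0) + 1


state = {}
# Three batch intervals: both keys active, only "a" active, nothing at all.
for batch in [[("a", 1), ("b", 1)], [("a", 1)], []]:
    state = update_state_by_key(state, batch, batches_since_last_event)

print(sorted(state.items()))  # [('a', 1), ('b', 2)]
```

Note that the loop visits every stored key even for the empty third batch, so
the work per batch grows with the total number of keys held in state.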

So my questions:

- Can the mapWithState state function be called every batchInterval, even
when no events exist for that interval?
- If not, is it okay to use updateStateByKey instead? Or will it be
deprecated in the near future?
- If mapWithState doesn't support my need, is there another way to update
state every batchInterval that still uses mapWithState, together with some
other mechanism?

Thanks in advance!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-mapWithState-state-func-be-called-every-batchInterval-tp27877.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Can mapWithState state func be called every batchInterval?

Posted by manasdebashiskar <po...@gmail.com>.
Actually, each key's state in mapWithState has a timeout component. You can
write a function to "treat" your timeout.

You can match the timeout to your batch interval and do fun stuff when the
batch ends.

People do session management with the same approach.
When activity is registered the session is refreshed, and the session is
deleted ("one way to treat it") when the timeout happens.
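
The session approach described above can be sketched in plain Python. This is
a simplified model, not Spark code: map_with_state, track_session, last_seen
and the integer batch times are all hypothetical names and assumptions made
for illustration. In Spark the corresponding pieces would, as far as I know,
be a timeout set on the StateSpec and a check of State.isTimingOut() inside
the state function; exactly when Spark notices an idle key is not modelled
here.

```python
def map_with_state(state, last_seen, batch, batch_time, timeout, func):
    """Model one batch of mapWithState-like semantics with an idle timeout.

    state: dict key -> session state
    last_seen: dict key -> batch time of the key's most recent event
    func(key, value, old_state, timing_out) -> new state (ignored on timeout)
    """
    # Only keys that have data in this batch get a regular call.
    for key, value in batch:
        state[key] = func(key, value, state.get(key), False)
        last_seen[key] = batch_time  # activity refreshes the session

    # Keys idle for at least `timeout` get one final call and are removed.
    idle = [k for k, seen in last_seen.items() if batch_time - seen >= timeout]
    for key in idle:
        func(key, None, state.pop(key), True)
        del last_seen[key]


expired = []


def track_session(key, value, old_state, timing_out):
    """Hypothetical state function: count events, record expired sessions."""
    if timing_out:
        expired.append((key, old_state))  # "treat" the timeout here
        return None
    return (old_state or 0) + 1


state, last_seen = {}, {}
batches = [[("u1", "click"), ("u2", "click")], [("u1", "click")], [], []]
for batch_time, batch in enumerate(batches):
    map_with_state(state, last_seen, batch, batch_time,
                   timeout=2, func=track_session)

print(expired)  # [('u2', 1), ('u1', 2)]
```

In this model a key without events is only revisited once, when its timeout
fires, so it does not give a call on every batch for every key.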

..Mana








Re: Can mapWithState state func be called every batchInterval?

Posted by Cody Koeninger <co...@koeninger.org>.
What are you expecting? If you want to update every key on every
batch, it's going to be linear in the number of keys... there's no
real way around that.

On Tue, Oct 11, 2016 at 9:49 AM, Daan Debie <de...@gmail.com> wrote:
> That's nice and all, but I'd rather have a solution involving mapWithState
> of course :) I'm just wondering why it doesn't support this use case yet.


Re: Can mapWithState state func be called every batchInterval?

Posted by Daan Debie <de...@gmail.com>.
That's nice and all, but I'd rather have a solution involving mapWithState
of course :) I'm just wondering why it doesn't support this use case yet.

On Tue, Oct 11, 2016 at 3:41 PM, Cody Koeninger <co...@koeninger.org> wrote:

> They're telling you not to use the old function because it's linear in the
> total number of keys, not keys in the batch, so it's slow.
>
> But if that's what you really want, go ahead and do it, and see if it
> performs well enough.

Re: Can mapWithState state func be called every batchInterval?

Posted by Cody Koeninger <co...@koeninger.org>.
They're telling you not to use the old function because it's linear in the
total number of keys, not keys in the batch, so it's slow.

But if that's what you really want, go ahead and do it, and see if it
performs well enough.
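
As a rough illustration of that cost difference, here is a small plain-Python
count of how many state-function calls one batch would trigger under each
model. It is not Spark code and not a benchmark; calls_per_batch and the key
numbers are hypothetical and chosen only to show the scaling.

```python
def calls_per_batch(total_keys, keys_in_batch):
    """Return (all_keys_calls, batch_keys_calls) for one batch.

    all_keys_calls models an update applied to every stored key plus any new
    key; batch_keys_calls models an update applied only to keys with data.
    """
    known = set(range(total_keys))  # keys already held in state
    active = set(keys_in_batch)     # distinct keys seen in this batch
    return len(known | active), len(active)


# One million stored keys, but only three distinct keys in this batch.
all_calls, batch_calls = calls_per_batch(1_000_000, [3, 7, 7, 42])
print(all_calls, batch_calls)  # 1000000 3
```

Whether a million calls per batch is acceptable depends on the batch interval
and on how expensive the update function is, so measuring it is the only way
to know.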
