Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2017/06/29 18:55:00 UTC

Interesting Stateful Streaming question

Hi All,

Here is a problem, and I am wondering whether Spark Streaming is the right
tool for it.

I have a stream of messages m1, m2, m3, ... and each of those messages can be
in one of the states s1, s2, s3, ..., sn (imagine the number of states is
about 100). I want to compute metrics over messages that visit all the states
from s1 to sn, but these state transitions can happen over an indefinite
amount of time. A simple example would be counting all messages that have
visited states s1, s2, and s3. In other words, the transition function should
know that, say, message m1 has visited states s1 and s2 but not s3 yet, and
once message m1 visits s3, it should increment the counter by 1.
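
To make the unordered case concrete, here is a minimal plain-Python sketch (no Spark involved) of the per-message state such a transition function would need. It assumes events arrive as hypothetical (message_id, state) pairs and keeps one bitmask per message, so about 100 states fit in a single integer. The class and method names are illustrative only.

```python
from typing import Dict, List


class UnorderedVisitCounter:
    """Counts messages that have visited every tracked state at least once, in any order."""

    def __init__(self, states: List[str]) -> None:
        # One bit per tracked state; ~100 states fit in a single Python int.
        self.bit = {s: i for i, s in enumerate(states)}
        self.full_mask = (1 << len(states)) - 1
        self.visited: Dict[str, int] = {}  # message id -> bitmask of states seen so far
        self.counter = 0

    def on_event(self, message_id: str, state: str) -> None:
        if state not in self.bit:
            return  # not a tracked state: ignore
        before = self.visited.get(message_id, 0)
        after = before | (1 << self.bit[state])
        self.visited[message_id] = after
        # Increment only on the transition to "all visited", so repeats never double count.
        if before != self.full_mask and after == self.full_mask:
            self.counter += 1


tracker = UnorderedVisitCounter(["s1", "s2", "s3"])
events = [("m1", "s1"), ("m2", "s3"), ("m1", "s2"), ("m1", "s3"), ("m1", "s3"), ("m2", "s1")]
for message_id, state in events:
    tracker.on_event(message_id, state)
print(tracker.counter)  # → 1
```

Counting only on the transition to the full mask is what keeps m1's second visit to s3 from incrementing the counter again, while m2 is still waiting for s2.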

If it makes anything easier, I can say that a message has to visit s1 before
visiting s2, s2 before visiting s3, and so on, but I would like to know how
to handle both cases, with and without ordering.
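
For the ordered variant, the per-message state can shrink to a single index: the position of the next state the message is expected to visit. This is a minimal sketch under the same hypothetical (message_id, state) event shape. The question does not say how out-of-order visits should be treated, so this version simply ignores them (an assumption; resetting progress would be another valid policy).

```python
from typing import Dict, List


class OrderedVisitCounter:
    """Counts messages that visit states[0], states[1], ... in that order."""

    def __init__(self, states: List[str]) -> None:
        self.states = states
        self.progress: Dict[str, int] = {}  # message id -> index of next expected state
        self.counter = 0

    def on_event(self, message_id: str, state: str) -> None:
        nxt = self.progress.get(message_id, 0)
        if nxt == len(self.states):
            return  # already completed and counted
        if state != self.states[nxt]:
            return  # out-of-order visit: ignored (assumption; could reset to 0 instead)
        self.progress[message_id] = nxt + 1
        if nxt + 1 == len(self.states):
            self.counter += 1


tracker = OrderedVisitCounter(["s1", "s2", "s3"])
events = [("m1", "s1"), ("m1", "s3"), ("m1", "s2"), ("m1", "s3"),
          ("m2", "s2"), ("m2", "s1"), ("m2", "s3")]
for message_id, state in events:
    tracker.on_event(message_id, state)
print(tracker.counter)  # → 1
```

Here m2 is not counted: its s2 visit arrives before s1 and is ignored, and its later s3 visit arrives while it is still waiting for s2.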

Thanks!

Re: Interesting Stateful Streaming question

Posted by Michael Armbrust <mi...@databricks.com>.
This does sound like a good use case for that feature. Note that Spark
2.2 adds a similar [flat]MapGroupsWithState operation to Structured
Streaming. Stay tuned for a blog post on that!
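
Both mapWithState and [flat]MapGroupsWithState revolve around a per-key update function that receives the key, new data for that key, and the state stored from earlier batches (mapWithState passes one value at a time, while [flat]MapGroupsWithState passes all of a key's values in the trigger as an iterator). The plain-Python simulation below only mimics the grouped shape; it does not use any Spark API, and `update_fn`, `run_batches`, and `TRACKED` are hypothetical names. It shows how the stored state lets one message's transitions span many batches.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical set of states every message must visit (order ignored).
TRACKED = frozenset({"s1", "s2", "s3"})


def update_fn(message_id: str, new_states: Iterable[str],
              old: Optional[frozenset]) -> Tuple[bool, frozenset]:
    """Per-key update: returns (counted_in_this_batch, new_stored_state).

    message_id is unused here; it mirrors the key argument the Spark functions receive.
    """
    seen = set(old or ())
    already_done = seen == TRACKED
    seen.update(s for s in new_states if s in TRACKED)
    counted_now = not already_done and seen == TRACKED
    return counted_now, frozenset(seen)


def run_batches(batches: List[List[Tuple[str, str]]]) -> int:
    """Feeds micro-batches of (message_id, state) events through update_fn."""
    store: Dict[str, frozenset] = {}  # stands in for the per-key state store
    counter = 0
    for batch in batches:
        # Group this batch's events by key, as a groupByKey would.
        grouped: Dict[str, List[str]] = defaultdict(list)
        for message_id, state in batch:
            grouped[message_id].append(state)
        for message_id, states in grouped.items():
            counted, new_state = update_fn(message_id, states, store.get(message_id))
            store[message_id] = new_state
            counter += int(counted)
    return counter


batches = [
    [("m1", "s1"), ("m2", "s2")],
    [("m1", "s2"), ("m1", "s3"), ("m2", "s1")],
    [("m2", "s3"), ("m1", "s1")],
]
print(run_batches(batches))  # → 2
```

m1 completes in the second batch and m2 in the third; m1's repeat visit to s1 in the third batch is not counted again because its stored state already holds every tracked state. In a real job this ever-growing state would usually be bounded with a state timeout.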

On Thu, Jun 29, 2017 at 6:11 PM, kant kodali <ka...@gmail.com> wrote:

> Is mapWithState an answer for this?
> https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html

Re: Interesting Stateful Streaming question

Posted by kant kodali <ka...@gmail.com>.
Is mapWithState an answer for this?
https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html
