You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Gaurav Jain <ja...@student.ethz.ch> on 2014/06/16 20:01:27 UTC

Accessing the per-key state maintained by updateStateByKey for transformation of JavaPairDStream

Hello Spark Streaming Experts

I have a use-case, where I have a bunch of log-entries coming in, say every
10 seconds (Batch-interval). I create a JavaPairDStream[K,V] from these
log-entries. Now, there are two things I want to do with this
JavaPairDStream:

1. Use key-dependent state (updated by updateStateByKey) to apply a
transformation function on the JavaPairDStream[K, V]. I know that we get a
JavaPairDStream[K, S] as return value of updateStateByKey. However, I can't
possibly pass a JavaPairDStream to a transformation function, nor can I
convert JavaPairDStream[K,S]  to let's say a HashMap<K,S> (Or is there a way
to do this?). Even if I could convert it to a HashMap<K,S>, could I really
pass it to a transformation function, since this HashMap<K,S> changes after
every batch computation?

2. Update key-dependent state using Iterable<V>: This should be easily
doable using updateStateByKey.

In a nutshell, how do I access the very state updated by updateStateByKey
for applying let's say a map function on the JavaPairDStream[K,V]. Note that
I am not using any sliding windows at all. Just plain batches. 

Thanks
Gaurav



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Accessing-the-per-key-state-maintained-by-updateStateByKey-for-transformation-of-JavaPairDStream-tp7680.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.