Posted to user@spark.apache.org by JayeshLalwani <Ja...@capitalone.com> on 2017/05/03 22:18:50 UTC

Re: In-order processing using spark streaming

Option A

If you can get all the messages in a session into the same Spark partition,
you can use mapPartitions to process the whole partition at once. This lets
you control the order in which the messages are processed within the
partition.
This works only if messages arrive in order: Kafka guarantees in-order
delivery only within a single Kafka partition, so the producer must key
messages so that each session maps to one Kafka partition.
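The idea behind Option A can be sketched without Spark: route every message in a session to the same partition by hashing its key, then process each partition sequentially so per-session arrival order is preserved. This is a minimal plain-Python illustration, not Spark's actual `mapPartitions` API; the names `session_id`, `payload`, and `process_partition` are hypothetical.

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(session_id: str) -> int:
    # Deterministic keying, as a Kafka producer or Spark partitioner would do:
    # every message in a session lands in the same partition.
    return zlib.crc32(session_id.encode()) % NUM_PARTITIONS

def process_partition(messages):
    # Stands in for the function passed to mapPartitions: it sees the
    # partition's messages in append order, so per-session order is kept.
    return [f"{m['session_id']}:{m['payload']}" for m in messages]

def run(stream):
    # Route messages to partitions, then process each partition sequentially.
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for msg in stream:
        partitions[partition_for(msg["session_id"])].append(msg)
    return [process_partition(p) for p in partitions]
```

Because a session never spans two partitions, processing a partition front to back is enough to see that session's messages in the order they arrived.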

Option B
If the messages can arrive out of order but carry an event timestamp, you
can use window operations to sort the messages within each window. You will
still need to make sure that messages from the same session land in the same
Spark partition. This adds latency, though, because a message is not
processed until the watermark has passed the end of its window.
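The buffering behavior described in Option B can be simulated in plain Python: hold out-of-order messages per session and only emit them, sorted by event timestamp, once the watermark (maximum seen timestamp minus an allowed lateness, mirroring Structured Streaming's `withWatermark`) has moved past them. This is an illustrative sketch, not Spark code; `ALLOWED_LATENESS` and all names are assumptions.

```python
from collections import defaultdict

ALLOWED_LATENESS = 5  # seconds a message may arrive late (hypothetical)

class WindowedSorter:
    def __init__(self):
        self.buffers = defaultdict(list)  # session_id -> [(event_ts, payload)]
        self.max_ts = 0                   # highest event time seen so far

    def on_message(self, session_id, ts, payload):
        """Buffer a message, advance the watermark, flush what is now safe."""
        self.buffers[session_id].append((ts, payload))
        self.max_ts = max(self.max_ts, ts)
        return self._flush()

    def _flush(self):
        # Anything at or below the watermark can no longer be preceded by a
        # late arrival, so it is emitted in event-time order.
        watermark = self.max_ts - ALLOWED_LATENESS
        emitted = []
        for sid, buf in self.buffers.items():
            ready = sorted(t for t in buf if t[0] <= watermark)
            self.buffers[sid] = [t for t in buf if t[0] > watermark]
            emitted.extend((sid, ts, p) for ts, p in ready)
        return emitted
```

The latency cost mentioned above is visible here: a message sits in the buffer until some later message pushes the watermark past it.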



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/In-order-processing-using-spark-streaming-tp28457p28646.html

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org