Posted to user@spark.apache.org by Tech Meme <ev...@gmail.com> on 2015/08/14 01:32:44 UTC

Custom serialization and checkpointing

Hi Guys,
   We need to checkpoint some state (an RDD that's updated using
updateStateByKey), and we would like finer control over its serialization.
That would also let us handle schema evolution in the deserialization
code when we need to change the structure of the classes that make up
the state.
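One common way to get that control is to tag every serialized record with a schema version and upgrade old records step by step when reading them. This Python sketch is illustrative only: the `UserState` class, its fields, the JSON encoding, and the `encode_state`/`decode_state` helpers are all hypothetical and are not part of Spark's API.

```python
import json
from dataclasses import dataclass


# Hypothetical state record; field names are illustrative, not from Spark.
@dataclass
class UserState:
    count: int
    last_seen_ms: int
    tags: list  # field added in schema version 2


CURRENT_VERSION = 2


def encode_state(state: UserState) -> bytes:
    """Serialize with an explicit schema version so old bytes can be upgraded later."""
    payload = {
        "v": CURRENT_VERSION,
        "count": state.count,
        "last_seen_ms": state.last_seen_ms,
        "tags": state.tags,
    }
    return json.dumps(payload).encode("utf-8")


def _upgrade_v1_to_v2(d: dict) -> dict:
    # v1 records had no "tags" field; default it to an empty list.
    d = dict(d)
    d.setdefault("tags", [])
    d["v"] = 2
    return d


# Maps "from version" to the function that upgrades a record by one version.
_UPGRADES = {1: _upgrade_v1_to_v2}


def decode_state(data: bytes) -> UserState:
    """Deserialize, stepping an old record forward one version at a time."""
    d = json.loads(data.decode("utf-8"))
    version = d.get("v", 1)  # assumption: unversioned records are treated as v1
    while version < CURRENT_VERSION:
        d = _UPGRADES[version](d)
        version = d["v"]
    if version > CURRENT_VERSION:
        raise ValueError(f"state written by a newer schema (v{version})")
    return UserState(count=d["count"], last_seen_ms=d["last_seen_ms"], tags=d["tags"])
```

Reading a v1 record that predates the `tags` field yields `UserState(count=3, last_seen_ms=1000, tags=[])`. Keeping an explicit version number, rather than guessing from which fields are present, keeps each upgrade step small and testable.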

I assume I can use foreachRDD to write the state to any location I
choose (either a blob store or Dynamo).

A) How can I make checkpoint recovery read the data back from this
persisted location?
B) I notice that calling checkpoint cleans up older versions of the
checkpoint. Where should I put the equivalent cleanup code for my own
snapshots?
C) My understanding is that checkpointing is atomic. Is there anything I
need to be aware of so that I don't lose those atomicity semantics?
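If the state is written from foreachRDD to storage you manage yourself, questions A to C map onto three pieces you would own: an atomic write, a "load the newest complete snapshot" step for recovery, and a pruning step for cleanup. A minimal sketch, assuming a local or POSIX filesystem and per-batch state small enough to gather on the driver; the file layout and the `save_snapshot`, `load_latest`, and `prune_snapshots` helpers are hypothetical, and none of this hooks into Spark's own checkpoint recovery (how the loaded state is fed back into the stream on restart is outside this sketch).

```python
import json
import os
import re
import tempfile

# Hypothetical layout: one file per batch, named by batch time in milliseconds.
_SNAPSHOT_RE = re.compile(r"^state-(\d+)\.json$")


def save_snapshot(root: str, batch_time_ms: int, state: dict) -> str:
    """Atomic write (question C): write a temp file, fsync, then rename into place."""
    os.makedirs(root, exist_ok=True)
    final_path = os.path.join(root, f"state-{batch_time_ms}.json")
    fd, tmp_path = tempfile.mkstemp(dir=root, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic on POSIX within one filesystem, so readers see
        # either no file at final_path or the complete snapshot, never a partial one.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


def _list_snapshots(root: str) -> list:
    """Return (batch_time_ms, filename) pairs, oldest first; temp files are ignored."""
    if not os.path.isdir(root):
        return []
    found = []
    for name in os.listdir(root):
        m = _SNAPSHOT_RE.match(name)
        if m:
            found.append((int(m.group(1)), name))
    return sorted(found)  # numeric sort, so 10000 comes after 9000


def load_latest(root: str):
    """Recovery (question A): return (batch_time_ms, state) for the newest snapshot, or None."""
    snaps = _list_snapshots(root)
    if not snaps:
        return None
    batch_time_ms, name = snaps[-1]
    with open(os.path.join(root, name), encoding="utf-8") as f:
        return batch_time_ms, json.load(f)


def prune_snapshots(root: str, keep: int = 2) -> list:
    """Cleanup (question B): delete all but the newest `keep` snapshots."""
    snaps = _list_snapshots(root)
    doomed = snaps[:-keep] if keep > 0 else snaps
    for _, name in doomed:
        os.remove(os.path.join(root, name))
    return [name for _, name in doomed]
```

Calling `prune_snapshots` only after `save_snapshot` returns means a crash mid-write never leaves you without a complete snapshot. The temp-file-then-rename trick relies on `os.replace` being atomic within one filesystem; many blob stores have no atomic rename, so there you would need another way to mark a snapshot as complete, such as writing a small marker object last.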


Thanks,
Arun