You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by ja...@apache.org on 2016/11/22 22:36:30 UTC

samza git commit: SAMZA-1046: Docs for checkpointable consumer

Repository: samza
Updated Branches:
  refs/heads/master 362658938 -> 202a15809


SAMZA-1046: Docs for checkpointable consumer


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/202a1580
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/202a1580
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/202a1580

Branch: refs/heads/master
Commit: 202a15809ee4abc7700af9b02e903f2ebd57a662
Parents: 3626589
Author: Boris Shkolnik <bo...@apache.org>
Authored: Tue Nov 22 14:36:09 2016 -0800
Committer: vjagadish1989 <jv...@linkedin.com>
Committed: Tue Nov 22 14:36:14 2016 -0800

----------------------------------------------------------------------
 .../versioned/container/checkpointing.md          | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/202a1580/docs/learn/documentation/versioned/container/checkpointing.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/container/checkpointing.md b/docs/learn/documentation/versioned/container/checkpointing.md
index 6f8c6d6..9fb7e6d 100644
--- a/docs/learn/documentation/versioned/container/checkpointing.md
+++ b/docs/learn/documentation/versioned/container/checkpointing.md
@@ -121,4 +121,22 @@ samza-example/target/bin/checkpoint-tool.sh \
 
 Note that Samza only reads checkpoints on container startup. In order for your checkpoint change to take effect, you need to first stop the job, then save the modified offsets, and then start the job again. If you write a checkpoint while the job is running, it will most likely have no effect.
 
+### Checkpoint Callbacks
+Currently Samza takes care of checkpointing for all the systems. But there are some use-cases when we may need to inform the Consumer about each checkpoint we make.
+Here are few examples:
+
+* Samza cannot do checkpointing correctly or efficiently. One such case is when Samza is not doing the partitioning. In this case the container doesn\u2019t know which SSPs it is responsible for, and thus cannot checkpoint them. An actual example could be a system which relies on an auto-balanced High Level Kafka Consumer for partitioning.
+* Systems in which the consumer itself needs to control the checkpointed offset. Some systems do not support seek() operation (are not replayable), but they rely on ACKs for the delivered messages. Example could be a Kinesis consumer. Kinesis library provides a checkpoint callback in the* process() *call (push system). This callback needs to be invoked after the records are processed. This can only be done by the consumer itself.
+* Systems that use checkpoint/offset information for some maintenance actions. This information may be used to implement a smart retention policy (deleting all the data after it has been consumed).
+
+In order to use the checkpoint callback a SystemConsumer needs to implement the CheckpointListener interface:
+{% highlight java %}
+public interface CheckpointListener {
+  void onCheckpoint(Map<SystemStreamPartition, String> offsets);
+}
+{% endhighlight %}
+For the SystemConsumers which implement this interface Samza will invoke onCheckpoint() callback every time OffsetManager checkpoints. Checkpoints are done per task, and 'offsets' are all the offsets Samza checkpoints for a task,
+and these are the offsets which will be passed to the consumer on restart.
+Note that the callback will happen after the checkpoint and is **not** atomic.
+
 ## [State Management &raquo;](state-management.html)