You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by "Nicolas Bär (JIRA)" <ji...@apache.org> on 2014/04/29 00:43:14 UTC

[jira] [Created] (SAMZA-255) Rewinding Streams within a StreamTask

Nicolas Bär created SAMZA-255:
---------------------------------

             Summary: Rewinding Streams within a StreamTask
                 Key: SAMZA-255
                 URL: https://issues.apache.org/jira/browse/SAMZA-255
             Project: Samza
          Issue Type: Wish
            Reporter: Nicolas Bär
            Priority: Minor


The many benefits of Kafka include persistent storage and its resulting possibility to rewind streams to a specific offset. Samza does currently not support rewinding of streams within a StreamTask. I'd like to place this functionality as a feature request and provide two use cases to further describe the benefits of such a feature. Let's consider a general use case to aggregate values within sliding windows.

1. Offline-Processing
In case of offline-processing the sliding window does not correlate to the system time. In this case any node failure will result in samza restoring from a checkpointed offset that most probably does not match the beginning of the most recent sliding window. But in order to gain precise results, one could rewind to the specific offset and process the missing events of the sliding window. The same holds for any use case where the data has to be processed in small batches and these batches do not correspond to the system time.

2. Late Arrival
Messages might get delayed before they are stored into Kafka. In this case one could rewind the offset in order to process older messages corresponding to the same sliding window.

I'd be happy to further discuss these cases and the proposed feature request.



--
This message was sent by Atlassian JIRA
(v6.2#6252)