Posted to commits@samza.apache.org by "Jake Maes (JIRA)" <ji...@apache.org> on 2015/10/12 20:15:05 UTC

[jira] [Created] (SAMZA-794) Add the ability to manually commit specific offsets

Jake Maes created SAMZA-794:
-------------------------------

             Summary: Add the ability to manually commit specific offsets 
                 Key: SAMZA-794
                 URL: https://issues.apache.org/jira/browse/SAMZA-794
             Project: Samza
          Issue Type: Improvement
            Reporter: Jake Maes


Background:
At LinkedIn we have seen a number of cases where users need to make remote calls in their Samza jobs. If the remote calls are made synchronously on a single thread, they add significant delay to the Samza event loop. To mitigate this, users could increase the container count to get more parallelism, but that allocates extra resources that mostly sit idle waiting on IO. Instead, some users employ the following approach:
* Implement a thread pool to do the IO asynchronously
* In window() wait for all threads to complete
* Commit the offsets manually (auto commit is turned off)

This works, but incoming messages are blocked while window() waits for all outstanding IO to finish, so throughput comes in spurts.
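
The thread-pool approach described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not code from any Samza job: BlockingWindowSketch, its process()/window() methods, and the commit hook are hypothetical stand-ins that only loosely mirror a Samza task, and the pool size of 4 is arbitrary. It shows why throughput stalls: window() cannot return until every outstanding call has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a Samza task; not part of the Samza API.
class BlockingWindowSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    // Only touched by the single event-loop thread, so no locking is needed.
    private final List<Future<?>> pending = new ArrayList<>();
    // Hypothetical hook standing in for the manual offset commit.
    private final Runnable commit;

    BlockingWindowSketch(Runnable commit) {
        this.commit = commit;
    }

    // Per message: hand the remote call to the pool and return immediately.
    void process(Runnable remoteCall) {
        pending.add(pool.submit(remoteCall));
    }

    // Periodically: block until every outstanding call finishes, then commit.
    void window() {
        try {
            for (Future<?> f : pending) {
                f.get(); // the event loop stalls here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for IO", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("remote call failed", e.getCause());
        }
        pending.clear();
        commit.run();
    }

    void close() {
        pool.shutdown();
    }
}
```

Because the commit happens only after the slowest call in the batch returns, no new messages are processed during that wait.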

One way to fix this throughput issue is:
* Put the offsets from the IncomingMessageEnvelopes into a queue
* Associate the offset with the IO thread
* When the IO thread completes, add the offset to a set
* Periodically pop as many offsets off the head of the queue as possible (first checking that each one is in the set), then commit the last offset popped from the queue.
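
The queue-and-set bookkeeping in the steps above might look something like the sketch below. It is an assumption-laden illustration, not a proposed patch: CompletedOffsetTracker and its method names are hypothetical, offsets are treated as opaque strings, and one tracker is assumed per input partition so that queue order matches offset order. It covers only the bookkeeping; actually committing the returned offset is the capability this ticket asks for.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Optional;
import java.util.Queue;
import java.util.Set;

// Hypothetical helper, not part of Samza: tracks in-flight offsets for one partition.
class CompletedOffsetTracker {
    private final Queue<String> inFlight = new ArrayDeque<>(); // offsets in arrival order
    private final Set<String> completed = new HashSet<>();     // offsets whose IO has finished

    // Called from the event loop when a message is handed to an IO thread.
    synchronized void register(String offset) {
        inFlight.add(offset);
    }

    // Called by the IO thread when its remote call completes.
    synchronized void markCompleted(String offset) {
        completed.add(offset);
    }

    // Pops the longest fully completed prefix of the queue and returns the
    // last offset popped, or empty if the head of the queue is still in flight.
    synchronized Optional<String> nextCommittableOffset() {
        String last = null;
        while (!inFlight.isEmpty() && completed.remove(inFlight.peek())) {
            last = inFlight.poll();
        }
        return Optional.ofNullable(last);
    }
}
```

For example, if offsets 1, 2 and 3 are registered and only 2 has completed, nothing is committable yet; once 1 completes, the tracker returns 2, and 3 stays queued until its own call finishes. Slow calls therefore delay the commit point without blocking the processing of new messages.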

Unfortunately, the TaskCoordinator doesn't expose a commit() for specific offsets and instead uses the lastProcessedOffsets from the OffsetManager. 

The goal of this ticket is to add the ability to commit user-specified offsets to support the scenario above. 


