Posted to dev@kafka.apache.org by matan <ma...@cloudaloe.org> on 2013/03/02 21:53:26 UTC

Producer-side persistence and re-sending

Hi,

I am designing a patch on top of the 0.8 code base.
The patch would provide persistence on the producer side, meaning that 
messages passed to the producer are persisted rather than kept 
transiently in memory. If the broker(s) cannot be reached, messages can 
accumulate and will be sent through to the broker(s) once they are 
available again. Although this would be somewhat superfluous under the 
new replication paradigm of 0.8, it is still possible for some failures 
to disconnect a producer from the entire set of brokers. In that case, 
this patch-under-design would prevent data loss, making the pipeline 
even more secure and relieving producers of the need to handle 
persistence on their own. The plan is to use the Kafka Log component 
for that, and of course to keep this behavior completely optional 
through a configuration option.
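
To make the intent concrete, here is a rough sketch of how the option 
might look from a producer's point of view. The producer.persistence.* 
keys are placeholders for the proposed patch and do not exist in 0.8; 
the rest is the standard 0.8 producer API:

    import java.util.Properties
    import kafka.producer.{Producer, ProducerConfig, KeyedMessage}

    val props = new Properties()
    props.put("metadata.broker.list", "broker1:9092,broker2:9092")
    props.put("producer.type", "async")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    // hypothetical new options enabling on-disk buffering while no broker is reachable:
    props.put("producer.persistence.enable", "true")
    props.put("producer.persistence.dir", "/var/lib/kafka-producer-buffer")

    val producer = new Producer[String, String](new ProducerConfig(props))
    // send() behaves as today; with persistence enabled the message would first be
    // appended to the local buffer and re-sent later if no broker can be reached
    producer.send(new KeyedMessage[String, String]("my-topic", "key", "value"))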

At a slightly deeper level, the design is to use a Kafka Log per 
topic & partition (otherwise, given the existing 0.8 code in and around 
producer.async.DefaultEventHandler.dispatchSerializedData, it would 
seem resource intensive to keep track of sent vs. failed messages for 
managing re-sending). When using the Log this way, the behavior for 
keeping replica sets in sync would either be skipped through the choice 
of parameters, or be made parameterizable so that it can be fully 
disabled for the producer's own local log. A sketch of the intended 
life cycle follows below.
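
The per-topic-and-partition bookkeeping I have in mind is 
append-before-send, truncate-on-ack, replay-on-reconnect. The sketch 
below uses a plain in-memory queue only to illustrate that life cycle; 
in the actual patch each buffer would be backed by an on-disk 
kafka.log.Log, and the class and method names here are placeholders 
rather than existing code:

    import scala.collection.mutable
    import kafka.common.TopicAndPartition

    // Placeholder sketch: the real implementation would back each buffer with a
    // kafka.log.Log on disk; an in-memory queue is used here only to show the
    // append / ack-truncate / replay life cycle per topic & partition.
    class ProducerWriteAheadBuffer {
      private val buffers =
        mutable.Map.empty[TopicAndPartition, mutable.Queue[(Long, Array[Byte])]]
      private var nextOffset = 0L

      // called before the network send, so the message survives a broker outage
      def append(tp: TopicAndPartition, payload: Array[Byte]): Long = synchronized {
        val offset = nextOffset
        nextOffset += 1
        buffers.getOrElseUpdate(tp, mutable.Queue.empty[(Long, Array[Byte])])
          .enqueue((offset, payload))
        offset
      }

      // called once the broker acks a send, so acked messages are never re-sent
      def truncateTo(tp: TopicAndPartition, ackedOffset: Long): Unit = synchronized {
        buffers.get(tp).foreach(_.dequeueAll { case (o, _) => o <= ackedOffset })
      }

      // called from the retry path once the brokers are reachable again
      def replay(tp: TopicAndPartition)(resend: Array[Byte] => Unit): Unit = synchronized {
        buffers.get(tp).foreach(_.foreach { case (_, payload) => resend(payload) })
      }
    }

Keying the buffers by topic & partition keeps the re-send bookkeeping 
local to each partition, which is what I meant above about avoiding 
resource-intensive tracking around dispatchSerializedData.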

Now of course, given that 0.8 seems to be far along its runway, I assume 
this should go on top of trunk, which I'd like to confirm with you is 
where post-0.8 development lives.

I'd appreciate your comments...

Thanks,
Matan

Re: Producer-side persistence and re-sending

Posted by Jun Rao <ju...@gmail.com>.
Matan,

Thanks for picking this up. Yes, this is likely a post-0.8 item and should
go into trunk. Could you file a JIRA and post your initial design there for
discussion?

Jun
