Posted to dev@kafka.apache.org by Vinoth <vi...@gmail.com> on 2023/01/16 04:32:55 UTC

A query on log truncation.

I was reading about Kafka: how leader election works, log truncation, and so
on. One thing that struck me was how records that were written to the log but
not committed (they have not propagated to all of the ISRs, so the high
watermark has not advanced and they are not committed) are truncated by the
replication reconciliation logic. If they are not committed they are not
available to consumers, since reads only go up to the high watermark. The
producer client will also be notified, or will eventually learn, that the
message did not propagate successfully, and that should be handled through
application logic. This case seems straightforward.
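To make the above concrete, here is a minimal toy model (plain Python, not
Kafka code) of the semantics described: consumer reads are capped at the high
watermark, so the uncommitted suffix that reconciliation truncates was never
visible to any consumer. The class and method names are illustrative, not
Kafka's.

```python
class Replica:
    def __init__(self):
        self.log = []            # records appended by the leader
        self.high_watermark = 0  # first offset NOT yet replicated to all ISRs

    def append(self, record):
        self.log.append(record)  # written, but not yet committed

    def ack_from_all_isrs(self, up_to_offset):
        # The high watermark advances only once every ISR has the records.
        self.high_watermark = max(self.high_watermark, up_to_offset)

    def consumer_read(self):
        # Reads are capped at the high watermark, so uncommitted
        # records are invisible to consumers.
        return self.log[:self.high_watermark]

    def truncate_to_high_watermark(self):
        # Replication reconciliation: drop the uncommitted suffix.
        removed = self.log[self.high_watermark:]
        self.log = self.log[:self.high_watermark]
        return removed

leader = Replica()
leader.append("a")
leader.append("b")
leader.ack_from_all_isrs(1)    # only "a" is committed
leader.append("c")             # never replicated

print(leader.consumer_read())  # ['a'] -- "b" and "c" are not visible
lost = leader.truncate_to_high_watermark()
print(lost)                    # ['b', 'c'] -- dropped, but no consumer saw them
```

In this model the truncated records were never readable, which is why the
producer-side error handling the paragraph above describes is sufficient.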

KIP-405
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage>
talks about tiered storage, and about Kafka being an important part of, and
an entry point for, data infrastructure. Elsewhere I have read that Kafka
also serves as a way of replaying data to restore state or to view data.
KIP-320
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation>
mentions users who want higher availability opting for unclean leader
election.

Would it be fair to assume that users might be interested in a feature, at
least one that can be enabled by the user, where a write to Kafka (even with
acks=0 or with unclean leader election) will remain written until the cleanup
or delete configuration acts on it?

If this is a valid use case, I am thinking of suggesting a KIP around picking
up the data that is about to be truncated, at truncation time, and replaying
it as if it came in through a fresh produce request. That is, truncation
would not remove the data from Kafka; instead, the data would be re-appended
at a different offset.
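Extending the toy model, a hypothetical sketch of that proposal (this is not
an existing Kafka feature): a follower reconciling with a new leader salvages
its divergent suffix and re-appends it at the log end, as if it arrived in a
fresh produce request, so the records survive at new offsets.

```python
# Hypothetical "truncate and replay" reconciliation, offsets = list indices.
old_leader_log = ["a", "b", "x", "y"]  # "x", "y" were never committed
new_leader_log = ["a", "b", "c"]       # new leader diverged at offset 2

# Follower reconciles: truncate the divergent suffix, but salvage it.
salvaged = old_leader_log[2:]
follower_log = old_leader_log[:2] + new_leader_log[2:]  # now matches leader

# Replay the salvaged records as fresh produces at the log end.
for record in salvaged:
    follower_log.append(record)

print(follower_log)  # ['a', 'b', 'c', 'x', 'y'] -- "x", "y" survive at offsets 3, 4
```

One open question such a KIP would have to address is ordering: the replayed
records land after data they originally preceded.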

Regards,
Vinoth

Re: A query on log truncation.

Posted by Vinoth <vi...@gmail.com>.
Hi Luke,
              Thanks for acknowledging my mail, and sorry for the late reply.
My query was not about keeping uncommitted records but about how teams manage
the loss of committed data under unclean leader election. Is there a means to
track lost data? Is this a common problem? I am asking based on KIP-320,
which mentions that committed data might be lost when unclean leader election
is enabled.
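For context, KIP-320's detection mechanism could be the starting point for
such tracking. The rough idea, sketched below as a toy model (a
simplification of Kafka's leader-epoch cache, not its actual code), is to
compare (leader epoch, offset) pairs rather than trusting offsets alone, so a
fetcher can find the exact point where its log diverges from the new leader's
and identify which committed records were lost.

```python
def find_divergence(local_log, leader_log):
    """Each log is a list of (epoch, value) records; index = offset.

    Returns the first offset where the logs disagree, i.e. where the
    local log must truncate. Records at or past that offset were lost.
    """
    for offset in range(min(len(local_log), len(leader_log))):
        if local_log[offset][0] != leader_log[offset][0]:
            return offset
    return min(len(local_log), len(leader_log))

# Unclean election: the new leader (epoch 2) is missing committed epoch-1 data.
local  = [(1, "a"), (1, "b"), (1, "c")]  # "b", "c" were committed on old leader
leader = [(1, "a"), (2, "d")]            # unclean leader overwrote them

point = find_divergence(local, leader)
lost = [value for _, value in local[point:]]
print(point)  # 1
print(lost)   # ['b', 'c'] -- committed data lost to unclean election
```

This only detects the loss on replicas that still hold the data; whether and
where to record it is exactly the kind of question the thread raises.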

Regards,
Vinoth

On Mon, 16 Jan 2023 at 10:37, Luke Chen <sh...@gmail.com> wrote:

> Hi Vinoth,
>
> I'm wondering what use case or pain point you're trying to resolve?
> As you said, the client will be notified that the data was not successfully
> sent or propagated and can handle the error, so why should we keep the
> uncommitted records?
> Could you elaborate more on the motivation?
>
> Thank you.
> Luke
>

Re: A query on log truncation.

Posted by Luke Chen <sh...@gmail.com>.
Hi Vinoth,

I'm wondering what use case or pain point you're trying to resolve?
As you said, the client will be notified that the data was not successfully
sent or propagated and can handle the error, so why should we keep the
uncommitted records?
Could you elaborate more on the motivation?

Thank you.
Luke
