Posted to users@kafka.apache.org by Pranavi Chandramohan <pr...@campaignmonitor.com> on 2019/06/03 06:02:08 UTC

Re: Transactional markers are not deleted from log segments when policy is compact

Hi Jonathan,

After looking at the changes in the PR, I would like to confirm that the fix
would work for our scenario too.
We create a new TransactionalId for every producer instance every time it
restarts, which results in new ProducerIds being created. Old producers and
their transactions will eventually expire.
Our expectation is that the markers of inactive ProducerIds should be
deleted. Does this hold true?
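
For reference, this is roughly the pattern we follow when creating producers
(an illustrative sketch only; the class name and the transactional.id prefix
are placeholders, not our real code):

import java.util.Properties;
import java.util.UUID;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerFactory {

    // Called once per producer instance on startup.
    public static KafkaProducer<String, String> create(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // A fresh transactional.id on every restart, so each new instance gets a new
        // ProducerId and the previous one is simply left to expire.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-app-" + UUID.randomUUID());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        return producer;
    }
}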

Thanks,
Pranavic

On Sat, Jun 1, 2019 at 8:45 AM Pranavi Chandramohan <
pranavic@campaignmonitor.com> wrote:

> Thanks, Jonathan! That should help.
>
> On Fri, 31 May 2019, 6:44 pm Jonathan Santilli, <
> jonathansantilli@gmail.com> wrote:
>
>> Hello Pranavi,
>>
>> it sounds like this was solved in the release candidate 2.2.1-RC1 (
>> https://issues.apache.org/jira/browse/KAFKA-8335)
>> Take a look at it.
>>
>> Hope that helps.
>>
>>
>> Cheers!
>> --
>> Jonathan
>>
>>
>>
>>
>> On Fri, May 31, 2019 at 8:59 AM Pranavi Chandramohan <
>> pranavic@campaignmonitor.com> wrote:
>>
>> > Hi all,
>> >
>> > We use Kafka 1.1.1 (the 2.11-1.1.1 Scala build). We produce and consume
>> > transactional messages, and recently we noticed that 2 partitions of the
>> > __consumer_offsets topic have very high disk usage (256 GB).
>> > When we looked at the log segments for these 2 partitions, there were
>> > files that were 6 months old.
>> > By dumping the content of an old log segment using the following command
>> >
>> > kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration
>> > --print-data-log  --files 00000000003949894887.log | less
>> >
>> >
>> > we found that all the records were COMMIT transaction markers.
>> >
>> > offset: 1924582627 position: 183 CreateTime: 1548972578376 isvalid:
>> > true keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId:
>> > 126015 producerEpoch: 0 sequence: -1 isTransactional: true headerKeys:
>> > [] endTxnMarker: COMMIT coordinatorEpoch: 28
>> >
>> >
>> > Why are the commit transaction markers not compacted and deleted?
>> >
>> >
>> > Log cleaner config
>> >
>> > max.message.bytes 10000120
>> > min.cleanable.dirty.ratio 0.1
>> > compression.type uncompressed
>> > cleanup.policy compact
>> > retention.ms 2160000000
>> > segment.bytes 104857600
>> >
>> > # By default the log cleaner is disabled and the log retention policy
>> > will default to just delete segments after their retention expires.
>> > # If log.cleaner.enable=true is set the cleaner will be enabled and
>> > individual logs can then be marked for log compaction.
>> > log.cleaner.enable=true
>> > # give larger heap space to log cleaner
>> > log.cleaner.dedupe.buffer.size=1342177280
>> >
>>
>>
>> --
>> Santilli Jonathan
>>
>

Re: Transactional markers are not deleted from log segments when policy is compact

Posted by Pranavi Chandramohan <pr...@campaignmonitor.com>.
Thanks for confirming that, Jonathan.

Cheers!

On Mon, 3 Jun 2019, 7:27 pm Jonathan Santilli, <jo...@gmail.com>
wrote:

> Hello Pranavi,
>
> according to the changes, that should be the case: "The producer is no
> longer active, which means we can delete all records for that producer."
>
>
> https://github.com/apache/kafka/pull/6715/files#diff-d7330411812d23e8a34889bee42fedfeR622
>
>
> Cheers!
> --
> Jonathan
>

Re: Transactional markers are not deleted from log segments when policy is compact

Posted by Jonathan Santilli <jo...@gmail.com>.
Hello Pranavi,

according to the changes, that should be the case: "The producer is no
longer active, which means we can delete all records for that producer."

https://github.com/apache/kafka/pull/6715/files#diff-d7330411812d23e8a34889bee42fedfeR622
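
To make the idea concrete, the retention decision is roughly like the
following simplified Java sketch (my own illustration only, not the actual
cleaner code from the PR; every name below is made up for the example):

public class MarkerRetentionSketch {

    // Sketch of the decision the compactor makes for a transaction marker batch.
    static boolean shouldRetainMarkerBatch(boolean producerStillActive,
                                           boolean transactionDataStillInLog,
                                           boolean deleteHorizonPassed) {
        if (producerStillActive) {
            // Markers of a still-active producer are kept so its state can be rebuilt.
            return true;
        }
        if (transactionDataStillInLog) {
            // Records of the transaction remain in the log, so the marker is still needed.
            return true;
        }
        // Inactive producer and no data of the transaction left: the COMMIT/ABORT
        // marker can be removed once the delete horizon has passed.
        return !deleteHorizonPassed;
    }
}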


Cheers!
--
Jonathan




On Mon, Jun 3, 2019 at 7:02 AM Pranavi Chandramohan <
pranavic@campaignmonitor.com> wrote:

> Hi Jonathan,
>
> After looking at the changes in the PR, I would like to confirm that the fix
> would work for our scenario too.
> We create a new TransactionalId for every producer instance every time it
> restarts, which results in new ProducerIds being created. Old producers and
> their transactions will eventually expire.
> Our expectation is that the markers of inactive ProducerIds should be
> deleted. Does this hold true?
>
> Thanks,
> Pranavic
>


-- 
Santilli Jonathan