Posted to users@kafka.apache.org by Achanta Vamsi Subhash <ac...@flipkart.com> on 2016/03/02 12:13:45 UTC

Consumer Offsets Topic cleanup.policy

Hi all,

We have a __consumer_offsets topic with cleanup.policy=compact and
log.cleaner.enable=false. What would happen if we change the cleanup.policy
to delete? Will that treat the offsets topic the same as any other topic?
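
For reference, the change we are considering would be along these lines
(a sketch using the topic-config tooling; the ZooKeeper address is a
placeholder for our setup):

  # switch the offsets topic from compaction to time-based deletion
  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --entity-type topics --entity-name __consumer_offsets \
    --add-config cleanup.policy=delete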

We currently have a setup with log.cleaner.enable=false, and the brokers
hosting the offsets topic partitions are using a lot of disk because those
partitions are never cleaned/compacted. We tried enabling
log.cleaner.enable=true on the brokers with the offsets topic, but that
leads to a lot of replicated data and takes hours to finish.

What is a better way to clean up the old segments of the __consumer_offsets
topic?

-- 
Regards
Vamsi Subhash

Re: Consumer Offsets Topic cleanup.policy

Posted by Achanta Vamsi Subhash <ac...@flipkart.com>.
We have changed the __consumer_offsets topic policy from "compact" to
"delete" and it is working as expected for us. Segments older than the
retention period are purged, just as for a normal topic. Offset fetch and
commit also worked fine. Thanks.
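
In case it helps others, the effective override can be verified with
something like this (ZooKeeper address is again a placeholder):

  # confirm the policy now set on the offsets topic
  bin/kafka-configs.sh --zookeeper localhost:2181 --describe \
    --entity-type topics --entity-name __consumer_offsets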

-- 
Regards
Vamsi Subhash

Re: Consumer Offsets Topic cleanup.policy

Posted by Achanta Vamsi Subhash <ac...@flipkart.com>.
Thanks for the reply, Jason.

Our topics' global retention is 4 days and, since we are planning to set
the __consumer_offsets retention to the same interval, in the worst case we
won't lose any message offsets, as the data will be rotated anyway.
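
Concretely, matching our 4-day retention would mean an override along these
lines (a sketch; the value is just 4 days expressed in milliseconds):

  # 4 days = 4 * 24 * 60 * 60 * 1000 = 345600000 ms
  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --entity-type topics --entity-name __consumer_offsets \
    --add-config retention.ms=345600000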

Regarding the problems with the log cleaner:
1. I enabled log compaction on one of the brokers where the consumer
offsets partitions reside (that broker was running out of memory due to the
huge size of these partitions).
2. Upon reboot, the broker actually truncated all the logs and started
replicating afresh - TBs of data.
3. This made the entire cluster slow as we hit our Rx and Tx limits, and
all the other topics were affected as produce latencies spiked from
milliseconds to minutes.

I was expecting the follower partitions to fetch only the diff of the data
from the leader, since the broker already has almost all of the data for
the to-be-compacted topics. But as the partitions were compacted upon
restart, replication started fetching all of the data from the leader
(which is not compacted). Hence, we took the broker OOR (out of rotation).
Another weird thing: the consumers reset the offsets of these partitions to
latest when they failed to commit the offsets (which happened while this
broker was leader for some of the partitions, with acks=-1 for the offsets
topic; on shutdown the broker takes some time, roughly the max lag in
messages, before dropping out of the ISR). We will investigate further why
the offsets were reset and file a JIRA.

Now we are in a situation where we might run out of disk space on the
brokers hosting __consumer_offsets, and we are planning to change the
policy to delete to avoid that. Please give your inputs. Thanks.
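
In the meantime we are watching disk usage of the offsets partitions with
something like the following (the log directory is a placeholder for our
actual log.dirs setting):

  # per-partition disk usage of the offsets topic
  du -sh /var/kafka-logs/__consumer_offsets-*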

-- 
Regards
Vamsi Subhash

Re: Consumer Offsets Topic cleanup.policy

Posted by Jason Gustafson <ja...@confluent.io>.
This is actually a really good question. If you change the retention policy
of the offsets topic, then in the worst case, consumer groups could lose
their last committed positions and fall back to the auto reset behavior.
However, if your consumers are not down for a long time and you set the
retention to a reasonably long value, maybe you can get away with it? One
downside is that the broker reads the entire offset log into an in-memory
cache when it takes over leadership of one of the __consumer_offsets
partitions.
Hence the longer your retention time, the longer it will take for the new
leader to read to the end of the log. There may be other consequences as
well that I haven't thought of...
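
To make the fallback concrete: which end of the log a group falls back to
is controlled by the consumer's auto.offset.reset setting. A sketch of the
relevant new-consumer properties, with illustrative values:

  bootstrap.servers=localhost:9092
  group.id=my-group
  # used only when no committed offset exists (or it has been deleted);
  # valid values: earliest, latest, none
  auto.offset.reset=earliest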

Can you describe in a little more detail the problem that you found
enabling the cleaner?

-Jason

Re: Consumer Offsets Topic cleanup.policy

Posted by Achanta Vamsi Subhash <ac...@flipkart.com>.
Hi,

We tested this in our staging environment, and it works fine when we change
the policy from compact to delete. Will there be any side effects if we
change it to delete for the __consumer_offsets topic?

-- 
Regards
Vamsi Subhash