Posted to users@kafka.apache.org by Mike Gould <mi...@gmail.com> on 2017/01/06 11:57:52 UTC

compaction + delete not working for me

Hi

I'm trying to configure log compaction + deletion as per KIP-71 in kafka
0.10.1 but so far haven't had any luck. My tests show more than 50%
duplicate keys when reading from the beginning even several minutes after
all the events were sent.
The documentation in section 3.1 doesn't seem very clear to me in terms of
exactly how to configure particular behavior. Could someone please clarify
a few things for me?

In order to significantly reduce the amount of data that new subscribers
have to receive, I want to compact events as soon as possible and delete
any events more than 24 hours old (e.g. if there hasn't been an update with
a matching key for 24h).

I have set

cleanup.policy=compact, delete
min.cleanable.dirty.ratio=0.5
min.compaction.lag.ms=0
retention.ms=86400000
delete.retention.ms=86460000
segment.ms=60000
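
(A quick sanity check on those millisecond values, done in shell; the two
numbers are meant to be 24 h, and 24 h plus one minute:)

echo $((24 * 60 * 60 * 1000))              # 86400000 -> retention.ms
echo $((24 * 60 * 60 * 1000 + 60 * 1000))  # 86460000 -> delete.retention.ms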


   - Should the cleanup.policy be "compact,delete" or "compact, delete" or
   something else?
   - Are events eligible for compaction soon after the min.compaction.lag.ms
   time has elapsed and the segment has rolled (segment.ms), or is there
   another parameter that affects this? I.e. if I read from the beginning
   after a couple of minutes, should no more than 50% of the events received
   share a key with a previous event?
   - Does the retention.ms parameter only affect the deletion?
   - How can I tell if the config is accepted and compaction is working? Is
   there something useful to search for in the logs?
   - Also if I change the topic config via the kafka-configs.sh tool does
   the change take effect immediately for existing events, do I have to
   restart the brokers, or does it only affect new events?

Thank you
Mike G

Re: compaction + delete not working for me

Posted by Mike Gould <mi...@gmail.com>.
That's great, thank you. I have it working now.
One other thing I noticed: if I send a batch of data and then wait,
compaction never happens. If I send a few more messages later, the first
batch gets compacted. I guess it needs a continuing flow of appends to roll
the active segment and trigger compaction of the completed ones, so my test
doesn't match real life. 😃
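
(In case it helps anyone else testing this: a rough sketch of how to keep
the log moving. The topic name and broker address are made up, and the
parse.key/key.separator properties are just how the console producer sends
keyed records.)

# Append a keyed heartbeat, wait past segment.ms (60 s here), and repeat;
# each later append rolls the active segment so the earlier one becomes
# eligible for compaction.
for i in 1 2 3; do
  echo "heartbeat:tick-$i" | bin/kafka-console-producer.sh \
    --broker-list localhost:9092 --topic my-topic \
    --property parse.key=true --property key.separator=:
  sleep 70
done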



Re: compaction + delete not working for me

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
On Fri, Jan 6, 2017 at 3:57 AM, Mike Gould <mi...@gmail.com> wrote:

> Hi
>
> I'm trying to configure log compaction + deletion as per KIP-71 in kafka
> 0.10.1 but so far haven't had any luck. My tests show more than 50%
> duplicate keys when reading from the beginning even several minutes after
> all the events were sent.
> The documentation in section 3.1 doesn't seem very clear to me in terms of
> exactly how to configure particular behavior. Could someone please clarify
> a few things for me?
>
> In order to significantly reduce the amount of data that new subscribers
> have to receive, I want to compact events as soon as possible and delete
> any events more than 24 hours old (e.g. if there hasn't been an update with
> a matching key for 24h).
>
> I have set
>
> cleanup.policy=compact, delete
> min.cleanable.dirty.ratio=0.5
> min.compaction.lag.ms=0
> retention.ms=86400000
> delete.retention.ms=86460000
> segment.ms=60000
>
>
>    - Should the cleanup.policy be "compact,delete" or "compact, delete" or
>    something else?
>

Either should work, extra leading and trailing spaces are removed.
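
For concreteness, a sketch of setting it with kafka-configs.sh (topic name
and ZooKeeper address are placeholders); the square brackets keep the comma
inside the value from being read as a separator between configs:

bin/kafka-configs.sh --zookeeper localhost:2181 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config 'cleanup.policy=[compact,delete]'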


>    - Are events eligible for compaction soon after the min.compaction.lag.ms
>    time has elapsed and the segment has rolled (segment.ms), or is there
>    another parameter that affects this? I.e. if I read from the beginning
>    after a couple of minutes, should no more than 50% of the events
>    received share a key with a previous event?
>

Maybe you need to modify log.retention.check.interval.ms? It defaults to 5
minutes. The log cleaning runs periodically, so you may just not have
waited long enough for cleaning to have executed.
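
If you want to check the broker side, note these are server.properties
settings rather than topic overrides; a sketch, with the path an assumption:

# No output from grep means the defaults apply (a 5 minute check interval,
# and the cleaner is enabled by default in 0.10.x).
grep -E 'log.cleaner.enable|log.retention.check.interval.ms' \
  config/server.properties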


>    - Does the retention.ms parameter only affect the deletion?
>    - How can I tell if the config is accepted and compaction is working? Is
>    there something useful to search for in the logs?
>

Check for logs from LogCleaner.scala. It should log some info when it runs.
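
A sketch, assuming the default log4j setup where the cleaner writes to its
own file under the broker's log directory:

# Look for lines like "Beginning cleaning of log <topic>-<partition>".
grep -i cleaner logs/log-cleaner.log | tail -20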


>    - Also if I change the topic config via the kafka-configs.sh tool does
>    the change take effect immediately for existing events, do I have to
>    restart the brokers, or does it only affect new events?
>

Topic config changes shouldn't need a broker restart.
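
Describing the topic should show the override immediately; a sketch, with
the topic name and ZooKeeper address as placeholders:

bin/kafka-configs.sh --zookeeper localhost:2181 \
  --entity-type topics --entity-name my-topic --describe

The change should apply to existing events as well: the next time the
cleaner runs it rewrites old segments under the new policy.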

-Ewen


>
> Thank you
> Mike G
>