Posted to users@kafka.apache.org by "Kopacki, Tomasz (Nokia - PL/Wroclaw)" <to...@nokia.com> on 2018/03/21 09:06:31 UTC

log compaction v log rotation - best of the two worlds

Hi,
I've recently been exploring log handling in Kafka, and I wonder whether (and how) I can mix log compaction with log rotation.
A little background first:
I have an application that uses Kafka topics as a backend for event sourcing. Messages represent changes of state of my 'resources'. Each resource has a UID that is also used as the key for its messages.
My resources have a lifecycle, and when their life ends I don't need them anymore and there is no point in keeping their history. Given that, I thought the best choice for me would be log compaction with the tombstone feature, but I would also like to keep the history of changes for each resource (only until it dies).
With those requirements, I'd love to be able to use the tombstone feature for log rotation, but I guess it doesn't work like that.
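The trade-off can be made concrete with a small in-memory model. This is a simplified, hypothetical Python sketch (the `compact` helper and the `(key, value)` record format are illustrative assumptions, not Kafka APIs): it keeps only the last record per key and treats a `None` value as a tombstone that has already passed `delete.retention.ms`. It ignores segment-level details (for example, the active segment is never compacted), but it shows the problem: the tombstone removes R2 entirely, and R1 loses its earlier history as well.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); a value of None is a tombstone


def compact(log: List[Record]) -> List[Record]:
    """Keep only the last record for each key, in log order.

    Tombstones are dropped as well, i.e. this models the log after
    delete.retention.ms has passed for them.
    """
    last_offset = {key: offset for offset, (key, _) in enumerate(log)}
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if last_offset[key] == offset and value is not None
    ]


log = [
    ("R1", "created"),
    ("R2", "created"),
    ("R1", "updated"),
    ("R2", None),  # tombstone: R2's lifecycle has ended
]
print(compact(log))  # [('R1', 'updated')]
```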

Has anyone here had similar requirements and solved them somehow?


Sincerely,
Tomasz Kopacki
DevOps Engineer @ Nokia


RE: log compaction v log rotation - best of the two worlds

Posted by "Kopacki, Tomasz (Nokia - PL/Wroclaw)" <to...@nokia.com>.
I think the first solution is what I need. Your second proposal also looks fine, but I don't like the idea of keeping an additional ledger.
Thanks, Svante! I appreciate your help.

Sincerely, 
Tomasz Kopacki
DevOps Engineer @ Nokia

-----Original Message-----
From: Svante Karlsson [mailto:svante.karlsson@csi.se] 
Sent: Wednesday, March 21, 2018 11:21 AM
To: users@kafka.apache.org
Subject: Re: log compaction v log rotation - best of the two worlds


Re: log compaction v log rotation - best of the two worlds

Posted by Svante Karlsson <sv...@csi.se>.
alt1)
If you can store a generation counter in the value of the "latest_value"
topic, you could do the following:

topic latest_value key [id]

topic full_history key[id, generation]

On delete, get latest_value.generation_counter and issue deletes (tombstones)
on full_history key[id, 0..generation_counter].
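A minimal in-memory sketch of alt1 in Python, under stated assumptions: the two lists stand in for the latest_value and full_history topics, a value of None is a tombstone, and `compact` approximates what the log cleaner eventually retains. The helper names and the dict stored as the latest value are hypothetical, not Kafka client code.

```python
from typing import Any, List, Optional, Tuple

Log = List[Tuple[Any, Optional[Any]]]  # (key, value); value None is a tombstone


def latest(log: Log, key: Any) -> Optional[Any]:
    """Newest value for key, i.e. what a reader of the compacted topic would see."""
    value = None
    for k, v in log:
        if k == key:
            value = v
    return value


def compact(log: Log) -> Log:
    """Keep the last record per key; drop tombstones (as after delete.retention.ms)."""
    last = {k: i for i, (k, _) in enumerate(log)}
    return [(k, v) for i, (k, v) in enumerate(log) if last[k] == i and v is not None]


def update(latest_value: Log, full_history: Log, rid: str, event: str) -> None:
    state = latest(latest_value, rid)
    generation = 0 if state is None else state["generation"] + 1
    full_history.append(((rid, generation), event))
    latest_value.append((rid, {"generation": generation, "event": event}))


def delete(latest_value: Log, full_history: Log, rid: str) -> None:
    state = latest(latest_value, rid)
    if state is None:
        return  # unknown or already deleted
    # Tombstone full_history key[rid, 0..generation_counter], then the state itself.
    for generation in range(state["generation"] + 1):
        full_history.append(((rid, generation), None))
    latest_value.append((rid, None))


latest_value: Log = []
full_history: Log = []
for rid, event in [("R1", "a"), ("R2", "b"), ("R1", "c"), ("R2", "d")]:
    update(latest_value, full_history, rid, event)
delete(latest_value, full_history, "R2")
print(compact(full_history))  # [(('R1', 0), 'a'), (('R1', 1), 'c')]
```

The generation counter is what lets `delete` enumerate every history key of a resource without scanning the history topic.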

alt2)
If you cannot store a generation_counter in "latest_value", store a
timestamp or UUID in the key to make each key unique:

topic latest_value key [id]

topic full_history key[id, timestamp/uuid]
On delete of "id", scan the full_history topic from the beginning and issue
deletes on full_history key[id, timestamp/uuid].

You could optimize this by having another topic that contains the
"to be purged" ids.
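A minimal Python sketch of alt2, including the batched "to be purged" variant. The list stands in for the full_history topic, a random UUID serves as the unique key suffix, and `append_event`, `purge` and `compact` are hypothetical helpers rather than Kafka APIs.

```python
import uuid
from typing import List, Optional, Set, Tuple

Key = Tuple[str, str]  # (resource id, unique suffix)
Log = List[Tuple[Key, Optional[str]]]  # value None is a tombstone


def append_event(full_history: Log, rid: str, event: str) -> None:
    # A random UUID suffix makes every history key unique, so compaction keeps them all.
    full_history.append(((rid, uuid.uuid4().hex), event))


def purge(full_history: Log, to_be_purged: Set[str]) -> None:
    """Scan full_history from the beginning and tombstone every key of a purged id.

    Handling a set of ids per scan models the "to be purged" topic idea:
    one pass over the history serves a whole batch of deletions.
    """
    tombstones = [(key, None) for key, value in full_history
                  if key[0] in to_be_purged and value is not None]
    full_history.extend(tombstones)


def compact(log: Log) -> Log:
    """Keep the last record per key; drop tombstones (as after delete.retention.ms)."""
    last = {k: i for i, (k, _) in enumerate(log)}
    return [(k, v) for i, (k, v) in enumerate(log) if last[k] == i and v is not None]


history: Log = []
for rid, event in [("R1", "a"), ("R2", "b"), ("R3", "c"), ("R2", "d")]:
    append_event(history, rid, event)
purge(history, {"R2", "R3"})
print([(key[0], value) for key, value in compact(history)])  # [('R1', 'a')]
```

The cost is a full scan of the history per purge batch, which is the trade-off against the generation counter in alt1.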

/svante



2018-03-21 11:16 GMT+01:00 Manikumar <ma...@gmail.com>:


Re: log compaction v log rotation - best of the two worlds

Posted by Manikumar <ma...@gmail.com>.
Sorry, I was wrong. For the history topic, we can use a regular (non-compacted)
topic with a sufficient retention period.
Maybe others can give more ideas.
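A hypothetical Python sketch of what a plain retention-based history topic does, modelled per record (Kafka actually removes whole log segments once they exceed `retention.ms`). Age is the only criterion: a null-value record does not remove earlier records with the same key, and live resources lose old events too. The timestamps and retention value are made-up numbers.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    key: str
    value: Optional[str]  # None is a tombstone, but retention ignores it
    timestamp_ms: int


def apply_retention(log: List[Record], now_ms: int, retention_ms: int) -> List[Record]:
    """cleanup.policy=delete, per record: drop anything older than retention_ms."""
    return [r for r in log if now_ms - r.timestamp_ms <= retention_ms]


log = [
    Record("R1", "v1", 0),
    Record("R2", "v1", 10),
    Record("R2", None, 20),  # R2 dies, but its earlier record is not removed
    Record("R1", "v2", 90),
]
kept = apply_retention(log, now_ms=100, retention_ms=95)
print([(r.key, r.value) for r in kept])  # [('R2', 'v1'), ('R2', None), ('R1', 'v2')]
```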

On Wed, Mar 21, 2018 at 3:34 PM, Kopacki, Tomasz (Nokia - PL/Wroclaw) <
tomasz.kopacki@nokia.com> wrote:


RE: log compaction v log rotation - best of the two worlds

Posted by "Kopacki, Tomasz (Nokia - PL/Wroclaw)" <to...@nokia.com>.
Do you mean I can use tombstones if my cleanup policy is 'delete' and it will still work?

Sincerely, 
Tomasz Kopacki
DevOps Engineer @ Nokia

-----Original Message-----
From: Manikumar [mailto:manikumar.reddy@gmail.com] 
Sent: Wednesday, March 21, 2018 11:03 AM
To: users@kafka.apache.org
Subject: Re: log compaction v log rotation - best of the two worlds


Re: log compaction v log rotation - best of the two worlds

Posted by Manikumar <ma...@gmail.com>.
Not sure if I understood the requirement correctly. One option is to use two
compacted topics: one for the current state of the resource
and one for its history, and use tombstones whenever you want to clear them.
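One detail matters for the history topic: compaction retains only the latest record per key, so a compacted history topic keyed by the bare resource id would collapse to one record per resource. A hypothetical Python sketch (the `compact` helper and the per-record sequence number are assumptions for illustration) compares the two key choices:

```python
from typing import Any, List, Optional, Tuple

Log = List[Tuple[Any, Optional[str]]]  # (key, value); value None is a tombstone


def compact(log: Log) -> Log:
    """Keep the last record per key; tombstones are dropped."""
    last = {key: i for i, (key, _) in enumerate(log)}
    return [(k, v) for i, (k, v) in enumerate(log) if last[k] == i and v is not None]


events = [("R1", "created"), ("R1", "renamed"), ("R1", "resized")]

# History keyed by the bare resource id: only the newest event survives.
by_id = compact(list(events))

# History keyed by (id, sequence number): every event has its own key and survives.
by_event = compact([((rid, seq), value) for seq, (rid, value) in enumerate(events)])

print(by_id)     # [('R1', 'resized')]
print(by_event)  # [(('R1', 0), 'created'), (('R1', 1), 'renamed'), (('R1', 2), 'resized')]
```

With unique keys, clearing a resource means writing one tombstone per history key.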

On Wed, Mar 21, 2018 at 2:53 PM, Kopacki, Tomasz (Nokia - PL/Wroclaw) <
tomasz.kopacki@nokia.com> wrote:


RE: log compaction v log rotation - best of the two worlds

Posted by "Kopacki, Tomasz (Nokia - PL/Wroclaw)" <to...@nokia.com>.
Almost,
Consider this example:

|R1|R2|R1|R3|R2|R4|R4|R1|R2| <- this is an example of a stream where RX represents an update of a particular resource. I need to keep the full history of changes for every resource, but only while the resource is alive. If a resource expires/dies, I'd like to remove it completely. In this example, consider that resource R2 dies but the others are still alive. In that case I'd like to be able to transform this into:
|R1|R1|R3|R4|R4|R1| <- so it's not compaction, because I need the history of changes, but neither can I simply remove 'old' messages, because I need to do this based on the lifecycle of the resource, not just its age.
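The requested transformation is a filter on the key, which is different from compaction. A short, hypothetical Python sketch over the example stream (`purge_dead`, `compact_last` and `show` are illustrative names, not Kafka functions) makes the difference visible:

```python
from typing import List, Set


def purge_dead(stream: List[str], dead: Set[str]) -> List[str]:
    """Drop every update of a dead resource and keep all other updates in order."""
    return [resource for resource in stream if resource not in dead]


def compact_last(stream: List[str]) -> List[str]:
    """What compaction keeps: only the last update of each resource."""
    last = {resource: offset for offset, resource in enumerate(stream)}
    return [resource for offset, resource in enumerate(stream) if last[resource] == offset]


def show(stream: List[str]) -> str:
    return "|" + "|".join(stream) + "|"


stream = "R1|R2|R1|R3|R2|R4|R4|R1|R2".split("|")
print(show(purge_dead(stream, {"R2"})))  # |R1|R1|R3|R4|R4|R1|
print(show(compact_last(stream)))        # |R3|R4|R1|R2|
```

Neither `cleanup.policy` value (`delete` or `compact`) performs the first operation on its own.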



Sincerely, 
Tomasz Kopacki
DevOps Engineer @ Nokia

-----Original Message-----
From: Manikumar [mailto:manikumar.reddy@gmail.com] 
Sent: Wednesday, March 21, 2018 10:17 AM
To: users@kafka.apache.org
Subject: Re: log compaction v log rotation - best of the two worlds


Re: log compaction v log rotation - best of the two worlds

Posted by Manikumar <ma...@gmail.com>.
We can enable both compaction and retention for a topic by
setting cleanup.policy="delete,compact"
http://kafka.apache.org/documentation/#topicconfigs

Does this handle your requirement?
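A rough model of what `cleanup.policy="delete,compact"` would do to such a stream may help when evaluating this. The sketch below is hypothetical Python, not broker code: it applies time-based retention per record (Kafka actually deletes whole segments) and then keeps the newest record per key. The record timestamps and the retention value are made-up numbers.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    key: str
    value: Optional[str]  # None models a tombstone
    timestamp_ms: int


def apply_delete(log: List[Record], now_ms: int, retention_ms: int) -> List[Record]:
    """Time-based retention, modelled per record instead of per segment."""
    return [r for r in log if now_ms - r.timestamp_ms <= retention_ms]


def apply_compact(log: List[Record]) -> List[Record]:
    """Keep the newest record per key; tombstones are treated as already expired."""
    last = {r.key: i for i, r in enumerate(log)}
    return [r for i, r in enumerate(log) if last[r.key] == i and r.value is not None]


def clean(log: List[Record], now_ms: int, retention_ms: int) -> List[Record]:
    """Rough model of cleanup.policy=delete,compact."""
    return apply_compact(apply_delete(log, now_ms, retention_ms))


log = [
    Record("R1", "v1", 0),
    Record("R2", "v1", 10),  # R2 is still alive: no tombstone was ever written
    Record("R1", "v2", 20),
    Record("R3", "v1", 30),
]
kept = clean(log, now_ms=100, retention_ms=85)
print([(r.key, r.value) for r in kept])  # [('R1', 'v2'), ('R3', 'v1')]
```

Under these assumptions, R1 keeps only its latest state and R2 is dropped purely because of its age; neither outcome follows the resource lifecycle.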

On Wed, Mar 21, 2018 at 2:36 PM, Kopacki, Tomasz (Nokia - PL/Wroclaw) <
tomasz.kopacki@nokia.com> wrote:
