Posted to users@kafka.apache.org by Pierre-Yves Ritschard <py...@spootnik.org> on 2015/03/10 13:51:37 UTC

Log compaction recovery strategy

Hi kafka,

I've started implementing simple materialized views with the log
compaction feature to test it out, and it works great. I'll share the
code and an accompanying article shortly but first wanted to discuss
some of the production implications my sandbox has.

I've separated the project in two components:

- An HTTP API which reads off of a memory cache (in this case: redis)
and produces mutations on a kafka topic
- A worker which consumes the stream and materializes the view in redis.

I have a single entity, so the materialization is a very simple process:
it maintains a set of all entity keys and stores entity content in
keys. In redis, a create or update maps to a SADD and a SET, and a
delete maps to a SREM and a DEL.
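In code, the worker loop boils down to something like this (a sketch using an in-memory stand-in for redis; the key names and message shape are illustrative, not taken from the actual project):

```python
# Materialize a compacted topic into a redis-like store.
# entity_keys mirrors the redis set of all entity keys (SADD/SREM);
# store mirrors plain redis keys (SET/DEL).

def materialize(store: dict, entity_keys: set, key: str, value):
    """Apply one mutation from the compacted topic.

    A record with a value is a create/update (SADD + SET); a record
    with a None value (a tombstone) is a delete (SREM + DEL).
    """
    if value is not None:
        entity_keys.add(key)       # SADD entities <key>
        store[key] = value         # SET <key> <value>
    else:
        entity_keys.discard(key)   # SREM entities <key>
        store.pop(key, None)       # DEL <key>


store, entity_keys = {}, set()
materialize(store, entity_keys, "user:1", {"name": "pyr"})  # create
materialize(store, entity_keys, "user:2", {"name": "bob"})  # create
materialize(store, entity_keys, "user:1", None)             # tombstone -> delete
```

Replaying these mutations from offset 0 always converges to the same view, which is what makes the compacted topic a safe source of truth for the cache.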

I'm now considering the production implications this has and have a few
questions:

- How do you typically handle workers starting? Do you always start at
offset 0 to make sure the view is correctly recreated?
- How do you handle topology changes in consumers, which lead to a
redistribution of keys across them?
- Is there a valid mechanism to know the log is being reconsumed, and to
let the client layer know of this?

Congrats on getting log compaction in, this feature opens up a ton of
reliability improvements for us :-)

  - pyr

Re: Log compaction recovery strategy

Posted by Pierre-Yves Ritschard <py...@spootnik.org>.

On 03/10/2015 05:48 PM, Mayuresh Gharat wrote:
> How do you typically handle workers starting? Do you always start at
> offset 0 to make sure the view is correctly recreated?
> ---> You will have to reset the offsets to 0 and set the offset reset
> policy to earliest in the consumer.

Yup, as expected.
> 
> How do you handle topology changes in consumers, which lead to a
> redistribution of keys across them?
> ---> Can you explain what exactly you want to handle here?
>
This is a non-issue, sorry about that.

> Is there a valid mechanism to know the log is being reconsumed, and to
> let the client layer know of this?
> ---> I suppose you will have to maintain this in your application by
> keeping track of the offset that was consumed in the past and the
> offset currently being consumed.

Thanks Mayuresh!

  - pyr

Re: Log compaction recovery strategy

Posted by Mayuresh Gharat <gh...@gmail.com>.
How do you typically handle workers starting? Do you always start at
offset 0 to make sure the view is correctly recreated?
---> You will have to reset the offsets to 0 and set the offset reset
policy to earliest in the consumer.
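To spell out why both steps matter: the offset reset policy only applies when no committed offset exists, so a worker that must rebuild its view has to explicitly reset (or never commit) offsets as well. A toy model of that resolution rule, for illustration only (not actual client code; the names are mine):

```python
def starting_offset(committed, policy, beginning=0, end=100,
                    force_replay=False):
    """Where consumption starts on one partition.

    committed:    the committed offset, or None if none exists.
    policy:       'earliest' or 'latest' (the offset reset policy);
                  only consulted when there is NO committed offset.
    force_replay: emulates an explicit reset to the beginning, which a
                  view-rebuilding worker needs, because a committed
                  offset would otherwise win over the reset policy.
    """
    if force_replay:
        return beginning
    if committed is not None:
        return committed          # committed offset beats the policy
    return beginning if policy == "earliest" else end
```

The point of the sketch: with a committed offset of 42, `policy="earliest"` alone would still resume at 42; only the explicit reset gets you back to 0.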

How do you handle topology changes in consumers, which lead to a
redistribution of keys across them?
---> Can you explain what exactly you want to handle here?

Is there a valid mechanism to know the log is being reconsumed, and to
let the client layer know of this?
---> I suppose you will have to maintain this in your application by
keeping track of the offset that was consumed in the past and the
offset currently being consumed.
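One way to sketch that bookkeeping: capture the end offsets per partition at startup, then report "replaying" until every partition has caught up to them (a sketch; in practice the partition and offset values would come from the consumer):

```python
def catch_up_tracker(end_offsets):
    """Track per-partition replay state.

    end_offsets: {partition: end offset captured at startup}.
    Returns a callback to invoke with (partition, offset) after each
    consumed record; it returns True while the log is still being
    reconsumed, so the client layer can e.g. flag the view as stale.
    """
    # Partitions that were empty at startup are caught up immediately.
    remaining = {p: o for p, o in end_offsets.items() if o > 0}

    def on_record(partition, offset):
        if partition in remaining and offset + 1 >= remaining[partition]:
            del remaining[partition]   # this partition is caught up
        return bool(remaining)         # still replaying somewhere?

    return on_record
```

Once the callback returns False, the worker is past the startup replay and every subsequent record is a live mutation rather than history.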


Thanks,

Mayuresh

On Tue, Mar 10, 2015 at 5:51 AM, Pierre-Yves Ritschard <py...@spootnik.org>
wrote:

> Hi kafka,
>
> I've started implementing simple materialized views with the log
> compaction feature to test it out, and it works great. I'll share the
> code and an accompanying article shortly but first wanted to discuss
> some of the production implications my sandbox has.
>
> I've separated the project in two components:
>
> - An HTTP API which reads off of a memory cache (in this case: redis)
> and produces mutations on a kafka topic
> - A worker which consumes the stream and materializes the view in redis.
>
> I have a single entity, so the materialization is a very simple process:
> it maintains a set of all entity keys and stores entity content in
> keys. In redis, a create or update maps to a SADD and a SET, and a
> delete maps to a SREM and a DEL.
>
> I'm now considering the production implications this has and have a few
> questions:
>
> - How do you typically handle workers starting? Do you always start at
> offset 0 to make sure the view is correctly recreated?
> - How do you handle topology changes in consumers, which lead to a
> redistribution of keys across them?
> - Is there a valid mechanism to know the log is being reconsumed, and to
> let the client layer know of this?
>
> Congrats on getting log compaction in, this feature opens up a ton of
> reliability improvements for us :-)
>
>   - pyr
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125