Posted to dev@samza.apache.org by Vladimir Lebedev <wa...@fastmail.fm> on 2015/07/29 17:30:34 UTC

changelog compaction problem

Hello,

I have a problem with the changelog of one of my Samza jobs: it grows indefinitely.

The job is quite simple: it reads messages from the input Kafka topic and 
either creates or updates a key in a task-local Samza store. Once a 
minute the window method kicks in; it iterates over all keys in the 
store and deletes some of them, based on the contents of their values.
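The job follows the standard StreamTask/WindowableTask pattern; a simplified 
sketch of what it does looks roughly like this (store name, serdes and the 
expiry condition are placeholders, not the real ones):

import java.util.ArrayList;
import java.util.List;

import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.Entry;
import org.apache.samza.storage.kv.KeyValueIterator;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;
import org.apache.samza.task.WindowableTask;

public class ExpiringStoreTask implements StreamTask, WindowableTask, InitableTask {
  private KeyValueStore<String, String> store;

  @Override
  @SuppressWarnings("unchecked")
  public void init(Config config, TaskContext context) {
    // "my-store" is a task-local store backed by a Kafka changelog topic.
    store = (KeyValueStore<String, String>) context.getStore("my-store");
  }

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector, TaskCoordinator coordinator) {
    // Create or update the key; each put() also produces a changelog record.
    store.put((String) envelope.getKey(), (String) envelope.getMessage());
  }

  @Override
  public void window(MessageCollector collector, TaskCoordinator coordinator) {
    // Runs once a minute (task.window.ms=60000): scan the store and delete
    // entries selected by their value. Each delete() writes a tombstone
    // (null-value record) to the changelog topic.
    List<String> toDelete = new ArrayList<String>();
    KeyValueIterator<String, String> it = store.all();
    try {
      while (it.hasNext()) {
        Entry<String, String> entry = it.next();
        if (entry.getValue().contains("expired")) { // illustrative condition
          toDelete.add(entry.getKey());
        }
      }
    } finally {
      it.close();
    }
    for (String key : toDelete) {
      store.delete(key);
    }
  }
}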

The message rate in the input topic is about 3000 messages per second. The 
input topic has 48 partitions. The average number of keys kept in the 
store is more or less stable and does not exceed 10000 keys per task. 
The average value size is 50 bytes. So I expected the total size of all 
segments in the Kafka data directory for the job's changelog topic to 
stay around 10000*50*48 ~= 24 MB. In fact it is more than 2.5 GB 
(after 6 days running from scratch) and it is still growing.

I tried changing the default segment size for the changelog topic in Kafka, 
and it helped somewhat - instead of 500 MB segments I now have 50 MB 
segments - but it did not cure the indefinite data growth.
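For reference, the relevant store configuration looks roughly like this (store 
and topic names are placeholders; the stores.*.changelog.kafka.* pass-through 
for topic-level overrides is how I understand it to work in our Samza version, 
so treat that part as an assumption):

# Task-local store backed by a Kafka changelog topic
stores.my-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.my-store.key.serde=string
stores.my-store.msg.serde=string
stores.my-store.changelog=kafka.my-job-my-store-changelog

# Topic-level overrides applied when Samza creates the changelog topic
# (assumption: this pass-through is supported by the Samza version in use)
stores.my-store.changelog.kafka.cleanup.policy=compact
# 50 MB segments instead of the broker default
stores.my-store.changelog.kafka.segment.bytes=52428800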

Moreover, if I stop the job and start it again it cannot restart; it 
breaks right after reading all records from the changelog topic.

Has anybody had a similar problem? How could it be resolved?

Best regards,
Vladimir

-- 
Vladimir Lebedev
wal@fastmail.fm


Re: changelog compaction problem

Posted by Roger Hoover <ro...@gmail.com>.
You may also want to check whether the cleaner thread in the broker is still
alive (using jstack).  I've run into this issue and used the fix mentioned
in the ticket below to get compaction working again.

https://issues.apache.org/jira/browse/KAFKA-1641
I'd just like to mention that a possible workaround (depending on your
situation in regard to keys) is to stop the broker, remove the cleaner
offset checkpoint, and then start the broker again for each ISR member in
serial to get the thread running again. Keep in mind that the cleaner will
start from the beginning if you do this.
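
In case it's useful, this is roughly how I check the cleaner thread and apply
that workaround (the broker pid lookup and the log directory path are
illustrative, adjust to your log.dirs):

# Is the log cleaner thread still alive in the broker JVM?
BROKER_PID=$(jps -l | grep kafka.Kafka | awk '{print $1}')
jstack "$BROKER_PID" | grep -A 3 "kafka-log-cleaner-thread"

# Workaround from KAFKA-1641, one ISR member at a time:
#   1. stop the broker
#   2. delete the cleaner offset checkpoint file from each log dir,
#      e.g. /var/kafka-logs/cleaner-offset-checkpoint
#   3. start the broker again (the cleaner will start from the beginning)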

On Wed, Jul 29, 2015 at 8:43 AM, Chinmay Soman <ch...@gmail.com>
wrote:

> Just curious,
>
> Can you double check if you have log compaction enabled on your Kafka
> brokers ?
>
> On Wed, Jul 29, 2015 at 8:30 AM, Vladimir Lebedev <wa...@fastmail.fm> wrote:
>
> > Hello,
> >
> > I have a problem with changelog in one of my samza jobs grows
> indefinitely.
> >
> > The job is quite simple, it reads messages from the input kafka topic,
> and
> > either creates or updates a key in task-local samza store. Once in a
> minute
> > the window method kicks-in, it iterates over all keys in the store and
> > deletes some of them, selecting on the contents of their value.
> >
> > Message rate in input topic is about 3000 messages per second. The input
> > topic is partitioned in 48 partitions. Average number of keys, kept in
> the
> > store is more or less stable and do not exceed 10000 keys per task.
> Average
> > size of values is 50 bytes. So I expected that sum of all segments' size
> in
> > kafka data directory for the job's changelog topic should not exceed
> > 10000*50*48 ~= 24Mbytes. In fact it is more than 2.5GB (after 6 days
> > running from scratch) and it is growing.
> >
> > I tried to change default segment size for changelog topic in kafka, and
> > it worked a bit - instead of 500Mbyte segments I have now 50Mbyte
> segments,
> > but it did not heal the indefinite data growth problem.
> >
> > Moreover, if I stop the job and start it again it cannot restart, it
> > breaks right after reading all records from changelog topic.
> >
> > Did somebody have similar problem? How it could be resolved?
> >
> > Best regards,
> > Vladimir
> >
> > --
> > Vladimir Lebedev
> > wal@fastmail.fm
> >
> >
>
>
> --
> Thanks and regards
>
> Chinmay Soman
>

Re: changelog compaction problem

Posted by Chinmay Soman <ch...@gmail.com>.
Just curious,

Can you double-check that you have log compaction enabled on your Kafka
brokers?
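
Something like this would show whether the changelog topic and the broker are
set up for compaction (ZooKeeper address and topic name are placeholders):

# Topic side: the changelog topic should have cleanup.policy=compact
bin/kafka-topics.sh --zookeeper zk1:2181 --describe --topic my-job-my-store-changelog

# Broker side: the log cleaner must be enabled in server.properties
#   log.cleaner.enable=true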

On Wed, Jul 29, 2015 at 8:30 AM, Vladimir Lebedev <wa...@fastmail.fm> wrote:

> Hello,
>
> I have a problem with changelog in one of my samza jobs grows indefinitely.
>
> The job is quite simple, it reads messages from the input kafka topic, and
> either creates or updates a key in task-local samza store. Once in a minute
> the window method kicks-in, it iterates over all keys in the store and
> deletes some of them, selecting on the contents of their value.
>
> Message rate in input topic is about 3000 messages per second. The input
> topic is partitioned in 48 partitions. Average number of keys, kept in the
> store is more or less stable and do not exceed 10000 keys per task. Average
> size of values is 50 bytes. So I expected that sum of all segments' size in
> kafka data directory for the job's changelog topic should not exceed
> 10000*50*48 ~= 24Mbytes. In fact it is more than 2.5GB (after 6 days
> running from scratch) and it is growing.
>
> I tried to change default segment size for changelog topic in kafka, and
> it worked a bit - instead of 500Mbyte segments I have now 50Mbyte segments,
> but it did not heal the indefinite data growth problem.
>
> Moreover, if I stop the job and start it again it cannot restart, it
> breaks right after reading all records from changelog topic.
>
> Did somebody have similar problem? How it could be resolved?
>
> Best regards,
> Vladimir
>
> --
> Vladimir Lebedev
> wal@fastmail.fm
>
>


-- 
Thanks and regards

Chinmay Soman