Posted to users@kafka.apache.org by Vitalii Stoianov <vi...@gmail.com> on 2020/07/22 17:15:09 UTC

Kafka tuning (vm.max_map_count) and log retention

Hi All,

According to https://docs.confluent.io/current/kafka/deployment.html,
vm.max_map_count depends on the number of index files:

    find /tmp/kafka_logs -name '*index' | wc -l
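For reference, here is a rough Python equivalent of the find | wc pipeline above, as a sketch. It walks a log directory and counts non-directory entries whose names end in "index". In a Kafka partition directory that typically matches each segment's .index and .timeindex files (and .txnindex, if present). The /tmp/kafka_logs default is copied from the command, so substitute your own log.dirs path. The function name is a hypothetical helper.

```python
import os
import sys


def count_index_files(log_dir: str) -> int:
    """Count non-directory entries whose name ends in "index" under log_dir.

    Roughly mirrors `find LOG_DIR -name '*index' | wc -l`, except that
    directories are ignored. A missing directory yields 0.
    """
    total = 0
    for _root, _dirs, files in os.walk(log_dir):
        total += sum(1 for name in files if name.endswith("index"))
    return total


if __name__ == "__main__":
    # Default path taken from the find command above; pass your log.dirs instead.
    print(count_index_files(sys.argv[1] if len(sys.argv) > 1 else "/tmp/kafka_logs"))
```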

In our test lab we have the following setup:

    Topic:test      PartitionCount:256      ReplicationFactor:2
    Configs:segment.bytes=1073741824,retention.ms=86400000,message.format.version=2.3-IV1,max.message.bytes=4194304,unclean.leader.election.enable=true
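The Configs: line above is a comma-separated key=value list (presumably printed by kafka-topics.sh --describe). Here is a small sketch that splits it into a dictionary and checks the units: segment.bytes is exactly 1 GiB and retention.ms is exactly one day. The helper name is hypothetical, and values that contain commas are not handled.

```python
def parse_topic_configs(line: str) -> dict:
    """Split the 'Configs:k1=v1,k2=v2,...' part of a topic description into a dict.

    Assumes no value contains a comma.
    """
    body = line.split("Configs:", 1)[-1].strip()
    return dict(item.split("=", 1) for item in body.split(",") if item)


cfg = parse_topic_configs(
    "Configs:segment.bytes=1073741824,retention.ms=86400000,"
    "message.format.version=2.3-IV1,max.message.bytes=4194304,"
    "unclean.leader.election.enable=true"
)
print(int(cfg["segment.bytes"]) == 1024 ** 3)            # True: exactly 1 GiB
print(int(cfg["retention.ms"]) == 24 * 60 * 60 * 1000)  # True: exactly one day
```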

No cleanup.policy is set explicitly, either for the topic or in
server.properties, so I assume the default (delete), according to
https://kafka.apache.org/23/documentation.html#brokerconfigs

I wrote a small script that counts the number of index files; for this
topic it is ~638000.
Also, the Kafka log/data directory contains some old log/index files whose
creation date is more than 10 days ago (retention for the topic is one day).
Note: log-cleaner.log only contains information about cleanup of compacted
logs.

In order to set vm.max_map_count correctly, I need to understand the
following:
Why do such old index/log files still exist, and why are they not cleaned up?
How should vm.max_map_count be set if index/log files are not freed on time?
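For the second question, one way to see how close a broker is to the limit is to compare the number of memory-mapped areas the broker process holds with the current limit. On Linux, /proc/<pid>/maps has one line per mapping, and the limit is in /proc/sys/vm/max_map_count (commonly 65530 by default). If, as the Kafka operations documentation describes, each segment's .index and .timeindex files each take one map area, the ~638000 index files above could need up to roughly that many mappings. This is a minimal sketch that assumes a Linux host and that you pass the broker's PID. The function names are hypothetical helpers, not Kafka tooling.

```python
import sys


def count_mappings(maps_path: str) -> int:
    """Count a process's memory-mapped areas: /proc/<pid>/maps has one line per mapping."""
    with open(maps_path) as f:
        return sum(1 for _ in f)


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current vm.max_map_count limit (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def map_headroom(used: int, limit: int) -> int:
    """Map areas left before the process reaches the limit (negative means over it)."""
    return limit - used


if __name__ == "__main__" and len(sys.argv) > 1:
    broker_pid = sys.argv[1]  # assumption: the Kafka broker's PID is passed in
    used = count_mappings(f"/proc/{broker_pid}/maps")
    limit = read_max_map_count()
    print(f"in use: {used}, vm.max_map_count: {limit}, headroom: {map_headroom(used, limit)}")
```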

Regards,
Vitalii.

Re: Kafka tuning (vm.max_map_count) and log retention

Posted by Vitalii Stoianov <vi...@gmail.com>.
Hi All,

I looked into this further and found the following (we use librdkafka to
produce data into Kafka topics):

https://docs.confluent.io/5.0.0/clients/librdkafka/classRdKafka_1_1Producer.html#ab90a30c5e5fb006a3b4004dc4c9a7923

As the docs say, the timestamp is in microseconds
(https://docs.confluent.io/5.0.0/clients/librdkafka/classRdKafka_1_1Producer.html#a5d569225be5e98a016f889d54adf4e6c):

    virtual ErrorCode produce(const std::string topic_name, int32_t partition,
        int msgflags, void *payload, size_t len, const void *key,
        size_t key_len, int64_t timestamp, void *msg_opaque) = 0

    produce() variant that takes topic as a string (no need for creating a
    Topic <https://docs.confluent.io/5.0.0/clients/librdkafka/classRdKafka_1_1Topic.html>
    object), and also allows providing the message timestamp (microseconds
    since beginning of epoch, UTC). Otherwise identical to produce() above.

So are the docs I am looking at misleading, and do we still need to pass
milliseconds?
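Kafka record timestamps are epoch milliseconds (as Alexandre notes in his reply), and a microsecond value is 1000 times larger. Below is a minimal Python sketch that reads both units from the same clock and converts a microsecond timestamp before it is passed to a producer. The helper names are hypothetical, and nothing here calls librdkafka.

```python
import time


def now_epoch_ms() -> int:
    """Current time in milliseconds since the Unix epoch (the unit Kafka expects)."""
    return time.time_ns() // 1_000_000


def now_epoch_us() -> int:
    """Current time in microseconds since the Unix epoch (1000x the ms value)."""
    return time.time_ns() // 1_000


def us_to_ms(ts_us: int) -> int:
    """Convert an epoch timestamp from microseconds to milliseconds (truncating)."""
    return ts_us // 1_000


# The CreateTime value from the console-consumer output in this thread:
print(us_to_ms(1595485571406707))  # 1595485571406
```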

Regards,
Vitalii.

Re: Kafka tuning (vm.max_map_count) and log retention

Posted by Vitalii Stoianov <vi...@gmail.com>.
Hi Alexandre,

According to the Kafka broker logs, it happens even faster: every 5-30
seconds.

Regards,
Vitalii.

Re: Kafka tuning (vm.max_map_count) and log retention

Posted by Alexandre Dupriez <al...@gmail.com>.
Hi Vitalii,

The timestamps provided by your producers are in microseconds, whereas
Kafka expects epoch timestamps in milliseconds. This could be the reason
for the excessive rolling. When you had the default roll time of one week,
did you see segment rolls every 15 minutes or so?
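To make the effect concrete, here is a simplified model of the two time-based checks. It assumes the Kafka 2.x behaviour where a time-based roll happens once an incoming record's timestamp exceeds the first timestamp in the active segment by more than segment.ms (jitter ignored), and where time-based retention deletes a segment once the broker's wall-clock time exceeds the segment's largest timestamp by more than retention.ms. The function names are illustrative, not Kafka APIs. Under these assumptions, microsecond timestamps make segments roll 1000 times sooner in real time, and retention never fires because the segment's largest timestamp appears to lie far in the future. That would also be consistent with the old files that are never deleted.

```python
SEGMENT_MS_DEFAULT = 168 * 60 * 60 * 1000  # log.roll.hours=168 in milliseconds (604_800_000)
RETENTION_MS = 86_400_000                  # retention.ms of the topic in this thread (1 day)


def should_roll_by_time(first_ts: int, incoming_ts: int, segment_ms: int) -> bool:
    """Simplified time-based roll check (no jitter): uses record timestamps only."""
    return incoming_ts - first_ts > segment_ms


def retention_breached(now_ms: int, largest_ts: int, retention_ms: int) -> bool:
    """Simplified time-based retention check: broker clock minus the segment's largest timestamp."""
    return now_ms - largest_ts > retention_ms


first_ms = 1_595_485_571_406   # a record timestamp in milliseconds
first_us = first_ms * 1_000    # the same instant, mistakenly sent in microseconds
gap_s = 11 * 60                # 11 minutes of real time until the next record

# Correct units: 660_000 ms is far below 7 days, so no roll.
print(should_roll_by_time(first_ms, first_ms + gap_s * 1_000, SEGMENT_MS_DEFAULT))      # False
# Microseconds: the same gap reads as 660_000_000 "ms" > 604_800_000, so the segment rolls.
print(should_roll_by_time(first_us, first_us + gap_s * 1_000_000, SEGMENT_MS_DEFAULT))  # True

now_ms = first_ms + 2 * RETENTION_MS  # broker clock two days later
print(retention_breached(now_ms, first_ms, RETENTION_MS))  # True: segment can be deleted
print(retention_breached(now_ms, first_us, RETENTION_MS))  # False: looks far in the future
```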

Thanks,
Alexandre

Re: Kafka tuning (vm.max_map_count) and log retention

Posted by William Reynolds <wi...@instaclustr.com>.
Hi Vitalii,
When I ran into this, the latest timestamp was very large. Until we could
get the message timestamps set correctly, we set segment.ms to maxint so
that segments only rolled based on size.
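The arithmetic behind this workaround can be sketched as follows, assuming record timestamps are still being produced in microseconds. segment.ms is compared against differences between record timestamps, so each real millisecond counts as 1000 "ms", and a time-based roll arrives 1000 times sooner in real time. Reading "maxint" as Java's Integer.MAX_VALUE (2147483647) is an assumption. segment.ms is a long-valued topic config, so larger values may also be accepted. The helper name is hypothetical.

```python
def real_seconds_until_time_roll(segment_ms: int, ts_units_per_ms: int) -> float:
    """Real time (seconds) before segment.ms is exceeded, given how many timestamp
    units the producer emits per real millisecond (1 for ms, 1000 for microseconds)."""
    return segment_ms / ts_units_per_ms / 1_000


DEFAULT_SEGMENT_MS = 604_800_000  # 168 hours
INT_MAX = 2_147_483_647           # Java Integer.MAX_VALUE, assumed meaning of "maxint"

print(round(real_seconds_until_time_roll(DEFAULT_SEGMENT_MS, 1) / 3600, 2))   # 168.0 hours with ms
print(round(real_seconds_until_time_roll(DEFAULT_SEGMENT_MS, 1_000) / 60, 2))  # 10.08 minutes with us
print(round(real_seconds_until_time_roll(INT_MAX, 1_000) / 60, 1))             # 35.8 minutes with us
```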
Cheers
William

-- 


William Reynolds
Technical Operations Engineer

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.

Instaclustr values your privacy. Our privacy policy can be found at
https://www.instaclustr.com/company/policies/privacy-policy

Re: Kafka tuning (vm.max_map_count) and log retention

Posted by Vitalii Stoianov <vi...@gmail.com>.
Hi William,

./kafka-console-consumer.sh --bootstrap-server localhost:9092 --property print.timestamp=true --topic test
Timestamp output for one of the messages:
CreateTime:1595485571406707 1595485026.850 1595485571.406 216301538579718
{msg data}

So which one of these is used to roll a log segment?
I tried to find an explanation on the web, but with no luck.
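This assumes the console consumer prints the timestamp type and value followed by the record value (print.key is not enabled). In that case only the CreateTime field is the record timestamp, and the other numbers look like part of the message payload. Here is a rough sketch with hypothetical helpers that extracts that field and guesses its unit from the digit count. The guess only works for present-day dates, where epoch milliseconds have 13 digits and epoch microseconds have 16.

```python
import re


def parse_create_time(line: str) -> int:
    """Extract the record timestamp from a console-consumer line printed with print.timestamp=true."""
    match = re.match(r"(?:CreateTime|LogAppendTime):(-?\d+)", line)
    if match is None:
        raise ValueError(f"no timestamp prefix in: {line!r}")
    return int(match.group(1))


def guess_epoch_unit(ts: int) -> str:
    """Guess the unit of a present-day epoch timestamp from its number of digits."""
    digits = len(str(abs(ts)))
    return {10: "s", 13: "ms", 16: "us", 19: "ns"}.get(digits, "unknown")


line = "CreateTime:1595485571406707 1595485026.850 1595485571.406 216301538579718 {msg data}"
ts = parse_create_time(line)
print(ts, guess_epoch_unit(ts))  # 1595485571406707 us
```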

Regards,
Vitalii.

Re: Kafka tuning (vm.max_map_count) and log retention

Posted by William Reynolds <wi...@instaclustr.com>.
Hi Vitalii,
What are the timestamps in your messages? I have seen this before: when
timestamps are well into the future, every few messages cause a log roll
and you end up with a very large number of log files.

William

Re: Kafka tuning (vm.max_map_count) and log retention

Posted by Vitalii Stoianov <vi...@gmail.com>.
Hi All,

I have also noticed that the number of log/index files is too high and
that log rolls happen more frequently than expected.
log.roll.hours is the default (168), log.segment.bytes is 1g, and the log
files in the topic partition folders are usually smaller than 1g.

Regards,
Vitalii.