Posted to dev@kafka.apache.org by Fares Oueslati <ou...@gmail.com> on 2020/03/09 15:58:00 UTC

Question about log flusher real frequency

Hello,

By default, both log.flush.interval.ms and log.flush.interval.messages are
set to Long.MAX_VALUE.

As I understand it, this means that when Kafka flushes the log to disk
(fsync) depends only on the file system.

Is there a simple way to monitor that frequency?

Is there a rule of thumb for estimating that value depending on the OS?

Thank you, guys!
Fares

Re: Question about log flusher real frequency

Posted by Fares Oueslati <ou...@gmail.com>.
Hi Alexandre,

Thank you for your quick answer.

I want to monitor it because I'm trying to find out why our existing Kafka
cluster is configured to flush data every 10 milliseconds! (The people who
configured it are no longer available to answer.)

As that value seems really low to me, I was trying to understand and
monitor the "flush behaviour".
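
To confirm which flush settings a broker is actually configured with, a
small script could scan its properties file for the log.flush.* keys. The
following is only a sketch: it assumes simple "key=value" lines, and the
file path in the usage comment is hypothetical.

```python
# Broker settings that control explicit flushing of the log to disk.
FLUSH_KEYS = (
    "log.flush.interval.ms",
    "log.flush.interval.messages",
    "log.flush.scheduler.interval.ms",
)


def flush_settings(properties_text):
    """Return the log.flush.* settings found in the given properties text.

    Assumption: entries use the plain "key=value" form; other separators
    allowed by the Java properties format are not handled here.
    """
    found = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() in FLUSH_KEYS:
            found[key.strip()] = int(value.strip())
    return found


# Usage (hypothetical path, adjust to your installation):
#   with open("config/server.properties") as f:
#       print(flush_settings(f.read()))
```

Keys that are absent from the result are not set in the file, so the
broker would use its defaults for them.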

Fares

On Mon, Mar 9, 2020 at 5:24 PM Alexandre Dupriez <
alexandre.dupriez@gmail.com> wrote:

> Hi Fares,
>
> On Linux kernels, you can use the property "dirty_writeback_centisecs"
> [1] to configure the period between executions of kswapd, which does
> this "sync" job. The period is usually set to 30 seconds.
> There are few exceptions where Kafka explicitly forces a sync (via the
> force() method from the I/O API of the JDK), e.g. when a segment is
> rolled or Kafka shutting down.
>
> The page writeback activity from your kernel is monitorable at
> different levels of granularity and depending on the instrumentation
> you are willing to use.
>
> Why would you want to monitor this activity in the first place? Do you
> want to know exactly *when* your data is on the disk?
>
> [1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt
>
> Le lun. 9 mars 2020 à 15:58, Fares Oueslati <ou...@gmail.com> a
> écrit :
> >
> > Hello,
> >
> > By default, both log.flush.interval.ms and log.flush.interval.messages are
> > set to Long.MAX_VALUE.
> >
> > As I understand, it makes Kafka flush log to disk (fsync) only depends on
> > file system.
> >
> > Is there any simple way to monitor that frequency ?
> >
> > Is there a rule of thumb to estimate that value depending on the os ?
> >
> > Thank you guys !
> > Fares
>

Re: Question about log flusher real frequency

Posted by Alexandre Dupriez <al...@gmail.com>.
Hi Fares,

On Linux, the sysctl "dirty_writeback_centisecs" [1] sets the period
between wake-ups of the kernel flusher threads, which do this "sync" job.
It defaults to 500 centiseconds (5 seconds); the related
"dirty_expire_centisecs", which defaults to 30 seconds, sets how old dirty
data must be before it is written out.
There are a few exceptions where Kafka explicitly forces a sync (via the
force() method from the I/O API of the JDK), e.g. when a segment is
rolled or when Kafka shuts down.
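
To see how writeback is configured on a given broker host, the values can
be read directly from /proc/sys/vm. This is a minimal sketch, assuming a
Linux host; it also reads the related dirty_expire_centisecs setting, and
the function name is just for illustration.

```python
from pathlib import Path


def read_writeback_settings(sysctl_dir="/proc/sys/vm"):
    """Return the writeback-related sysctls, converted to seconds."""
    names = ("dirty_writeback_centisecs", "dirty_expire_centisecs")
    settings = {}
    for name in names:
        raw = Path(sysctl_dir, name).read_text().strip()
        settings[name] = int(raw) / 100.0  # centiseconds -> seconds
    return settings


if __name__ == "__main__":
    for name, seconds in read_writeback_settings().items():
        print(f"{name}: {seconds:g} s")
```

The directory is a parameter only so that the function can be pointed at
a copy of the files; on a real host the default path should be enough.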

You can monitor your kernel's page writeback activity at different
levels of granularity, depending on the instrumentation you are willing
to use.
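
At the coarsest level, the Dirty and Writeback counters in /proc/meminfo
can be sampled periodically. The sketch below assumes a Linux host; note
that these counters are system-wide, so they cover all processes on the
machine and not only Kafka. The function names are illustrative.

```python
import time


def parse_meminfo(text):
    """Extract the Dirty and Writeback counters (in kB) from /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            values[key] = int(rest.split()[0])  # first field is the size in kB
    return values


def watch(interval_s=1.0, samples=5, path="/proc/meminfo"):
    """Print the counters a few times, sleeping between samples."""
    for _ in range(samples):
        with open(path) as f:
            print(parse_meminfo(f.read()))
        time.sleep(interval_s)


if __name__ == "__main__":
    watch()
```

A Dirty value that climbs and then drops at regular intervals gives a
rough picture of how often the kernel writes pages back.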

Why would you want to monitor this activity in the first place? Do you
want to know exactly *when* your data is on the disk?

[1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt

Le lun. 9 mars 2020 à 15:58, Fares Oueslati <ou...@gmail.com> a écrit :
>
> Hello,
>
> By default, both log.flush.interval.ms and log.flush.interval.messages are
> set to Long.MAX_VALUE.
>
> As I understand, it makes Kafka flush log to disk (fsync) only depends on
> file system.
>
> Is there any simple way to monitor that frequency ?
>
> Is there a rule of thumb to estimate that value depending on the os ?
>
> Thank you guys !
> Fares