Posted to dev@asterixdb.apache.org by Chen Luo <cl...@uci.edu> on 2018/03/28 00:19:31 UTC

Current Default Log Buffer Size is Too Small

Hi Devs,

Recently I was doing ingestion experiments, and found that our default log
buffer size (1MB = 8 pages * 128KB page size) is too small and negatively
impacts ingestion performance. The short conclusion is that by simply
increasing the log buffer size (e.g., to 32MB), I can improve ingestion
performance by *50% ~ 100%* on a single-node sensorium machine, as shown
below.

[image: ingestion throughput comparison, not included in the plain-text
archive]

The detailed explanation of the log buffer size effect is as follows. Right
now we have a background LogFlusher thread which continuously forces log
records to disk. When the log buffer is full, writers are blocked waiting
for log buffer space. However, when setting the log buffer size, we have to
consider the LSM operations as well. The memory component is first filled
up with incoming records at a very high speed, and is then flushed to disk
at a relatively low speed. If the log buffer is small, ingestion is very
likely to be blocked by the LogFlusher while the memory component is
filling up. This blocking is wasted time, since the flush/merge I/O is
often idle at that point. However, when the log buffer is relatively large,
the LogFlusher can catch up while ingestion is blocked by flush/merge,
which is not harmful since LSM I/O operations are ongoing then anyway.
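
For anyone who wants to try this, the log buffer can be resized through the
transaction log settings in the node configuration. A minimal sketch,
assuming the txn.log.buffer.* option names from the configuration docs
(section placement and exact names may differ across versions, and the
values are just the 32MB example above, not a recommendation):

    [nc]
    txn.log.buffer.numpages = 8
    txn.log.buffer.pagesize = 4MB

With 8 pages of 4MB each, this gives a 32MB log buffer.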

I don't know how large the log buffer should be right now (it depends on
various factors), but our default value of *1MB* is very likely small
enough to cause blocking during normal ingestion. Just letting you know so
you can be aware of this parameter when you measure ingestion
performance...

Best regards,
Chen Luo

Re: Current Default Log Buffer Size is Too Small

Posted by Chen Luo <cl...@uci.edu>.
I'll make a table instead...

Page Size    Write Throughput (MB/s)
64KB                 1.61
128KB                3.14
256KB                5.69
512KB                9.52
1024KB              17.41
2048KB              28.29
4096KB              41.63
8192KB              56.26
16384KB             72.11
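
A rough way to read these numbers (a back-of-the-envelope model, not
something measured separately): the benchmark forces the channel after
every page, so each page pays a roughly fixed sync latency on top of its
transfer time, giving

    throughput(P) ~= P / (t_sync + P / B_max)

where P is the page size, t_sync is the per-force latency, and B_max is the
raw sequential bandwidth (both symbols are illustrative, not measured).
Small pages are dominated by t_sync, and throughput climbs toward B_max as
P grows, which matches the trend in the table.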


On Fri, Mar 30, 2018 at 3:25 PM, abdullah alamoudi <ba...@gmail.com>
wrote:

> Am I the only one who didn't get the image in the email?
>
> > [...]

Re: Current Default Log Buffer Size is Too Small

Posted by abdullah alamoudi <ba...@gmail.com>.
Am I the only one who didn't get the image in the email?

> On Mar 30, 2018, at 3:22 PM, Chen Luo <cl...@uci.edu> wrote:
> 
> [...]

Re: Current Default Log Buffer Size is Too Small

Posted by Chen Luo <cl...@uci.edu>.
An update on this issue. It seems this speed-up comes from simply
increasing the log page size (and I've submitted a patch
https://asterix-gerrit.ics.uci.edu/#/c/2553/).

I also wrote a simple program to test the write throughput w.r.t. different
page sizes:

        // write numPages pages, forcing the channel to disk after each one
        for (int i = 0; i < numPages; i++) {
            byteBuffer.rewind();
            while (byteBuffer.hasRemaining()) {
                totalBytesWritten += channel.write(byteBuffer);
            }
            channel.force(false);
        }
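
For reference, here is a self-contained sketch of the micro-benchmark (the
output path, page size, and page count below are placeholders, and the
setup around the loop is illustrative rather than the exact program I ran):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class PageWriteBench {
        public static void main(String[] args) throws IOException {
            final int pageSize = 128 * 1024;  // bytes per page; vary this
            final int numPages = 1024;        // placeholder page count
            ByteBuffer byteBuffer = ByteBuffer.allocate(pageSize);
            long totalBytesWritten = 0;
            long start = System.nanoTime();
            try (FileChannel channel = FileChannel.open(Paths.get("bench.log"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                for (int i = 0; i < numPages; i++) {
                    byteBuffer.rewind();
                    while (byteBuffer.hasRemaining()) {
                        totalBytesWritten += channel.write(byteBuffer);
                    }
                    channel.force(false); // fsync data (not metadata) per page
                }
            }
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d bytes in %.2f s = %.2f MB/s%n",
                    totalBytesWritten, secs,
                    totalBytesWritten / secs / (1024.0 * 1024.0));
        }
    }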

It also confirms that varying the page size can have a big impact on disk
throughput (even for sequential I/Os). The experiment result on one of our
sensorium nodes is as follows:

[image: write throughput vs. page size, not included in the plain-text
archive; the same numbers were re-posted as a table elsewhere in this
thread]


On Tue, Mar 27, 2018 at 5:19 PM, Chen Luo <cl...@uci.edu> wrote:

> [...]