Posted to user@cassandra.apache.org by Parag Patel <pp...@clearpoolgroup.com> on 2014/04/09 12:06:52 UTC

Commitlog questions


1) Why is the default commitlog size 4GB? Has anyone changed it? What should be considered when choosing the commitlog size?

2) If the commitlog is in periodic mode, there is a property that sets the time interval at which incoming mutations are flushed to disk. This implies that there is a queue inside Cassandra that holds this data in memory until it is flushed.

a. Is there a name for this queue?

b. Is there a limit for this queue?

c. Are there any tuning parameters for this queue?

Thanks,
Parag

Re: Commitlog questions

Posted by Russell Hatch <rh...@datastax.com>.
>
>  If the commitlog is in periodic mode and the fsync happens every 10
> seconds, Cassandra is storing the stuff that needs to be sync'd somewhere
> for a period of 10 seconds.  I'm talking about before it even hits any
> disk.  This has to be in memory, correct?


The data you are referring to is held in the OS page cache [1], so it's not
part of Cassandra's own memory, though I imagine Cassandra keeps a small
handle of some kind on the mutation so that it can make the fsync [2] system
call when appropriate.

[1] http://en.wikipedia.org/wiki/Page_cache
[2] http://linux.die.net/man/2/fsync
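Russ's point can be illustrated at the OS level with a minimal Python sketch. This is not Cassandra code (as Oleg notes, Cassandra itself works through java.nio buffers), and the function name append_and_sync is hypothetical; it only shows that bytes handed to write() land in the kernel's page cache, outside the process's own memory, and are pushed to the device by fsync().

```python
import os


def append_and_sync(path: str, records: list[bytes], sync: bool) -> int:
    """Append records to a log file and optionally fsync it.

    Once os.write() returns, the bytes sit in the kernel's page cache (OS
    memory, not this process's heap) and are visible to any reader of the
    file. Only os.fsync() asks the kernel to push them to the storage device,
    so a power loss before that call can lose them.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        total = 0
        for rec in records:
            view = memoryview(rec)
            while view:  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
                total += n
        if sync:
            os.fsync(fd)  # block until the kernel reports the data is on disk
        return total
    finally:
        os.close(fd)
```

A record written with sync=False can be read back immediately, because reads are served from the same page cache, yet it is not guaranteed to survive a crash until a later call with sync=True.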

Thanks,

Russ


On Thu, Apr 10, 2014 at 1:11 PM, Parag Patel <pp...@clearpoolgroup.com> wrote:

> Oleg,
>
> Thanks for the response.  If the commitlog is in periodic mode and the
> fsync happens every 10 seconds, Cassandra is storing the stuff that needs
> to be sync'd somewhere for a period of 10 seconds.  I'm talking about
> before it even hits any disk.  This has to be in memory, correct?
>
> Parag
>
> -----Original Message-----
> From: Oleg Dulin [mailto:oleg.dulin@gmail.com]
> Sent: Wednesday, April 09, 2014 10:42 AM
> To: user@cassandra.apache.org
> Subject: Re: Commitlog questions
>
> Parag:
>
> To answer your questions:
>
> 1) Default is just that, a default. I wouldn't advise raising it, though.
> The bigger it is, the longer it takes to restart the node.
> 2) I think they just use fsync. There is no queue. All files in Cassandra
> use java.nio buffers, but they need to be fsynced periodically. Look at the
> commitlog_sync parameters in the cassandra.yaml file; the comments there
> explain how it works. I believe the difference between periodic and batch
> is just that: if it is periodic, it will fsync every 10 seconds; if it is
> batch, it will fsync if there were any changes within a time window.
>
> On 2014-04-09 10:06:52 +0000, Parag Patel said:
>
> >
> >>>>> 1)      Why is the default 4GB?  Has anyone changed this? What are
> >>>>> some aspects to consider when determining the commitlog size?
> >>>>> 2)      If the commitlog is in periodic mode, there is a property
> >>>>> to set a time interval to flush the incoming mutations to disk.
> >>>>> This implies that there is a queue inside Cassandra to hold this
> >>>>> data in memory until it is flushed.
> >>>>>>>>> a.       Is there a name for this queue?
> >>>>>>>>> b.      Is there a limit for this queue?
> >>>>>>>>> c.       Are there any tuning parameters for this queue?
> >
> > Thanks,
> > Parag
>
>
> --
> Regards,
> Oleg Dulin
> http://www.olegdulin.com
>
>
>

RE: Commitlog questions

Posted by Parag Patel <pp...@clearpoolgroup.com>.
Oleg,

Thanks for the response.  If the commitlog is in periodic mode and the fsync happens every 10 seconds, Cassandra is storing the stuff that needs to be sync'd somewhere for a period of 10 seconds.  I'm talking about before it even hits any disk.  This has to be in memory, correct?

Parag




Re: Commitlog questions

Posted by Oleg Dulin <ol...@gmail.com>.
Parag:

To answer your questions:

1) Default is just that, a default. I wouldn't advise raising it,
though. The bigger it is, the longer it takes to restart the node.
2) I think they just use fsync. There is no queue. All files in
Cassandra use java.nio buffers, but they need to be fsynced
periodically. Look at the commitlog_sync parameters in the cassandra.yaml
file; the comments there explain how it works. I believe the difference
between periodic and batch is just that: if it is periodic, it will
fsync every 10 seconds; if it is batch, it will fsync if there were any
changes within a time window.
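To make the difference between the two modes concrete, here is a toy, single-replica timing model in Python. It follows how the comments in cassandra.yaml describe them: in periodic mode writes may be acknowledged immediately and the commitlog is synced every commitlog_sync_period_in_ms; in batch mode a write is not acknowledged until the commitlog has been fsynced, and Cassandra waits up to commitlog_sync_batch_window_in_ms for other writes before syncing. The names Outcome, periodic, batch and exposure_ms are hypothetical, and fsync latency and concurrency are ignored, so treat this as a sketch rather than a description of Cassandra's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Outcome:
    arrival_ms: int  # when the mutation was appended to the commitlog
    acked_ms: int    # when this replica could acknowledge the write
    durable_ms: int  # when an fsync covering this mutation completed


def periodic(arrivals_ms: list[int], period_ms: int) -> list[Outcome]:
    """Periodic mode: ack immediately; a timer fsyncs every period_ms."""
    out = []
    for t in sorted(arrivals_ms):
        # first timer tick (period_ms, 2 * period_ms, ...) at or after t
        next_sync = max(1, math.ceil(t / period_ms)) * period_ms
        out.append(Outcome(t, acked_ms=t, durable_ms=next_sync))
    return out


def batch(arrivals_ms: list[int], window_ms: int) -> list[Outcome]:
    """Batch mode: hold the ack until the group's fsync has completed."""
    out = []
    sync_at = None
    for t in sorted(arrivals_ms):
        if sync_at is None or t > sync_at:
            sync_at = t + window_ms  # first pending write opens a new group
        out.append(Outcome(t, acked_ms=sync_at, durable_ms=sync_at))
    return out


def exposure_ms(outcomes: list[Outcome]) -> int:
    """Longest time an acknowledged write existed only in memory."""
    return max(o.durable_ms - o.acked_ms for o in outcomes)
```

With the 10-second period from this thread, periodic([500, 3000, 12000], 10_000) acknowledges every write at once but leaves the first one unsynced for 9.5 seconds, while batch([0, 10, 200], 50) delays the first two acknowledgements to 50 ms and the third to 250 ms and never acknowledges an unsynced write. The trade-off is latency versus the size of the window in which acknowledged writes can be lost.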



-- 
Regards,
Oleg Dulin
http://www.olegdulin.com



Re: Commitlog questions

Posted by Panagiotis Garefalakis <pa...@gmail.com>.
The incoming mutations are written per column into a Memtable (an in-memory
cache). The default size for this table is 64MB, if I recall correctly.
For more information take a look here:
https://wiki.apache.org/cassandra/MemtableSSTable
http://wiki.apache.org/cassandra/MemtableThresholds
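The relationship between the memtable and the commitlog can be sketched as a toy Python model. ToyNode, memtable_limit and replay_size are hypothetical names invented for this illustration, not Cassandra settings; real Cassandra keeps one memtable per column family, writes the commitlog in segments, and can only discard a segment once every column family with data in it has been flushed. The sketch only shows the ordering (log the mutation, then apply it in memory) and why the unflushed tail of the commitlog is what a restart has to replay.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical single-table model of the write path, not Cassandra code.

    Each mutation is appended to the commitlog (for crash recovery) and applied
    to the memtable (for reads). When the memtable's rough size reaches
    memtable_limit bytes it is flushed to an immutable, sorted "SSTable"; the
    commitlog entries it covered are then no longer needed and are dropped.
    """
    memtable_limit: int
    commitlog: list = field(default_factory=list)  # entries a restart replays
    memtable: dict = field(default_factory=dict)
    memtable_bytes: int = 0  # rough: overwrites are counted again
    sstables: list = field(default_factory=list)

    def write(self, key: str, value: bytes) -> None:
        self.commitlog.append((key, value))  # 1. log the mutation
        self.memtable[key] = value           # 2. apply it in memory
        self.memtable_bytes += len(key) + len(value)
        if self.memtable_bytes >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.memtable_bytes = 0
        self.commitlog.clear()  # everything logged so far is now in an SSTable

    def replay_size(self) -> int:
        """Number of commitlog entries a restart would have to replay."""
        return len(self.commitlog)
```

In this model, allowing more unflushed data to accumulate in the commitlog directly means more entries to replay at startup, which is one way to read Oleg's remark that a bigger commitlog makes restarts slower.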

Regards,
Panagiotis


On Wed, Apr 9, 2014 at 8:44 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Apr 9, 2014 at 3:06 AM, Parag Patel <pp...@clearpoolgroup.com> wrote:
>
>>   <some questions about the commitlog and related assumptions>
>>
>
> https://issues.apache.org/jira/browse/CASSANDRA-6764
>
> You might wish to get in contact with the reporter here, who has similar
> questions!
>
> =Rob
>
>

Re: Commitlog questions

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Apr 9, 2014 at 3:06 AM, Parag Patel <pp...@clearpoolgroup.com> wrote:

>   <some questions about the commitlog and related assumptions>
>

https://issues.apache.org/jira/browse/CASSANDRA-6764

You might wish to get in contact with the reporter here, who has similar
questions!

=Rob