Posted to user@cassandra.apache.org by David Boxenhorn <da...@lookin2.com> on 2010/07/04 09:34:20 UTC

Write assurance in Cassandra

As I understand it, when you write to Cassandra, you are assured that, if
successful, the new data has been written to a log file - so that if there
is a crash your data is safe. Is this correct?

If the above is correct, there is something going on that I don't
understand. Are the log files to which the data is first written the ones
that look like /var/lib/cassandra/commitlog/CommitLog-1277998453387.log ?
The reason I ask is that when I write a lot of data, nothing seems to change
in the commitlog directory for a long time, then at some point the log files
in this directory get updated. It looks to me like there's memory caching
involved, and the new data is not being immediately written to disk. What is
going on?

Re: Write assurance in Cassandra

Posted by David Boxenhorn <da...@lookin2.com>.
Yes, it was. I was dumping data from Oracle into Cassandra.

On Sun, Jul 4, 2010 at 11:11 AM, Andrew Rollins <an...@localytics.com> wrote:

> Is your IO under heavy load? If it is, that may be the cause; otherwise, I'm
> not sure what would cause significant lag. On Linux I like to use
> "iostat -tx 10" to check IO.
>
> - Andrew

Re: Write assurance in Cassandra

Posted by Andrew Rollins <an...@localytics.com>.
Is your IO under heavy load? If it is, that may be the cause; otherwise, I'm
not sure what would cause significant lag. On Linux I like to use
"iostat -tx 10" to check IO.

- Andrew


On Sun, Jul 4, 2010 at 4:04 AM, David Boxenhorn <da...@lookin2.com> wrote:

> Thank you very much! I now understand things much better.
>
> However, my configuration is as follows:
>
>   <CommitLogSync>periodic</CommitLogSync>
>   <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
>
> So I should see my commit log change after 10,000 milliseconds = 10
> seconds? It seems to take much longer to show up.

Re: Write assurance in Cassandra

Posted by David Boxenhorn <da...@lookin2.com>.
Thank you very much! I now understand things much better.

However, my configuration is as follows:

  <CommitLogSync>periodic</CommitLogSync>
  <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>

So I should see my commit log change after 10,000 milliseconds = 10 seconds?
It seems to take much longer to show up.
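
One way to measure how often the commit log files actually change is to poll the directory and print a line whenever a file's size or modification time differs from the previous poll. The following is a minimal sketch under stated assumptions: snapshot, changed_files, and watch are hypothetical helper names, and the one-second polling interval is arbitrary. Note that size and modification time only show when bytes reach the file system; they do not prove that a sync to the device has happened.

```python
import os
import time


def snapshot(directory):
    """Map each file name in `directory` to its (size, mtime) pair."""
    result = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            info = entry.stat()
            result[entry.name] = (info.st_size, info.st_mtime)
    return result


def changed_files(old, new):
    """Names that are new, or whose size/mtime differ, between two snapshots."""
    return sorted(name for name, meta in new.items() if old.get(name) != meta)


def watch(directory, interval_seconds=1.0):
    """Print a timestamped line whenever a file in `directory` changes.

    Runs until interrupted (Ctrl-C).
    """
    previous = snapshot(directory)
    while True:
        time.sleep(interval_seconds)
        current = snapshot(directory)
        for name in changed_files(previous, current):
            print(time.strftime("%H:%M:%S"), name, current[name][0], "bytes")
        previous = current
```

Calling watch("/var/lib/cassandra/commitlog") during a bulk load (the path is the one mentioned in this thread; adjust it for your installation) would show the gaps between visible updates, which can then be compared with the configured 10,000 ms period.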

On Sun, Jul 4, 2010 at 10:52 AM, Andrew Rollins <an...@localytics.com> wrote:

> By default, Cassandra syncs the commit log to disk periodically, so if you
> are looking at file sizes, you won't see the most up-to-date numbers. It is
> like tailing a file that isn't flushed frequently: you might wait a little
> while before you see the updates.
>
> In periodic mode, Cassandra acknowledges the write to the client
> immediately (even before it is synced). You can run Cassandra in batch mode
> instead, which basically means it writes in batches *and* it won't
> acknowledge the writes to the client until it has actually synced. I'm still
> somewhat new to this, but that's my understanding.
>
> Have a look at CommitLogSync in your storage-conf.xml for more info about
> setting up syncing periods.
>
> As an aside, I'm not sure why the "ack immediately" or "ack after sync"
> setting is piggybacked on the periodic vs batch setting. At first glance, it
> seems like the two concepts should be independent of one another.
>
> - Andrew

Re: Write assurance in Cassandra

Posted by Andrew Rollins <an...@localytics.com>.
By default, Cassandra syncs the commit log to disk periodically, so if you
are looking at file sizes, you won't see the most up-to-date numbers. It is
like tailing a file that isn't flushed frequently: you might wait a little
while before you see the updates.
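
The tail analogy can be reproduced in a few lines of Python. This is only an illustrative sketch and not Cassandra's code: the function name and the 64 KB buffer size are assumptions chosen for the demonstration. It shows that bytes held in a process-level buffer do not change the file size on disk until they are flushed, and that pushing them to the device is a further, separate step.

```python
import os
import tempfile


def sizes_during_buffered_write(payload: bytes, buffer_size: int = 64 * 1024):
    """Return the on-disk file size before and after an explicit flush.

    The payload should be smaller than `buffer_size`; a larger payload may be
    written through immediately instead of waiting in the buffer.
    """
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        with open(path, "wb", buffering=buffer_size) as log:
            log.write(payload)                    # stays in the process buffer
            before_flush = os.path.getsize(path)
            log.flush()                           # hand the bytes to the OS
            after_flush = os.path.getsize(path)
            os.fsync(log.fileno())                # ask the OS to write to the device
        return before_flush, after_flush
    finally:
        os.remove(path)


before, after = sizes_during_buffered_write(b"x" * 100)
print(f"size before flush: {before}, after flush: {after}")
```

Someone watching the file size from outside the process sees nothing until the flush, which matches the observation that the commitlog directory appears unchanged for a while.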

In periodic mode, Cassandra acknowledges the write to the client immediately
(even before it is synced). You can run Cassandra in batch mode instead,
which basically means it writes in batches *and* it won't acknowledge the
writes to the client until it has actually synced. I'm still somewhat new to
this, but that's my understanding.
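
The difference between the two modes, as described above, can be modeled with a toy class. The sketch below is hypothetical and does not reflect how Cassandra is implemented: ToyCommitLog, its methods, and lost_after_crash are invented names, and sync() is called by hand where a real server would rely on a timer or a batch window.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCommitLog:
    """Toy model; `mode` mirrors the CommitLogSync values 'periodic' and 'batch'."""
    mode: str
    unsynced: list = field(default_factory=list)  # appended, not yet synced
    synced: list = field(default_factory=list)    # would survive a crash
    acked: list = field(default_factory=list)     # client was told "success"

    def write(self, record):
        self.unsynced.append(record)
        if self.mode == "periodic":
            self.acked.append(record)             # ack immediately, sync later

    def sync(self):
        """Stands in for the periodic timer or the end of a batch window."""
        batch, self.unsynced = self.unsynced, []
        self.synced.extend(batch)
        if self.mode == "batch":
            self.acked.extend(batch)              # ack only after the sync

    def crash(self):
        """Return acknowledged records that were never synced, then drop pending ones."""
        lost = [r for r in self.acked if r not in self.synced]
        self.unsynced.clear()
        return lost


def lost_after_crash(mode):
    """Write 'a', sync, write 'b', then crash before the next sync."""
    log = ToyCommitLog(mode)
    log.write("a")
    log.sync()
    log.write("b")
    return log.crash()


print("periodic:", lost_after_crash("periodic"))
print("batch:", lost_after_crash("batch"))
```

In this model, periodic mode can lose a write that the client already believes succeeded, while batch mode never acknowledges a write it could still lose; the cost is that every write waits for a sync.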

Have a look at CommitLogSync in your storage-conf.xml for more info about
setting up syncing periods.

As an aside, I'm not sure why the "ack immediately" or "ack after sync"
setting is piggybacked on the periodic vs batch setting. At first glance, it
seems like the two concepts should be independent of one another.

- Andrew


On Sun, Jul 4, 2010 at 3:34 AM, David Boxenhorn <da...@lookin2.com> wrote:

> As I understand it, when you write to Cassandra, you are assured that, if
> successful, the new data has been written to a log file - so that if there
> is a crash your data is safe. Is this correct?
>
> If the above is correct, there is something going on that I don't
> understand. Are the log files to which the data is first written the ones
> that look like /var/lib/cassandra/commitlog/CommitLog-1277998453387.log ?
> The reason I ask is that when I write a lot of data, nothing seems to change
> in the commitlog directory for a long time, then at some point the log files
> in this directory get updated. It looks to me like there's memory caching
> involved, and the new data is not being immediately written to disk. What is
> going on?
>