Posted to user@cassandra.apache.org by Chuan-Heng Hsiao <hs...@gmail.com> on 2012/11/17 02:30:48 UTC

huge commitlog

Hi Cassandra Developers,

I am experiencing a huge commitlog size (200+G) after inserting a huge
amount of data.
It is a 4-node cluster with RF = 3, and currently each node has 200+G of
commit log (so there is around 1T of commit log in total).

The commitlog_total_space_in_mb setting is at its default.
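
For context, here is roughly the relevant block of cassandra.yaml (a
sketch of the 1.1-era file; the comment wording and the exact default
may differ on your install):

# Total space to use for commit logs, in MB. When this limit is
# reached, Cassandra flushes the oldest dirty memtables so that the
# oldest commit log segments can be recycled.
# Example value only; check the cassandra.yaml shipped with 1.1.6.
commitlog_total_space_in_mb: 4096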

I am using 1.1.6.

I have not run nodetool cleanup or nodetool flush yet, but
I did run nodetool repair -pr for each column family.

There is 1 huge column family (around 68G in data_file_directories),
18 mid-sized column families (around 1G in data_file_directories),
and around 700 mini column families (around 10M in data_file_directories).

I am wondering whether the huge commitlog size is the expected behavior or not?
and how can we reduce the size of the commitlog?

Sincerely,
Hsiao

Re: huge commitlog

Posted by aaron morton <aa...@thelastpickle.com>.
Can you please create a ticket for this on https://issues.apache.org/jira/browse/CASSANDRA

Thanks

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/11/2012, at 1:58 PM, Chuan-Heng Hsiao <hs...@gmail.com> wrote:

> Hi Aaron,
> 
> Thank you very much for replying.
> 
> From the log, it seems the ERROR happens when trying to flush a
> memtable with a secondary index.
> (When inserting the data, I set the default value to '' for all
> pre-defined columns; it's for programming convenience.)
> 
> The following is the log:
> 
> INFO [OptionalTasks:1] 2012-11-13 14:24:20,650 ColumnFamilyStore.java
> (line 659) Enqueuing flush of
> Memtable-(some_cf).(some_cf)_(some_idx)_idx_1@1216346401(485/8360
> serialized/live bytes, 24 ops)
> ERROR [FlushWriter:2123] 2012-11-13 14:24:20,650
> AbstractCassandraDaemon.java (line 135) Exception in thread
> Thread[FlushWriter:2123,5,main]
> java.lang.AssertionError: Keys must not be empty
>        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
>        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
>        at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
>        at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:722)
> 
> 
> INFO [FlushWriter:2125] 2012-11-13 14:24:20,651 Memtable.java (line
> 264) Writing Memtable-(some_cf).(some_cf)_(some_idx2)_idx_1@272356994(485/2426
> serialized/live bytes, 24 ops)
> ERROR [FlushWriter:2125] 2012-11-13 14:24:20,652
> AbstractCassandraDaemon.java (line 135) Exception in thread
> Thread[FlushWriter
> :2125,5,main]
> java.lang.AssertionError: Keys must not be empty
>        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
>        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
>        at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
>        at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:722)
> 
> Sincerely,
> Hsiao
> 
> 
> On Mon, Nov 26, 2012 at 3:52 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> I checked the log, and found some ERROR about network problems,
>> and some ERROR about "Keys must not be empty".
>> 
>> Do you have the full error stack?
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 25/11/2012, at 4:14 AM, Chuan-Heng Hsiao <hs...@gmail.com>
>> wrote:
>> 
>> Hi Cassandra Devs,
>> 
>> After setting up the same configuration (and importing the same data)
>> on 3 VMs on the same machine instead of 3 physical machines,
>> so far I have not been able to replicate the exploded-commitlog situation.
>>
>> On my 4-physical-machine setup, everything seems to be
>> back to normal (the commitlog size is below the expected maximum)
>> after restarting the nodes.
>>
>> This time the commitlog size of one node is set to 4G, and the
>> others are set to 8G.
>>
>> A few days ago the 4G node exploded to 5+G (the 8G nodes remained at 8G).
>> I checked the log, and found some ERROR about network problems,
>> and some ERROR about "Keys must not be empty".
>> 
>> I suspect that besides the network problems,
>> the "Keys must not be empty" ERROR may be the main reason why
>> the commitlog continues growing.
>> (I've already ensured that the keys are not empty in my code,
>> so the problem may arise when syncing internally in Cassandra.)
>>
>> I restarted the 4G node as an 8G node. Because there has been no huge
>> traffic since then, I am not sure yet whether increasing the commitlog
>> size will solve or reduce this problem.
>> I'll keep you posted once the commitlog explodes again.
>> 
>> Sincerely,
>> Hsiao
>> 
>> 
>> On Mon, Nov 19, 2012 at 11:21 AM, Chuan-Heng Hsiao
>> <hs...@gmail.com> wrote:
>> 
>> I have RF = 3. Read/Write consistency has already been set to TWO.
>>
>> It did seem that the data were not consistent yet.
>> (There are some CFs that I expected to be empty after the operations, but I
>> still got some data, and the amount of data decreased after retrying
>> to get all the data from those CFs.)
>> 
>> Sincerely,
>> Hsiao
>> 
>> 
>> On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tu...@tupshin.com>
>> wrote:
>> 
>> What consistency level are you writing with? If you were writing with ANY,
>> try writing with a higher consistency level.
>> 
>> -Tupshin
>> 
>> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hs...@gmail.com>
>> wrote:
>> 
>> 
>> Hi Aaron,
>> 
>> Thank you very much for replying.
>> 
>> The 700 CFs were created in the beginning (before any insertion.)
>> 
>> I did not do anything with commitlog_archiving.properties, so I guess
>> I was not using commit log archiving.
>> 
>> What I did was a lot of insertions (and some deletions)
>> using another 4 machines with 32 processes in total.
>> (There are 4 nodes in my setup, so there are 8 machines in total.)
>> 
>> I did see huge logs in /var/log/cassandra after such a huge amount of
>> insertions.
>> Right now I can't tell whether a single insertion also causes huge
>> logs.
>>
>> nodetool flush hung (maybe because of the 200G+ commitlog).
>>
>> Because these machines are not in production (guaranteed no more
>> insertions/deletions),
>> I ended up restarting Cassandra one node at a time, and the commitlog
>> shrank back to 4G. I am doing repair on each node now.
>> 
>> I'll try to re-import and keep logs when the commitlog increases insanely
>> again.
>> 
>> Sincerely,
>> Hsiao
>> 
>> 
>> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>> 
>> I am wondering whether the huge commitlog size is the expected behavior or not?
>>
>> Nope.
>>
>> Did you notice the large log size during or after the inserts?
>> If after, did the size settle?
>> Are you using commit log archiving? (in commitlog_archiving.properties)
>>
>> and around 700 mini column families (around 10M in data_file_directories)
>>
>> Can you describe how you created the 700 CFs?
>>
>> and how can we reduce the size of the commitlog?
>>
>> As a workaround, nodetool flush should checkpoint the log.
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hs...@gmail.com>
>> wrote:
>> 
>> Hi Cassandra Developers,
>>
>> I am experiencing a huge commitlog size (200+G) after inserting a huge
>> amount of data.
>> It is a 4-node cluster with RF = 3, and currently each node has 200+G of
>> commit log (so there is around 1T of commit log in total).
>>
>> The commitlog_total_space_in_mb setting is at its default.
>>
>> I am using 1.1.6.
>>
>> I have not run nodetool cleanup or nodetool flush yet, but
>> I did run nodetool repair -pr for each column family.
>>
>> There is 1 huge column family (around 68G in data_file_directories),
>> 18 mid-sized column families (around 1G in data_file_directories),
>> and around 700 mini column families (around 10M in data_file_directories).
>>
>> I am wondering whether the huge commitlog size is the expected behavior or not?
>> and how can we reduce the size of the commitlog?
>> 
>> Sincerely,
>> Hsiao
>> 
>> 
>> 


Re: huge commitlog

Posted by Chuan-Heng Hsiao <hs...@gmail.com>.
Hi Aaron,

Thank you very much for replying.

From the log, it seems the ERROR happens when trying to flush a
memtable with a secondary index.
(When inserting the data, I set the default value to '' for all
pre-defined columns; it's for programming convenience.)

The following is the log:

 INFO [OptionalTasks:1] 2012-11-13 14:24:20,650 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-(some_cf).(some_cf)_(some_idx)_idx_1@1216346401(485/8360
serialized/live bytes, 24 ops)
ERROR [FlushWriter:2123] 2012-11-13 14:24:20,650
AbstractCassandraDaemon.java (line 135) Exception in thread
Thread[FlushWriter:2123,5,main]
java.lang.AssertionError: Keys must not be empty
        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
        at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
        at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)


 INFO [FlushWriter:2125] 2012-11-13 14:24:20,651 Memtable.java (line
264) Writing Memtable-(some_cf).(some_cf)_(some_idx2)_idx_1@272356994(485/2426
serialized/live bytes, 24 ops)
ERROR [FlushWriter:2125] 2012-11-13 14:24:20,652
AbstractCassandraDaemon.java (line 135) Exception in thread
Thread[FlushWriter
:2125,5,main]
java.lang.AssertionError: Keys must not be empty
        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
        at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
        at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
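
My working theory: the value of an indexed column becomes the row key of
the corresponding secondary-index entry, so a column defaulted to ''
produces an empty index key at flush time, which trips the assertion
above. A minimal client-side guard might look like this (a sketch
assuming a pycassa client; the keyspace, CF, and column names are
placeholders):

import pycassa

pool = pycassa.ConnectionPool('my_keyspace')  # placeholder keyspace
cf = pycassa.ColumnFamily(pool, 'some_cf')    # CF with indexed columns

def safe_insert(key, columns):
    # Skip empty values instead of writing '' as a default: the value
    # of an indexed column becomes the row key of its index entry, and
    # an empty key fails the assertion at flush time.
    filtered = dict((n, v) for n, v in columns.items() if v != '')
    if not key or not filtered:
        return
    cf.insert(key, filtered)

safe_insert('row1', {'some_idx': '', 'other_col': 'x'})  # '' is dropped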

Sincerely,
Hsiao


On Mon, Nov 26, 2012 at 3:52 AM, aaron morton <aa...@thelastpickle.com> wrote:
> I checked the log, and found some ERROR about network problems,
> and some ERROR about "Keys must not be empty".
>
> Do you have the full error stack?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/11/2012, at 4:14 AM, Chuan-Heng Hsiao <hs...@gmail.com>
> wrote:
>
> Hi Cassandra Devs,
>
> After setting up the same configuration (and importing the same data)
> on 3 VMs on the same machine instead of 3 physical machines,
> so far I have not been able to replicate the exploded-commitlog situation.
>
> On my 4-physical-machine setup, everything seems to be
> back to normal (the commitlog size is below the expected maximum)
> after restarting the nodes.
>
> This time the commitlog size of one node is set to 4G, and the
> others are set to 8G.
>
> A few days ago the 4G node exploded to 5+G (the 8G nodes remained at 8G).
> I checked the log, and found some ERROR about network problems,
> and some ERROR about "Keys must not be empty".
>
> I suspect that besides the network problems,
> the "Keys must not be empty" ERROR may be the main reason why
> the commitlog continues growing.
> (I've already ensured that the keys are not empty in my code,
> so the problem may arise when syncing internally in Cassandra.)
>
> I restarted the 4G node as an 8G node. Because there has been no huge
> traffic since then, I am not sure yet whether increasing the commitlog
> size will solve or reduce this problem.
> I'll keep you posted once the commitlog explodes again.
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 11:21 AM, Chuan-Heng Hsiao
> <hs...@gmail.com> wrote:
>
> I have RF = 3. Read/Write consistency has already been set to TWO.
>
> It did seem that the data were not consistent yet.
> (There are some CFs that I expected to be empty after the operations, but I
> still got some data, and the amount of data decreased after retrying
> to get all the data from those CFs.)
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tu...@tupshin.com>
> wrote:
>
> What consistency level are you writing with? If you were writing with ANY,
> try writing with a higher consistency level.
>
> -Tupshin
>
> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hs...@gmail.com>
> wrote:
>
>
> Hi Aaron,
>
> Thank you very much for replying.
>
> The 700 CFs were created in the beginning (before any insertion.)
>
> I did not do anything with commitlog_archiving.properties, so I guess
> I was not using commit log archiving.
>
> What I did was a lot of insertions (and some deletions)
> using another 4 machines with 32 processes in total.
> (There are 4 nodes in my setup, so there are 8 machines in total.)
>
> I did see huge logs in /var/log/cassandra after such a huge amount of
> insertions.
> Right now I can't tell whether a single insertion also causes huge
> logs.
>
> nodetool flush hung (maybe because of the 200G+ commitlog).
>
> Because these machines are not in production (guaranteed no more
> insertions/deletions),
> I ended up restarting Cassandra one node at a time, and the commitlog
> shrank back to 4G. I am doing repair on each node now.
>
> I'll try to re-import and keep logs when the commitlog increases insanely
> again.
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
>
> I am wondering whether the huge commitlog size is the expected behavior or not?
>
> Nope.
>
> Did you notice the large log size during or after the inserts?
> If after, did the size settle?
> Are you using commit log archiving? (in commitlog_archiving.properties)
>
> and around 700 mini column families (around 10M in data_file_directories)
>
> Can you describe how you created the 700 CFs?
>
> and how can we reduce the size of the commitlog?
>
> As a workaround, nodetool flush should checkpoint the log.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hs...@gmail.com>
> wrote:
>
> Hi Cassandra Developers,
>
> I am experiencing a huge commitlog size (200+G) after inserting a huge
> amount of data.
> It is a 4-node cluster with RF = 3, and currently each node has 200+G of
> commit log (so there is around 1T of commit log in total).
>
> The commitlog_total_space_in_mb setting is at its default.
>
> I am using 1.1.6.
>
> I have not run nodetool cleanup or nodetool flush yet, but
> I did run nodetool repair -pr for each column family.
>
> There is 1 huge column family (around 68G in data_file_directories),
> 18 mid-sized column families (around 1G in data_file_directories),
> and around 700 mini column families (around 10M in data_file_directories).
>
> I am wondering whether the huge commitlog size is the expected behavior or not?
> and how can we reduce the size of the commitlog?
>
> Sincerely,
> Hsiao
>
>
>

Re: huge commitlog

Posted by aaron morton <aa...@thelastpickle.com>.
> I checked the log, and found some ERROR about network problems,
> and some ERROR about "Keys must not be empty".
Do you have the full error stack?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/11/2012, at 4:14 AM, Chuan-Heng Hsiao <hs...@gmail.com> wrote:

> Hi Cassandra Devs,
> 
> After setting up the same configuration (and importing the same data)
> on 3 VMs on the same machine instead of 3 physical machines,
> so far I have not been able to replicate the exploded-commitlog situation.
>
> On my 4-physical-machine setup, everything seems to be
> back to normal (the commitlog size is below the expected maximum)
> after restarting the nodes.
>
> This time the commitlog size of one node is set to 4G, and the
> others are set to 8G.
>
> A few days ago the 4G node exploded to 5+G (the 8G nodes remained at 8G).
> I checked the log, and found some ERROR about network problems,
> and some ERROR about "Keys must not be empty".
> 
> I suspect that besides the network problems,
> the "Keys must not be empty" ERROR may be the main reason why
> the commitlog continues growing.
> (I've already ensured that the keys are not empty in my code,
> so the problem may arise when syncing internally in Cassandra.)
>
> I restarted the 4G node as an 8G node. Because there has been no huge
> traffic since then, I am not sure yet whether increasing the commitlog
> size will solve or reduce this problem.
> I'll keep you posted once the commitlog explodes again.
> 
> Sincerely,
> Hsiao
> 
> 
> On Mon, Nov 19, 2012 at 11:21 AM, Chuan-Heng Hsiao
> <hs...@gmail.com> wrote:
>> I have RF = 3. Read/Write consistency has already been set to TWO.
>>
>> It did seem that the data were not consistent yet.
>> (There are some CFs that I expected to be empty after the operations, but I
>> still got some data, and the amount of data decreased after retrying
>> to get all the data from those CFs.)
>> 
>> Sincerely,
>> Hsiao
>> 
>> 
>> On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tu...@tupshin.com> wrote:
>>> What consistency level are you writing with? If you were writing with ANY,
>>> try writing with a higher consistency level.
>>> 
>>> -Tupshin
>>> 
>>> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hs...@gmail.com>
>>> wrote:
>>>> 
>>>> Hi Aaron,
>>>> 
>>>> Thank you very much for replying.
>>>> 
>>>> The 700 CFs were created in the beginning (before any insertion.)
>>>> 
>>>> I did not do anything with commitlog_archiving.properties, so I guess
>>>> I was not using commit log archiving.
>>>> 
>>>> What I did was a lot of insertions (and some deletions)
>>>> using another 4 machines with 32 processes in total.
>>>> (There are 4 nodes in my setup, so there are 8 machines in total.)
>>>> 
>>>> I did see huge logs in /var/log/cassandra after such a huge amount of
>>>> insertions.
>>>> Right now I can't tell whether a single insertion also causes huge
>>>> logs.
>>>>
>>>> nodetool flush hung (maybe because of the 200G+ commitlog).
>>>>
>>>> Because these machines are not in production (guaranteed no more
>>>> insertions/deletions),
>>>> I ended up restarting Cassandra one node at a time, and the commitlog
>>>> shrank back to 4G. I am doing repair on each node now.
>>>> 
>>>> I'll try to re-import and keep logs when the commitlog increases insanely
>>>> again.
>>>> 
>>>> Sincerely,
>>>> Hsiao
>>>> 
>>>> 
>>>> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com>
>>>> wrote:
>>>>> I am wondering whether the huge commitlog size is the expected behavior or not?
>>>>>
>>>>> Nope.
>>>>>
>>>>> Did you notice the large log size during or after the inserts?
>>>>> If after, did the size settle?
>>>>> Are you using commit log archiving? (in commitlog_archiving.properties)
>>>>>
>>>>> and around 700 mini column families (around 10M in data_file_directories)
>>>>>
>>>>> Can you describe how you created the 700 CFs?
>>>>>
>>>>> and how can we reduce the size of the commitlog?
>>>>>
>>>>> As a workaround, nodetool flush should checkpoint the log.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Cassandra Developer
>>>>> New Zealand
>>>>> 
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>> 
>>>>> On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hs...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>> Hi Cassandra Developers,
>>>>>
>>>>> I am experiencing a huge commitlog size (200+G) after inserting a huge
>>>>> amount of data.
>>>>> It is a 4-node cluster with RF = 3, and currently each node has 200+G of
>>>>> commit log (so there is around 1T of commit log in total).
>>>>>
>>>>> The commitlog_total_space_in_mb setting is at its default.
>>>>>
>>>>> I am using 1.1.6.
>>>>>
>>>>> I have not run nodetool cleanup or nodetool flush yet, but
>>>>> I did run nodetool repair -pr for each column family.
>>>>>
>>>>> There is 1 huge column family (around 68G in data_file_directories),
>>>>> 18 mid-sized column families (around 1G in data_file_directories),
>>>>> and around 700 mini column families (around 10M in data_file_directories).
>>>>>
>>>>> I am wondering whether the huge commitlog size is the expected behavior or not?
>>>>> and how can we reduce the size of the commitlog?
>>>>> 
>>>>> Sincerely,
>>>>> Hsiao
>>>>> 
>>>>> 


Re: huge commitlog

Posted by Chuan-Heng Hsiao <hs...@gmail.com>.
Hi Cassandra Devs,

After setting up the same configuration (and importing the same data)
on 3 VMs on the same machine instead of 3 physical machines,
so far I have not been able to replicate the exploded-commitlog situation.

On my 4-physical-machine setup, everything seems to be
back to normal (the commitlog size is below the expected maximum)
after restarting the nodes.

This time the commitlog size of one node is set to 4G, and the
others are set to 8G.

A few days ago the 4G node exploded to 5+G (the 8G nodes remained at 8G).
I checked the log, and found some ERROR about network problems,
and some ERROR about "Keys must not be empty".

I suspect that besides the network problems,
the "Keys must not be empty" ERROR may be the main reason why
the commitlog continues growing.
(I've already ensured that the keys are not empty in my code,
so the problem may arise when syncing internally in Cassandra.)

I restarted the 4G node as an 8G node. Because there has been no huge
traffic since then, I am not sure yet whether increasing the commitlog
size will solve or reduce this problem.
I'll keep you posted once the commitlog explodes again.

Sincerely,
Hsiao


On Mon, Nov 19, 2012 at 11:21 AM, Chuan-Heng Hsiao
<hs...@gmail.com> wrote:
> I have RF = 3. Read/Write consistency has already been set to TWO.
>
> It did seem that the data were not consistent yet.
> (There are some CFs that I expected to be empty after the operations, but I
> still got some data, and the amount of data decreased after retrying
> to get all the data from those CFs.)
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tu...@tupshin.com> wrote:
>> What consistency level are you writing with? If you were writing with ANY,
>> try writing with a higher consistency level.
>>
>> -Tupshin
>>
>> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hs...@gmail.com>
>> wrote:
>>>
>>> Hi Aaron,
>>>
>>> Thank you very much for replying.
>>>
>>> The 700 CFs were created in the beginning (before any insertion.)
>>>
>>> I did not do anything with commitlog_archiving.properties, so I guess
>>> I was not using commit log archiving.
>>>
>>> What I did was a lot of insertions (and some deletions)
>>> using another 4 machines with 32 processes in total.
>>> (There are 4 nodes in my setup, so there are 8 machines in total.)
>>>
>>> I did see huge logs in /var/log/cassandra after such a huge amount of
>>> insertions.
>>> Right now I can't tell whether a single insertion also causes huge
>>> logs.
>>>
>>> nodetool flush hung (maybe because of the 200G+ commitlog).
>>>
>>> Because these machines are not in production (guaranteed no more
>>> insertions/deletions),
>>> I ended up restarting Cassandra one node at a time, and the commitlog
>>> shrank back to 4G. I am doing repair on each node now.
>>>
>>> I'll try to re-import and keep logs when the commitlog increases insanely
>>> again.
>>>
>>> Sincerely,
>>> Hsiao
>>>
>>>
>>> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com>
>>> wrote:
>>> > I am wondering whether the huge commitlog size is the expected behavior or not?
>>> >
>>> > Nope.
>>> >
>>> > Did you notice the large log size during or after the inserts?
>>> > If after, did the size settle?
>>> > Are you using commit log archiving? (in commitlog_archiving.properties)
>>> >
>>> > and around 700 mini column families (around 10M in data_file_directories)
>>> >
>>> > Can you describe how you created the 700 CFs?
>>> >
>>> > and how can we reduce the size of the commitlog?
>>> >
>>> > As a workaround, nodetool flush should checkpoint the log.
>>> >
>>> > Cheers
>>> >
>>> > -----------------
>>> > Aaron Morton
>>> > Freelance Cassandra Developer
>>> > New Zealand
>>> >
>>> > @aaronmorton
>>> > http://www.thelastpickle.com
>>> >
>>> > On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hs...@gmail.com>
>>> > wrote:
>>> >
>>> > Hi Cassandra Developers,
>>> >
>>> > I am experiencing a huge commitlog size (200+G) after inserting a huge
>>> > amount of data.
>>> > It is a 4-node cluster with RF = 3, and currently each node has 200+G of
>>> > commit log (so there is around 1T of commit log in total).
>>> >
>>> > The commitlog_total_space_in_mb setting is at its default.
>>> >
>>> > I am using 1.1.6.
>>> >
>>> > I have not run nodetool cleanup or nodetool flush yet, but
>>> > I did run nodetool repair -pr for each column family.
>>> >
>>> > There is 1 huge column family (around 68G in data_file_directories),
>>> > 18 mid-sized column families (around 1G in data_file_directories),
>>> > and around 700 mini column families (around 10M in data_file_directories).
>>> >
>>> > I am wondering whether the huge commitlog size is the expected behavior or not?
>>> > and how can we reduce the size of the commitlog?
>>> >
>>> > Sincerely,
>>> > Hsiao
>>> >
>>> >

Re: huge commitlog

Posted by Chuan-Heng Hsiao <hs...@gmail.com>.
I have RF = 3. Read/Write consistency has already been set to TWO.

It did seem that the data were not consistent yet.
(There are some CFs that I expected to be empty after the operations, but I
still got some data, and the amount of data decreased after retrying
to get all the data from those CFs.)

Sincerely,
Hsiao


On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tu...@tupshin.com> wrote:
> What consistency level are you writing with? If you were writing with ANY,
> try writing with a higher consistency level.
>
> -Tupshin
>
> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hs...@gmail.com>
> wrote:
>>
>> Hi Aaron,
>>
>> Thank you very much for replying.
>>
>> The 700 CFs were created in the beginning (before any insertion.)
>>
>> I did not do anything with commitlog_archiving.properties, so I guess
>> I was not using commit log archiving.
>>
>> What I did was a lot of insertions (and some deletions)
>> using another 4 machines with 32 processes in total.
>> (There are 4 nodes in my setup, so there are 8 machines in total.)
>>
>> I did see huge logs in /var/log/cassandra after such a huge amount of
>> insertions.
>> Right now I can't tell whether a single insertion also causes huge
>> logs.
>>
>> nodetool flush hung (maybe because of the 200G+ commitlog).
>>
>> Because these machines are not in production (guaranteed no more
>> insertions/deletions),
>> I ended up restarting Cassandra one node at a time, and the commitlog
>> shrank back to 4G. I am doing repair on each node now.
>>
>> I'll try to re-import and keep logs when the commitlog increases insanely
>> again.
>>
>> Sincerely,
>> Hsiao
>>
>>
>> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>> > I am wondering whether the huge commitlog size is the expected behavior or not?
>> >
>> > Nope.
>> >
>> > Did you notice the large log size during or after the inserts?
>> > If after, did the size settle?
>> > Are you using commit log archiving? (in commitlog_archiving.properties)
>> >
>> > and around 700 mini column families (around 10M in data_file_directories)
>> >
>> > Can you describe how you created the 700 CFs?
>> >
>> > and how can we reduce the size of the commitlog?
>> >
>> > As a workaround, nodetool flush should checkpoint the log.
>> >
>> > Cheers
>> >
>> > -----------------
>> > Aaron Morton
>> > Freelance Cassandra Developer
>> > New Zealand
>> >
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> >
>> > On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hs...@gmail.com>
>> > wrote:
>> >
>> > Hi Cassandra Developers,
>> >
>> > I am experiencing a huge commitlog size (200+G) after inserting a huge
>> > amount of data.
>> > It is a 4-node cluster with RF = 3, and currently each node has 200+G of
>> > commit log (so there is around 1T of commit log in total).
>> >
>> > The commitlog_total_space_in_mb setting is at its default.
>> >
>> > I am using 1.1.6.
>> >
>> > I have not run nodetool cleanup or nodetool flush yet, but
>> > I did run nodetool repair -pr for each column family.
>> >
>> > There is 1 huge column family (around 68G in data_file_directories),
>> > 18 mid-sized column families (around 1G in data_file_directories),
>> > and around 700 mini column families (around 10M in data_file_directories).
>> >
>> > I am wondering whether the huge commitlog size is the expected behavior or not?
>> > and how can we reduce the size of the commitlog?
>> >
>> > Sincerely,
>> > Hsiao
>> >
>> >

Re: huge commitlog

Posted by Tupshin Harper <tu...@tupshin.com>.
What consistency level are you writing with? If you were writing with ANY,
try writing with a higher consistency level.
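
For example, with a Thrift client such as pycassa the write consistency
level is set on the ColumnFamily (a sketch; the keyspace and CF names
are placeholders):

import pycassa
from pycassa import ConsistencyLevel

pool = pycassa.ConnectionPool('my_keyspace')  # placeholder keyspace
# QUORUM (or at least ONE) instead of ANY, so a write is acknowledged
# by live replicas rather than possibly only by a hinted handoff.
cf = pycassa.ColumnFamily(pool, 'some_cf',
                          write_consistency_level=ConsistencyLevel.QUORUM)
cf.insert('row1', {'col': 'val'})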

-Tupshin
On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hs...@gmail.com>
wrote:

> Hi Aaron,
>
> Thank you very much for replying.
>
> The 700 CFs were created in the beginning (before any insertion.)
>
> I did not do anything with commitlog_archiving.properties, so I guess
> I was not using commit log archiving.
>
> What I did was a lot of insertions (and some deletions)
> using another 4 machines with 32 processes in total.
> (There are 4 nodes in my setup, so there are 8 machines in total.)
>
> I did see huge logs in /var/log/cassandra after such a huge amount of
> insertions.
> Right now I can't tell whether a single insertion also causes huge
> logs.
>
> nodetool flush hung (maybe because of the 200G+ commitlog).
>
> Because these machines are not in production (guaranteed no more
> insertions/deletions),
> I ended up restarting Cassandra one node at a time, and the commitlog
> shrank back to 4G. I am doing repair on each node now.
>
> I'll try to re-import and keep logs when the commitlog increases insanely
> again.
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
> > I am wondering whether the huge commitlog size is the expected behavior or not?
> >
> > Nope.
> >
> > Did you notice the large log size during or after the inserts?
> > If after, did the size settle?
> > Are you using commit log archiving? (in commitlog_archiving.properties)
> >
> > and around 700 mini column families (around 10M in data_file_directories)
> >
> > Can you describe how you created the 700 CFs?
> >
> > and how can we reduce the size of the commitlog?
> >
> > As a workaround, nodetool flush should checkpoint the log.
> >
> > Cheers
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Developer
> > New Zealand
> >
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hs...@gmail.com>
> > wrote:
> >
> > Hi Cassandra Developers,
> >
> > I am experiencing a huge commitlog size (200+G) after inserting a huge
> > amount of data.
> > It is a 4-node cluster with RF = 3, and currently each node has 200+G of
> > commit log (so there is around 1T of commit log in total).
> >
> > The commitlog_total_space_in_mb setting is at its default.
> >
> > I am using 1.1.6.
> >
> > I have not run nodetool cleanup or nodetool flush yet, but
> > I did run nodetool repair -pr for each column family.
> >
> > There is 1 huge column family (around 68G in data_file_directories),
> > 18 mid-sized column families (around 1G in data_file_directories),
> > and around 700 mini column families (around 10M in data_file_directories).
> >
> > I am wondering whether the huge commitlog size is the expected behavior or not?
> > and how can we reduce the size of the commitlog?
> >
> > Sincerely,
> > Hsiao
> >
> >
>

Re: huge commitlog

Posted by Chuan-Heng Hsiao <hs...@gmail.com>.
Hi Aaron,

Thank you very much for replying.

The 700 CFs were created in the beginning (before any insertion.)

I did not do anything with commitlog_archiving.properties, so I guess
I was not using commit log archiving.

What I did was a lot of insertions (and some deletions)
using another 4 machines with 32 processes in total.
(There are 4 nodes in my setup, so there are 8 machines in total.)

I did see huge logs in /var/log/cassandra after such a huge amount of insertions.
Right now I can't tell whether a single insertion also causes huge logs.

nodetool flush hung (maybe because of the 200G+ commitlog).

Because these machines are not in production (guaranteed no more
insertions/deletions),
I ended up restarting Cassandra one node at a time, and the commitlog
shrank back to 4G. I am doing repair on each node now.

I'll try to re-import and keep logs when the commitlog increases insanely again.

Sincerely,
Hsiao


On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com> wrote:
> I am wondering whether the huge commitlog size is the expected behavior or not?
>
> Nope.
>
> Did you notice the large log size during or after the inserts?
> If after, did the size settle?
> Are you using commit log archiving? (in commitlog_archiving.properties)
>
> and around 700 mini column families (around 10M in data_file_directories)
>
> Can you describe how you created the 700 CFs?
>
> and how can we reduce the size of the commitlog?
>
> As a workaround, nodetool flush should checkpoint the log.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hs...@gmail.com>
> wrote:
>
> Hi Cassandra Developers,
>
> I am experiencing a huge commitlog size (200+G) after inserting a huge
> amount of data.
> It is a 4-node cluster with RF = 3, and currently each node has 200+G of
> commit log (so there is around 1T of commit log in total).
>
> The commitlog_total_space_in_mb setting is at its default.
>
> I am using 1.1.6.
>
> I have not run nodetool cleanup or nodetool flush yet, but
> I did run nodetool repair -pr for each column family.
>
> There is 1 huge column family (around 68G in data_file_directories),
> 18 mid-sized column families (around 1G in data_file_directories),
> and around 700 mini column families (around 10M in data_file_directories).
>
> I am wondering whether the huge commitlog size is the expected behavior or not?
> and how can we reduce the size of the commitlog?
>
> Sincerely,
> Hsiao
>
>

Re: huge commitlog

Posted by aaron morton <aa...@thelastpickle.com>.
> I am wondering whether the huge commitlog size is the expected behavior or not?
Nope.

Did you notice the large log size during or after the inserts?
If after, did the size settle?
Are you using commit log archiving? (in commitlog_archiving.properties)
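
(For reference, a commitlog_archiving.properties with archiving disabled
has all of its properties empty; this is a sketch, the shipped comments
differ:

archive_command=
restore_command=
restore_directories=
restore_point_in_time=

If archive_command is blank, segments are not being archived.)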

> and around 700 mini column families (around 10M in data_file_directories)
Can you describe how you created the 700 CFs?

> and how can we reduce the size of the commitlog?
As a workaround, nodetool flush should checkpoint the log.
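
For example (standard nodetool invocations; substitute your own host,
keyspace, and column family names):

nodetool -h <node_host> flush                      # flush all keyspaces
nodetool -h <node_host> flush my_keyspace          # one keyspace
nodetool -h <node_host> flush my_keyspace some_cf  # a single column family

Once the memtables are flushed, commit log segments that contain only
already-flushed data can be recycled.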

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hs...@gmail.com> wrote:

> Hi Cassandra Developers,
>
> I am experiencing a huge commitlog size (200+G) after inserting a huge
> amount of data.
> It is a 4-node cluster with RF = 3, and currently each node has 200+G of
> commit log (so there is around 1T of commit log in total).
>
> The commitlog_total_space_in_mb setting is at its default.
>
> I am using 1.1.6.
>
> I have not run nodetool cleanup or nodetool flush yet, but
> I did run nodetool repair -pr for each column family.
>
> There is 1 huge column family (around 68G in data_file_directories),
> 18 mid-sized column families (around 1G in data_file_directories),
> and around 700 mini column families (around 10M in data_file_directories).
>
> I am wondering whether the huge commitlog size is the expected behavior or not?
> and how can we reduce the size of the commitlog?
> 
> Sincerely,
> Hsiao