Posted to user@cassandra.apache.org by "B. Todd Burruss" <bb...@real.com> on 2011/01/27 07:22:03 UTC
repair cause large number of SSTABLEs
I ran out of file handles on the "repairing node" after doing nodetool
repair - strange, as I have never had this issue until using 0.7.0 (though I
should say that I have not truly tested 0.7.0 until now). I upped the
number of file handles, removed the data, restarted the nodes, then restarted
my test and waited a little while. I have two keyspaces on the cluster, so I
checked the number of SSTables in one of them before "nodetool repair"
and saw 36 "Data.db" files, spread over 11 column families. Very
reasonable.
After running nodetool repair I have over 900 "Data.db" files,
immediately! Now, after waiting several hours, I have over 1500 Data.db
files; 95 of these are "compacted" files.
lsof reports 803 files in use by Cassandra for the "Queues" keyspace ...
[cassandra@kv-app02 ~]$ /usr/sbin/lsof -p 32645|grep Data.db|grep -c Queues
803
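Running out of file handles during repair usually means the process hit its RLIMIT_NOFILE. As a quick hedged sketch (not part of the original thread), the limits the current process sees can be checked from Python on the same box; to inspect the Cassandra process itself, reading /proc/&lt;pid&gt;/limits on Linux is an alternative:

```python
# Sketch: report the (soft, hard) file-descriptor limits for this process.
import resource

def fd_limits() -> tuple[int, int]:
    """Return (soft, hard) RLIMIT_NOFILE for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = fd_limits()
print(f"soft={soft} hard={hard}")
```

If the soft limit is low (1024 is a common default), raising it for the Cassandra user is the usual first step before digging into why so many SSTables are open.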
This doesn't sound right to me. Checking the server log, I see a lot
of these messages:
ERROR [RequestResponseStage:14] 2011-01-26 17:00:29,493 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.ArrayList.fastRemove(ArrayList.java:441)
        at java.util.ArrayList.remove(ArrayList.java:424)
        at com.google.common.collect.AbstractMultimap.remove(AbstractMultimap.java:219)
        at com.google.common.collect.ArrayListMultimap.remove(ArrayListMultimap.java:60)
        at org.apache.cassandra.net.MessagingService.responseReceivedFrom(MessagingService.java:436)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:40)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
And a lot of these:
ERROR [ReadStage:809] 2011-01-26 21:48:01,047 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.ArrayIndexOutOfBoundsException
ERROR [ReadStage:809] 2011-01-26 21:48:01,047 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[ReadStage:809,5,main]
java.lang.ArrayIndexOutOfBoundsException
And some more like this:
ERROR [ReadStage:15] 2011-01-26 20:59:14,695 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.ArrayIndexOutOfBoundsException: 6
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:98)
        at org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:95)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:334)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
        at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Re: repair cause large number of SSTABLEs
Posted by aaron morton <aa...@thelastpickle.com>.
The ArrayIndexOutOfBoundsException in the ReadStage looks like it can happen if a key is not of the expected type. Could the comparator for the CF have changed?
The error in the RequestResponseStage may be the race condition identified here https://issues.apache.org/jira/browse/CASSANDRA-1959
Aaron
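[Editorial illustration of Aaron's point, not part of the original thread: TimeUUIDType orders column names by the timestamp of a version-1 UUID, reading bytes at fixed RFC 4122 offsets. A sketch in Python, not Cassandra's actual code, of why a value that is not a full 16-byte UUID, e.g. one written under a different comparator, fails with an index error like the "ArrayIndexOutOfBoundsException: 6" in the log:]

```python
# Illustrative sketch: compare two version-1 UUIDs by their 60-bit timestamps.
# RFC 4122 layout: time_hi in bytes 6-7 (high nibble of byte 6 is the version),
# time_mid in bytes 4-5, time_low in bytes 0-3. A buffer shorter than 16 bytes
# fails at offset 6, analogous to the exception seen above.
def compare_timestamp_bytes(a: bytes, b: bytes) -> int:
    for i in (6, 7, 4, 5, 0, 1, 2, 3):          # most-significant bytes first
        mask = 0x0F if i == 6 else 0xFF          # strip the version bits
        d = (a[i] & mask) - (b[i] & mask)
        if d != 0:
            return d
    return 0
```

So if the CF's comparator changed after data was written, existing column names of the wrong length would reach this comparison during reads and index past the end of the array.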
Re: repair cause large number of SSTABLEs
Posted by "B. Todd Burruss" <bb...@real.com>.
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Data.db | grep -c -v "\-tmp\-"
824
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*-tmp-*Data.db | wc -l
829
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Comp* | wc -l
247
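[Editorial aside, not part of the original thread: the live vs. -tmp- counts above can also be gathered with a short script. The path is illustrative; substitute your own data_file_directories location.]

```python
# Sketch: count live vs. leftover "-tmp-" SSTable data files for a keyspace.
from pathlib import Path

def sstable_counts(keyspace_dir: str) -> tuple[int, int]:
    """Return (live, tmp) counts of *Data.db files under keyspace_dir."""
    names = [p.name for p in Path(keyspace_dir).glob("*Data.db")]
    tmp = sum("-tmp-" in n for n in names)
    return len(names) - tmp, tmp

live, tmp = sstable_counts("/data/cassandra-data/data/Queues")
print(f"live={live} tmp={tmp}")
```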
Re: repair cause large number of SSTABLEs
Posted by Stu Hood <st...@gmail.com>.
When the destination node fails to open the streamed SSTable, we assume it
was corrupted during transfer, and retry the stream. Independent of the
exception posted above, it is a problem that the failed transfers were not
cleaned up.
How many of the data files are marked as -tmp-?
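[Editorial sketch of the retry behaviour Stu describes, not part of the original thread and not Cassandra's actual API: on a failed open, the receiver should delete the -tmp- file before requesting the stream again, rather than leaving it to accumulate and hold file handles. Function and parameter names here are hypothetical.]

```python
# Sketch: retry a streamed SSTable transfer, cleaning up the failed -tmp- file.
import os

def receive_sstable(tmp_path, open_sstable, request_retry, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return open_sstable(tmp_path)       # fails if transfer corrupted it
        except IOError:
            # The missing step discussed in the thread: remove the failed
            # temporary file instead of leaving it on disk.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            request_retry()                      # ask the sender to re-stream
    raise IOError(f"giving up on {tmp_path} after {max_attempts} attempts")
```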
Re: repair cause large number of SSTABLEs
Posted by "B. Todd Burruss" <bb...@real.com>.
OK, thx. What about the repair creating hundreds of new SSTables, and
lsof showing Cassandra currently using over 800 Data.db files? Is this
normal?
Re: repair cause large number of SSTABLEs
Posted by Brandon Williams <dr...@gmail.com>.
It affects anything that involves streaming.
-Brandon
RE: repair cause large number of SSTABLEs
Posted by Todd Burruss <bb...@real.com>.
Thx, but I didn't do anything like removing/adding nodes - just did a "nodetool repair" after running for an hour or so on a clean install.
Re: repair cause large number of SSTABLEs
Posted by Matthew Conway <ma...@backupify.com>.
Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1992 ?