Posted to user@cassandra.apache.org by Ashley Martens <am...@ngmoco.com> on 2011/10/05 19:42:03 UTC

0.7.9 RejectedExecutionException

I'm getting the following exception on a 0.7.9 node before the node crashes.
I don't have this problem with the other nodes running 0.7.8. Does anyone
know what the problem is?

ERROR [Thread-47] 2011-10-05 05:07:03,840 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[Thread-47,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
    at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)

Re: 0.7.9 RejectedExecutionException

Posted by Jonathan Ellis <jb...@gmail.com>.
I've never seen a JVM crash that was polite enough to run shutdown
hooks first, but it's worth a try.

On Wed, Oct 12, 2011 at 3:27 PM, Erik Forkalsrud <ef...@cj.com> wrote:
>
> My suggestion would be to put a recent Sun JVM on the problematic node and
> see if that eliminates the crashes.
>
> The Sun JVM appears the be the mainstream choice when running Cassandra, so
> that's a more well tested configuration. You can search the list archives
> for OpenJDK related bugs to see that they do exist.  For example:
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201107.mbox/%3CCAB-=z42iHihR8SVhDrbVpfyJjYSsPCKes1ZoxKvcJe9AXKdaDA@mail.gmail.com%3E
>
>
> - Erik -
>
>
> On 10/12/2011 12:41 PM, Ashley Martens wrote:
>
> I guess it could be an option but I can't puppet the Oracle JDK install so I
> would rather not.
>
> On Wed, Oct 12, 2011 at 12:35 PM, Erik Forkalsrud <ef...@cj.com>
> wrote:
>>
>> On 10/12/2011 11:33 AM, Ashley Martens wrote:
>>
>> java version "1.6.0_20"
>> OpenJDK Runtime Environment (IcedTea6 1.9.9) (6b20-1.9.9-0ubuntu1~10.10.2)
>> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>>
>> This may have been mentioned before, but is it an option to use the
>> Sun/Oracle JDK?
>>
>>
>> - Erik -
>>
>
>
>
> --
> gpg --keyserver pgpkeys.mit.edu --recv-keys 0x23e861255b0d6abb
> Key fingerprint = 0E9E 0E22 3957 BB04 DD72  B093 23E8 6125 5B0D 6ABB
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: 0.7.9 RejectedExecutionException

Posted by Erik Forkalsrud <ef...@cj.com>.
My suggestion would be to put a recent Sun JVM on the problematic node 
and see if that eliminates the crashes.

The Sun JVM appears to be the mainstream choice when running Cassandra,
so that's a better-tested configuration. You can search the list
archives for OpenJDK-related bugs to see that they do exist. For
example:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201107.mbox/%3CCAB-=z42iHihR8SVhDrbVpfyJjYSsPCKes1ZoxKvcJe9AXKdaDA@mail.gmail.com%3E


- Erik -


On 10/12/2011 12:41 PM, Ashley Martens wrote:
> I guess it could be an option but I can't puppet the Oracle JDK 
> install so I would rather not.
>
> On Wed, Oct 12, 2011 at 12:35 PM, Erik Forkalsrud <eforkalsrud@cj.com 
> <ma...@cj.com>> wrote:
>
>     On 10/12/2011 11:33 AM, Ashley Martens wrote:
>>     java version "1.6.0_20"
>>     OpenJDK Runtime Environment (IcedTea6 1.9.9)
>>     (6b20-1.9.9-0ubuntu1~10.10.2)
>>     OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>
>     This may have been mentioned before, but is it an option to use
>     the Sun/Oracle JDK?
>
>
>     - Erik -
>
>
>
>
> -- 
> gpg --keyserver pgpkeys.mit.edu <http://pgpkeys.mit.edu/> --recv-keys 
> 0x23e861255b0d6abb
> Key fingerprint = 0E9E 0E22 3957 BB04 DD72  B093 23E8 6125 5B0D 6ABB
>


Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
I guess it could be an option but I can't puppet the Oracle JDK install so I
would rather not.

On Wed, Oct 12, 2011 at 12:35 PM, Erik Forkalsrud <ef...@cj.com> wrote:

>  On 10/12/2011 11:33 AM, Ashley Martens wrote:
>
> java version "1.6.0_20"
> OpenJDK Runtime Environment (IcedTea6 1.9.9) (6b20-1.9.9-0ubuntu1~10.10.2)
> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>
>
> This may have been mentioned before, but is it an option to use the
> Sun/Oracle JDK?
>
>
> - Erik -
>
>


-- 
gpg --keyserver pgpkeys.mit.edu --recv-keys 0x23e861255b0d6abb
Key fingerprint = 0E9E 0E22 3957 BB04 DD72  B093 23E8 6125 5B0D 6ABB

Re: 0.7.9 RejectedExecutionException

Posted by Erik Forkalsrud <ef...@cj.com>.
On 10/12/2011 11:33 AM, Ashley Martens wrote:
> java version "1.6.0_20"
> OpenJDK Runtime Environment (IcedTea6 1.9.9) (6b20-1.9.9-0ubuntu1~10.10.2)
> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

This may have been mentioned before, but is it an option to use the 
Sun/Oracle JDK?


- Erik -


Re: 0.7.9 RejectedExecutionException

Posted by Mohit Anchlia <mo...@gmail.com>.
"strace -F -f -c java" is what I use for other related issues. I haven't
used it with Cassandra though.
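
For a process that is already running, one possible shape for that command line is sketched below. This is an assumption, not something tested against Cassandra: the function name strace_attach_cmd is made up, the pid file path comes from the -pidfile value in the jsvc command line quoted in this thread, and the output path is arbitrary. The function only prints the command so it can be reviewed before anyone runs it as root on a production box.

```shell
#!/bin/sh
# Prints (does not run) an strace command line that attaches to a running
# process and records only signal-related activity.
#   $1 = pid file of the target process
#   $2 = file the trace should be written to
# -f follows threads and children, -tt adds timestamps,
# -e trace=signal limits output to signal-related system calls.
strace_attach_cmd() {
    pid=$(cat "$1") || return 1
    printf 'strace -f -tt -e trace=signal -o %s -p %s\n' "$2" "$pid"
}

# Only act when called with arguments, e.g.:
#   sh strace-cmd.sh /var/run/cassandra.pid /tmp/cassandra.strace
if [ "$#" -eq 2 ]; then
    strace_attach_cmd "$1" "$2"
fi
```

If the process is killed by a signal while traced, the last lines of the trace file normally name it (for example "+++ killed by SIGKILL +++"). Tracing slows the process down, so limiting the trace to signals keeps the overhead lower than a full trace.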

On Wed, Oct 12, 2011 at 3:22 PM, Ashley Martens <am...@ngmoco.com> wrote:
> This is a production node on real hardware. I like the strace idea, do you
> have a workable command line for that?
>
> On Wed, Oct 12, 2011 at 1:13 PM, Mohit Anchlia <mo...@gmail.com>
> wrote:
>>
>> Yes. If you have exhausted all the options I think it will be good to
>> see if this issue persists accross other nodes after you decommission
>> that node.
>>
>> If this is not production and issue is reproducible easily you can
>> also try using strace with fork option to see if it gets killed at the
>> same place.
>>
>> Are these vms?
>>
>

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
This is a production node on real hardware. I like the strace idea. Do you
have a workable command line for that?

On Wed, Oct 12, 2011 at 1:13 PM, Mohit Anchlia <mo...@gmail.com> wrote:

> Yes. If you have exhausted all the options I think it will be good to
> see if this issue persists accross other nodes after you decommission
> that node.
>
> If this is not production and issue is reproducible easily you can
> also try using strace with fork option to see if it gets killed at the
> same place.
>
> Are these vms?
>
>

Re: 0.7.9 RejectedExecutionException

Posted by Mohit Anchlia <mo...@gmail.com>.
Yes. If you have exhausted all the options, I think it would be good to
see whether this issue persists across other nodes after you decommission
that node.

If this is not production and the issue is easily reproducible, you can
also try using strace with the fork option to see if it gets killed at the
same place.

Are these VMs?

On Wed, Oct 12, 2011 at 12:41 PM, Ashley Martens <am...@ngmoco.com> wrote:
> We have 20 nodes in this cluster.
> Yes, however are you recommending that I decommission the node?
>
> I noted the compaction because it is common for the last line in the log
> file. For reference:
>  INFO [FlushWriter:12] 2011-10-12 18:10:09,823 Memtable.java (line 157)
> Writing Memtable-HintsColumnFamily@
> 1313157506(0 bytes, 1 operations)
>  INFO [CompactionExecutor:1] 2011-10-12 18:10:09,823 CompactionManager.java
> (line 395) Compacting
> [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-7958-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-7959-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-7960-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-7961-Data.db')]
>  INFO [FlushWriter:12] 2011-10-12 18:10:09,862 Memtable.java (line 172)
> Completed flushing
> /var/lib/cassandra/data/system/HintsColumnFamily-f-7962-Data.db (61 bytes)
> Load, cpu and memory are nominal. The box is not stressed. iostat reports
> low usage.
>
> On Wed, Oct 12, 2011 at 12:24 PM, Mohit Anchlia <mo...@gmail.com>
> wrote:
>>
>> You mentioned this happens only on one node? How many nodes do you
>> have? Is it possible to turn off this node completely and run
>> compactions on other nodes and see if this happens there too?
>>
>> Also, you mentioned this happens after compaction. Did you mean during
>> compaction or right after it? What load, cpu, memory etc do you see
>> during those times?
>

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
We have 20 nodes in this cluster.
Yes, however are you recommending that I decommission the node?

I noted the compaction because it is commonly the last line in the log
file. For reference:
 INFO [FlushWriter:12] 2011-10-12 18:10:09,823 Memtable.java (line 157) Writing Memtable-HintsColumnFamily@1313157506(0 bytes, 1 operations)
 INFO [CompactionExecutor:1] 2011-10-12 18:10:09,823 CompactionManager.java (line 395) Compacting [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-7958-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-7959-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-7960-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-7961-Data.db')]
 INFO [FlushWriter:12] 2011-10-12 18:10:09,862 Memtable.java (line 172) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-f-7962-Data.db (61 bytes)

Load, cpu and memory are nominal. The box is not stressed. iostat reports
low usage.

On Wed, Oct 12, 2011 at 12:24 PM, Mohit Anchlia <mo...@gmail.com> wrote:

> You mentioned this happens only on one node? How many nodes do you
> have? Is it possible to turn off this node completely and run
> compactions on other nodes and see if this happens there too?
>
> Also, you mentioned this happens after compaction. Did you mean during
> compaction or right after it? What load, cpu, memory etc do you see
> during those times?
>

Re: 0.7.9 RejectedExecutionException

Posted by Mohit Anchlia <mo...@gmail.com>.
You mentioned this happens only on one node? How many nodes do you
have? Is it possible to turn off this node completely and run
compactions on other nodes and see if this happens there too?

Also, you mentioned this happens after compaction. Did you mean during
compaction or right after it? What load, CPU, memory, etc. do you see
during those times?

On Wed, Oct 12, 2011 at 12:03 PM, Ashley Martens <am...@ngmoco.com> wrote:
> No.
>
> On Wed, Oct 12, 2011 at 11:46 AM, Brandon Williams <dr...@gmail.com> wrote:
>>
>> Anything from the OOM killer in the last few lines from dmesg?
>>
>

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
No.

On Wed, Oct 12, 2011 at 11:46 AM, Brandon Williams <dr...@gmail.com> wrote:

> Anything from the OOM killer in the last few lines from dmesg?
>
>

Re: 0.7.9 RejectedExecutionException

Posted by Brandon Williams <dr...@gmail.com>.
Anything from the OOM killer in the last few lines from dmesg?

On Wed, Oct 12, 2011 at 1:33 PM, Ashley Martens <am...@ngmoco.com> wrote:
> Ubuntu 10.10
>
> java version "1.6.0_20"
> OpenJDK Runtime Environment (IcedTea6 1.9.9) (6b20-1.9.9-0ubuntu1~10.10.2)
> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>
> Always the same node. No other nodes in this cluster, which all have the
> same hardware and OS, have this issue.
>
> I don't see any resource contention. Cassandra is the only application
> running on this machine.
>
> On Wed, Oct 12, 2011 at 10:55 AM, Sasha Dolgy <sd...@gmail.com> wrote:
>>
>> What OS?  JVM version?  is it always on the same node or all nodes?  i had
>> a similar problem in the past in that the OS killed Cassandra because it
>> felt threatened and needed more resources.
>
> --
> gpg --keyserver pgpkeys.mit.edu --recv-keys 0x23e861255b0d6abb
> Key fingerprint = 0E9E 0E22 3957 BB04 DD72  B093 23E8 6125 5B0D 6ABB
>

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
Ubuntu 10.10

java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.9) (6b20-1.9.9-0ubuntu1~10.10.2)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

Always the same node. No other nodes in this cluster, which all have the
same hardware and OS, have this issue.

I don't see any resource contention. Cassandra is the only application
running on this machine.
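
Since the JVM build is one of the few things that could differ between otherwise identical nodes, a small helper for comparing it across the cluster might look like the sketch below. The function name jvm_flavor is made up for illustration; it only classifies the text that "java -version" prints and assumes the usual "OpenJDK" and "HotSpot" markers in that output.

```shell
#!/bin/sh
# Reads `java -version` output on stdin and prints one word:
# "openjdk", "hotspot" (Sun/Oracle), or "unknown".
jvm_flavor() {
    out=$(cat)
    case "$out" in
        *OpenJDK*) echo openjdk ;;
        *HotSpot*) echo hotspot ;;
        *)         echo unknown ;;
    esac
}
```

Because java -version writes to stderr, it would be used as: java -version 2>&1 | jvm_flavor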

On Wed, Oct 12, 2011 at 10:55 AM, Sasha Dolgy <sd...@gmail.com> wrote:

>
> What OS?  JVM version?  is it always on the same node or all nodes?  i had
> a similar problem in the past in that the OS killed Cassandra because it
> felt threatened and needed more resources.
>
>
-- 
gpg --keyserver pgpkeys.mit.edu --recv-keys 0x23e861255b0d6abb
Key fingerprint = 0E9E 0E22 3957 BB04 DD72  B093 23E8 6125 5B0D 6ABB

Re: 0.7.9 RejectedExecutionException

Posted by Sasha Dolgy <sd...@gmail.com>.
What OS? JVM version? Is it always on the same node or on all nodes? I had a
similar problem in the past in that the OS killed Cassandra because it felt
threatened and needed more resources.

On Wed, Oct 12, 2011 at 7:47 PM, Ashley Martens <am...@ngmoco.com> wrote:

> The thing is we only see that error once every so often. Additional, since
> Cassandra is not logging a shutdown message then it must be a violent
> termination, which leaves no traces in the system logs. It's possible that
> there is something wrong with the hardware, but the OS side I don't see what
> would be killing it. The fact that it happens after compaction, usually of
> the hints leads me to think that there is ghost hiding there that we had the
> great luck to find.
>
>

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
The thing is we only see that error once every so often. Additionally, since
Cassandra is not logging a shutdown message, it must be a violent
termination, which leaves no traces in the system logs. It's possible that
there is something wrong with the hardware, but on the OS side I don't see what
would be killing it. The fact that it happens after compaction, usually of
the hints, leads me to think that there is a ghost hiding there that we had the
great luck to find.

On Wed, Oct 12, 2011 at 8:57 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> I'm comfortable with saying that there's some problem with your
> environment that hasn't been identified yet, because we see many many
> people running 0.7.9 and it just does not die randomly.
>
> Again, the exception you see in the log is consistent with being
> killed externally (and not consistent with dying from a Cassandra
> bug).
>
>

Re: 0.7.9 RejectedExecutionException

Posted by Jonathan Ellis <jb...@gmail.com>.
I'm comfortable with saying that there's some problem with your
environment that hasn't been identified yet, because we see many many
people running 0.7.9 and it just does not die randomly.

Again, the exception you see in the log is consistent with being
killed externally (and not consistent with dying from a Cassandra
bug).

On Wed, Oct 12, 2011 at 10:42 AM, Ashley Martens <am...@ngmoco.com> wrote:
> Tue Oct 11 21:34:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Tue Oct 11 22:06:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Tue Oct 11 22:36:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 00:40:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 02:12:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 03:14:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 03:46:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 04:18:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 06:20:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 06:50:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 09:56:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 10:26:11 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 10:58:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 11:30:11 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 12:00:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 12:32:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 13:02:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 14:34:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
> Wed Oct 12 15:36:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
Tue Oct 11 21:34:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Tue Oct 11 22:06:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Tue Oct 11 22:36:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 00:40:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 02:12:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 03:14:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 03:46:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 04:18:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 06:20:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 06:50:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 09:56:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 10:26:11 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 10:58:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 11:30:11 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 12:00:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 12:32:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 13:02:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 14:34:10 UTC 2011 - Fuck this Cassandra bullshit... it died again
Wed Oct 12 15:36:10 UTC 2011 - Fuck this Cassandra bullshit... it died again

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
deploy@mobage-prod-cassandra150:~$ grep -i 'killed process'
/var/log/messages
deploy@mobage-prod-cassandra150:~$


On Tue, Oct 11, 2011 at 5:57 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> grep -i 'killed process' /var/log/messages
>
>

Re: 0.7.9 RejectedExecutionException

Posted by Jonathan Ellis <jb...@gmail.com>.
grep -i 'killed process' /var/log/messages
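
Kernel messages do not land in the same file on every distribution, so a slightly wider version of that check may be worth running. The sketch below is an assumption rather than a verified recipe: the function name find_oom_kills is made up, and which of the listed files exist depends on the distribution and its syslog configuration.

```shell
#!/bin/sh
# Hypothetical helper: scan the given log files for OOM-killer entries.
# Files that are missing or unreadable are skipped silently.
find_oom_kills() {
    for log in "$@"; do
        [ -r "$log" ] || continue
        # The extra /dev/null argument makes grep prefix each match
        # with the name of the file it came from.
        grep -i -e 'killed process' -e 'out of memory' "$log" /dev/null || true
    done
}

# Only act when file names are passed on the command line, e.g.:
#   sh oom-check.sh /var/log/messages /var/log/kern.log /var/log/syslog
if [ "$#" -gt 0 ]; then
    find_oom_kills "$@"
fi
```

No output means none of the readable files mention the OOM killer. The kernel ring buffer can be checked the same way with: dmesg | grep -i 'killed process'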

On Tue, Oct 11, 2011 at 5:25 PM, Ashley Martens <am...@ngmoco.com> wrote:
> So we created a script to check if Cassandra is alive and run it every two
> minutes. Here are some results for today:
>
> Tue Oct 11 18:28:09 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 19:00:10 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 19:30:10 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 20:02:10 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 21:34:10 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 22:06:10 UTC 2011 - F this Cassandra bullshit... it died again
>
>
> And here are some of the log tails:
>
>  INFO [CompactionExecutor:1] 2011-10-11 18:58:14,909 CompactionManager.java
> (line 395) Compacting []
>  INFO [FlushWriter:10] 2011-10-11 18:58:14,951 Memtable.java (line 172)
> Completed flushing /var/lib/cassandra/data/
> system/HintsColumnFamily-f-568-Data.db (60 bytes)
>  INFO [FlushWriter:10] 2011-10-11 18:58:14,951 Memtable.java (line 157)
> Writing Memtable-HintsColumnFamily@1493400027(0 bytes, 1 operations)
>  INFO [FlushWriter:10] 2011-10-11 18:58:14,991 Memtable.java (line 172)
> Completed flushing
> /var/lib/cassandra/data/system/HintsColumnFamily-f-569-Data.db (61 bytes)
>  INFO [FlushWriter:10] 2011-10-11 18:58:14,991 Memtable.java (line 157)
> Writing Memtable-HintsColumnFamily@1932871300(0 bytes, 1 operations)
>  INFO [FlushWriter:10] 2011-10-11 18:58:15,031 Memtable.java (line 172)
> Completed flushing
> /var/lib/cassandra/data/system/HintsColumnFamily-f-570-Data.db (61 bytes)
>
> INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/
> system/HintsColumnFamily-f-1066
>  INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1098
>  INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1040
>  INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1071
>  INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,907 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1093
>
> INFO [FlushWriter:8] 2011-10-11 20:00:10,701 Memtable.java (line 157)
> Writing Memtable-HintsColumnFamily@
> 1488536311(0 bytes, 1 operations)
>  INFO [CompactionExecutor:1] 2011-10-11 20:00:10,701 CompactionManager.java
> (line 395) Compacting
> [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1687-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1688-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1689-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1690-Data.db')]
>  INFO [FlushWriter:8] 2011-10-11 20:00:10,741 Memtable.java (line 172)
> Completed flushing
> /var/lib/cassandra/data/system/HintsColumnFamily-f-1691-Data.db (61 bytes)
>  INFO [NonPeriodicTasks:1] 2011-10-11 21:33:26,980 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/
> system/HintsColumnFamily-f-3349
> ERROR [Thread-18] 2011-10-11 21:33:31,452 AbstractCassandraDaemon.java (line
> 132) Fatal exception in thread Thread[Thread-18,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
> down
>        at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
>        at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
>        at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
>        at
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
>        at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)
> ERROR [Thread-19] 2011-10-11 22:04:39,195 AbstractCassandraDaemon.java (line
> 132) Fatal exception in thread Thread[Thread-19,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
> down
>        at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
>        at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
>        at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
>        at
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
>        at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)
>
> I'm going to increase the logging level to DEBUG. Other than that I've got
> to say that Cassandra 0.7.9 is F'ed in some way or another.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
So we created a script to check if Cassandra is alive and run it every two
minutes. Here are some results for today:

Tue Oct 11 18:28:09 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 19:00:10 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 19:30:10 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 20:02:10 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 21:34:10 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 22:06:10 UTC 2011 - F this Cassandra bullshit... it died again
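
The script itself is not shown here. A minimal sketch of a check like this, under stated assumptions, could look like the following: the pid file path is taken from the -pidfile value in the jsvc command line, while the log path, the function names is_alive and check_once, and the "run" argument are all made up for illustration.

```shell
#!/bin/sh
# Sketch of a liveness check; not the script actually used on the node.
PIDFILE="${PIDFILE:-/var/run/cassandra.pid}"
LOGFILE="${LOGFILE:-/var/log/cassandra/watchdog.log}"   # hypothetical path

# Succeeds if the pid file names a process that currently exists.
# kill -0 sends no signal; it only checks that the pid can be signalled,
# so run this as root or as the user that owns the process.
is_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# Appends one timestamped line to the log file when the process is gone.
#   $1 = pid file, $2 = log file
check_once() {
    if ! is_alive "$1"; then
        echo "$(date -u) - Cassandra is not running" >> "$2"
    fi
}

# Only perform the check when invoked as: sh cassandra-alive.sh run
if [ "${1:-}" = "run" ]; then
    check_once "$PIDFILE" "$LOGFILE"
fi
```

Run from cron every two minutes, this would produce a log with one line per failed check, similar to the results above.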


And here are some of the log tails:

 INFO [CompactionExecutor:1] 2011-10-11 18:58:14,909 CompactionManager.java (line 395) Compacting []
 INFO [FlushWriter:10] 2011-10-11 18:58:14,951 Memtable.java (line 172) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-f-568-Data.db (60 bytes)
 INFO [FlushWriter:10] 2011-10-11 18:58:14,951 Memtable.java (line 157) Writing Memtable-HintsColumnFamily@1493400027(0 bytes, 1 operations)
 INFO [FlushWriter:10] 2011-10-11 18:58:14,991 Memtable.java (line 172) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-f-569-Data.db (61 bytes)
 INFO [FlushWriter:10] 2011-10-11 18:58:14,991 Memtable.java (line 157) Writing Memtable-HintsColumnFamily@1932871300(0 bytes, 1 operations)
 INFO [FlushWriter:10] 2011-10-11 18:58:15,031 Memtable.java (line 172) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-f-570-Data.db (61 bytes)

 INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147) Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1066
 INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147) Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1098
 INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147) Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1040
 INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147) Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1071
 INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,907 SSTable.java (line 147) Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1093

 INFO [FlushWriter:8] 2011-10-11 20:00:10,701 Memtable.java (line 157) Writing Memtable-HintsColumnFamily@1488536311(0 bytes, 1 operations)
 INFO [CompactionExecutor:1] 2011-10-11 20:00:10,701 CompactionManager.java (line 395) Compacting [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1687-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1688-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1689-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1690-Data.db')]
 INFO [FlushWriter:8] 2011-10-11 20:00:10,741 Memtable.java (line 172) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-f-1691-Data.db (61 bytes)

 INFO [NonPeriodicTasks:1] 2011-10-11 21:33:26,980 SSTable.java (line 147) Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-3349
ERROR [Thread-18] 2011-10-11 21:33:31,452 AbstractCassandraDaemon.java (line 132) Fatal exception in thread Thread[Thread-18,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
       at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
       at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
       at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
       at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)

ERROR [Thread-19] 2011-10-11 22:04:39,195 AbstractCassandraDaemon.java (line 132) Fatal exception in thread Thread[Thread-19,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
       at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
       at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
       at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
       at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)


I'm going to increase the logging level to DEBUG. Other than that I've got
to say that Cassandra 0.7.9 is F'ed in some way or another.

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
It is actually not at the exact same time of day. It varies but happens
within certain blocks of time, like between 00hr and 02hr. The node could be up
for hours or it could crash again in 15 minutes. The memory is fine, just
using a larger footprint than 0.6 in all ways.

On Mon, Oct 10, 2011 at 1:18 PM, aaron morton <aa...@thelastpickle.com> wrote:

> The service keeps dieing at the same time every day and there is nothing in
> the app logs, it's going to be something external.
>
> Sorry but I'm not sure what the problem with the memory usage is. Is the
> server running out of memory, or is it experiencing a lot of GC ?
>
>

Re: 0.7.9 RejectedExecutionException

Posted by aaron morton <aa...@thelastpickle.com>.
The service keeps dying at the same time every day and there is nothing in the app logs, so it's going to be something external.

Sorry, but I'm not sure what the problem with the memory usage is. Is the server running out of memory, or is it experiencing a lot of GC?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11/10/2011, at 5:00 AM, Ashley Martens wrote:

> I have check both the output file and the system log, neither have errors in them. I don't believe anything external is killing the process, I could be wrong but this node's setup is the same as all my other nodes (including hardware) so it doesn't make much sense.
> 
> 
> jsvc.exec -user cassandra -home /usr/lib/jvm/java-6-openjdk/jre/bin/../ -pidfile /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log -cp /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-0.7.8.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/avro-1.4.0-fixes.jar:/usr/share/cassandra/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-codec-1.2.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/concurrentlinkedhashmap-lru-1.1.jar:/usr/share/casandra/guava-r05.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/jetty-6.1.21.jar:/usr/share/cassandra/jetty-util-6.1.21.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json-simple-1.1.jar:/usr/share/cassandra/jug-2.0.0.jar:/usr/share/cassandra/libthrift-0.5.jar:/usr/share/cassandra/log4j-1.2.16.jar:/usr/share/cassandra/servlet-api-2.5-20081211.jar:/usr/share/cassandra/slf4j-api-1.6.1.jar:/usr/share/cassandra/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/snakeyaml-1.6.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar -Dlog4j.configuration=log4j-server.properties -XX:HeapDumpPath=/var/lib/cassandra/java_1318260751.hprof -XX:ErrorFile=/var/lib/casandra/hs_err_1318260751.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms24196M -Xmx24196M -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false org.apache.cassandra.thrift.CassandraDaemon
> 
> I have munin monitoring of JMX so when I talk about heap max then I'm referring to:
> 
> jmxObjectName java.lang:type=Memory
> jmxAttributeName HeapMemoryUsage
> jmxAttributeKey max
> 
> The other crazy thing is the heap used is nowhere close to heap max.
> 
> On Mon, Oct 10, 2011 at 12:40 AM, aaron morton <aa...@thelastpickle.com> wrote:
> Have you checked /var/log/cassandra/output.txt (the packaged install pipes std out/err to there) or the system logs ? If there are no errors in the logs it may well be something external killing it.
> 
> With regard to memory usage, it's hard for people to help unless you provide some numbers. What do you mean by MAX heap ? Is this the max used heap size reported by JMX or the -Xmx setting passed to the server ?
> 


Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
I have checked both the output file and the system log; neither has errors
in it. I don't believe anything external is killing the process. I could be
wrong, but this node's setup is the same as all my other nodes (including
hardware), so it doesn't make much sense.


jsvc.exec -user cassandra -home /usr/lib/jvm/java-6-openjdk/jre/bin/../
-pidfile /var/run/cassandra.pid -errfile &1 -outfile
/var/log/cassandra/output.log -cp
/usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-0.7.8.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/avro-1.4.0-fixes.jar:/usr/share/cassandra/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-codec-1.2.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/concurrentlinkedhashmap-lru-1.1.jar:/usr/share/casandra/guava-r05.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/jetty-6.1.21.jar:/usr/share/cassandra/jetty-util-6.1.21.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json-simple-1.1.jar:/usr/share/cassandra/jug-2.0.0.jar:/usr/share/cassandra/libthrift-0.5.jar:/usr/share/cassandra/log4j-1.2.16.jar:/usr/share/cassandra/servlet-api-2.5-20081211.jar:/usr/share/cassandra/slf4j-api-1.6.1.jar:/usr/share/cassandra/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/snakeyaml-1.6.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Dlog4j.configuration=log4j-server.properties
-XX:HeapDumpPath=/var/lib/cassandra/java_1318260751.hprof
-XX:ErrorFile=/var/lib/casandra/hs_err_1318260751.log -ea
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms24196M -Xmx24196M
-Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=8080
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.thrift.CassandraDaemon

I have munin monitoring of JMX so when I talk about heap max then I'm
referring to:

jmxObjectName java.lang:type=Memory
jmxAttributeName HeapMemoryUsage
jmxAttributeKey max

The other crazy thing is the heap used is nowhere close to heap max.
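
The JMX attributes above map to the standard java.lang MemoryMXBean. As a
rough sketch (the class name HeapMaxDemo and its helper methods are
hypothetical, not part of Cassandra or munin), the same "used" and "max"
values can be read from inside a JVM like this. The reported max is usually at
or slightly below the -Xmx value, so it reflects the configured ceiling rather
than memory actually in use:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration only; not part of Cassandra or munin.
class HeapMaxDemo {

    // Same data as java.lang:type=Memory / HeapMemoryUsage over JMX.
    private static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // Bytes of heap currently occupied by objects (the "used" key).
    static long heapUsedBytes() {
        return heap().getUsed();
    }

    // Upper bound the heap may grow to (the "max" key), or -1 if undefined.
    static long heapMaxBytes() {
        return heap().getMax();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("heap used (MB): " + heapUsedBytes() / mb);
        long max = heapMaxBytes();
        System.out.println("heap max  (MB): " + (max < 0 ? "undefined" : String.valueOf(max / mb)));
    }
}
```

Running this with the same -Xms/-Xmx flags as the node should show whether the
"max" figure simply tracks the configured heap size.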

On Mon, Oct 10, 2011 at 12:40 AM, aaron morton <aa...@thelastpickle.com> wrote:

> Have you checked /var/log/cassandra/output.txt (the packaged install pipes
> std out/err to there) or the system logs ? If there are no errors in the
> logs it may well be something external killing it.
>
> With regard to memory usage, it's hard for people to help unless you
> provide some numbers. What do you mean by MAX heap ? Is this the max used
> heap size reported by JMX or the -Xmx setting passed to the server ?
>
>

Re: 0.7.9 RejectedExecutionException

Posted by aaron morton <aa...@thelastpickle.com>.
Have you checked /var/log/cassandra/output.txt (the packaged install pipes stdout/stderr to there) or the system logs? If there are no errors in the logs, it may well be something external killing it.

With regard to memory usage, it's hard for people to help unless you provide some numbers. What do you mean by MAX heap? Is this the max used heap size reported by JMX or the -Xmx setting passed to the server?

Cheers
  
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8/10/2011, at 7:02 AM, Ashley Martens wrote:

> Okay, this is still a problem. This node keeps dying at 1am every day, most times without an error in the log. I'd appreciate any help in tracking down why.
> 
> Additionally, I don't understand why 0.7.x is using *way* more RAM than 0.6.x and 0.8.x, from a top or ps perspective. I'm now watching the JVM memory and it seems to be more in line with 0.6.x but the MAX heap is crazy high (28G on my servers).


Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
Okay, this is still a problem. This node keeps dying at 1am every day, most
times without an error in the log. I'd appreciate any help in tracking down
why.

Additionally, I don't understand why 0.7.x is using *way* more RAM than 0.6.x
and 0.8.x, from a top or ps perspective. I'm now watching the JVM memory and
it seems to be more in line with 0.6.x but the MAX heap is crazy high (28G
on my servers).

Re: 0.7.9 RejectedExecutionException

Posted by aaron morton <aa...@thelastpickle.com>.
check this http://wiki.apache.org/cassandra/FAQ#mmap
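
The gist of that FAQ entry is that memory-mapped files are counted by top and
ps but do not live on the Java heap. A minimal sketch of the effect, using
only the JDK (the class MmapDemo and its method are hypothetical names for
illustration, and the 1 MiB temp file stands in for a data file):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; it does not use any Cassandra code.
class MmapDemo {

    // Maps a temporary file of the given size and reports whether the
    // resulting buffer is a direct (off-heap) buffer of that size.
    static boolean mapsOutsideHeap(int sizeBytes) {
        try {
            Path tmp = Files.createTempFile("mmap-demo", ".bin");
            tmp.toFile().deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw")) {
                raf.setLength(sizeBytes);
                MappedByteBuffer buf =
                        raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, sizeBytes);
                // A mapped buffer is direct and has no backing Java array, so its
                // pages are charged to the process, not to the Java heap.
                return buf.isDirect() && !buf.hasArray() && buf.capacity() == sizeBytes;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mapped outside heap: " + mapsOutsideHeap(1024 * 1024));
    }
}
```

With many large files mapped this way, the process size seen by top can be far
larger than the heap usage reported over JMX.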

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/10/2011, at 9:25 AM, Ashley Martens wrote:

> I could be wrong. I just looked at the amount of memory being used and it's huge. WTF?


Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
I could be wrong. I just looked at the amount of memory being used and it's
huge. WTF?

Re: 0.7.9 RejectedExecutionException

Posted by Ashley Martens <am...@ngmoco.com>.
No OOM errors appear and the memory used is far below physical and Java max.
I changed the JAR to 0.7.8 to see if that works. If so I'll find a way to
roll out that version instead of 0.7.9.

Re: 0.7.9 RejectedExecutionException

Posted by Jonathan Ellis <jb...@gmail.com>.
"I can't schedule this task because I'm shutting down" is a symptom of
your node crashing, not a cause.  Is it being OOMkilled, perhaps?
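
To illustrate why the exception is a symptom, here is a minimal sketch with a
plain JDK thread pool (not Cassandra's DebuggableThreadPoolExecutor; the class
name RejectedAfterShutdownDemo is made up for this example). Once a pool has
been shut down, any later execute() call is rejected, which is what an
incoming message hits when the node is already going down:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class; it only uses java.util.concurrent.
class RejectedAfterShutdownDemo {

    // Returns true if a task submitted after shutdown() is rejected.
    static boolean rejectsAfterShutdown() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.shutdown(); // stands in for the process already being on its way down
        try {
            pool.execute(() -> { }); // late task, like a message arriving during shutdown
            return false;
        } catch (RejectedExecutionException e) {
            return true; // default rejection policy of the JDK pool
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + rejectsAfterShutdown());
    }
}
```

The rejection says nothing about what triggered the shutdown, so the cause has
to be found elsewhere (system logs, the kernel OOM killer, hs_err files).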

On Wed, Oct 5, 2011 at 12:42 PM, Ashley Martens <am...@ngmoco.com> wrote:
> I'm getting the following exception on a 0.7.9 node before the node crashes.
> I don't have this problem with the other nodes running 0.7.8. Does anyone
> know what the problem is?
>
> ERROR [Thread-47] 2011-10-05 05:07:03,840 AbstractCassandraDaemon.java (line
> 133) Fatal exception in thread Thread[Thread-47,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
> down
>     at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
>     at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
>     at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
>     at
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
>     at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com