Posted to user@cassandra.apache.org by Eric Czech <er...@nextbigsound.com> on 2010/10/14 03:43:53 UTC

Silent Crash

Recently, Cassandra has been crashing with no apparent error on one specific
node in my cluster.  Has anyone else ever had this happen, and is there a way
to figure out what is going on other than looking at what is in the
stdout and system.log files?

Thanks!

Re: Silent Crash

Posted by Eric Czech <er...@nextbigsound.com>.
Alright, very helpful.  Thanks again!

That's more encouraging than corruption so I'd be happy to try it.

On Wed, Oct 13, 2010 at 11:41 PM, B. Todd Burruss <bb...@real.com> wrote:

>  that type of error report indicates a bug in the JVM.  something that
> should *never* occur if the JVM is operating properly.  corrupt cassandra
> data, auto-bootstrapping should never cause that kind of crash.
>
> the SIGSEGV in the report indicates a segmentation fault (
> http://en.wikipedia.org/wiki/SIGSEGV), which again, should *never* happen
> if the JVM is operating properly.  the real problem is inside the JVM, not
> with Cassandra
>
> sorry to say, your best bet is to upgrade

Re: Silent Crash

Posted by Eric Czech <er...@nextbigsound.com>.
Thanks again for the help.  I upgraded my JVM to update 22, but I'm still
getting the same error, at least as frequently as before.  I'm thinking that
the best course of action at this point is to replace the hardware.  I would
try the test builds, but I can't imagine they would tell me anything
different.  Does anyone know of any other possible solution to this sort of
problem?



On Thu, Oct 14, 2010 at 8:58 AM, Nicholas Knight <nk...@runawaynet.com> wrote:

> On Oct 14, 2010, at 10:37 PM, Eric Evans wrote:
> >> sorry to say, your best bet is to upgrade
> >
> > I would actually start with some large test builds, kernels work well
> > for this.  Use a high concurrency (> 4).
>
> Whether or not those fail, assuming x86, download memtest86+ and boot it.
> Symptoms like this tend to be bad RAM, and it's also the easiest thing to
> test (and, if broken, fix).
>
> -NK

Re: Silent Crash

Posted by Nicholas Knight <nk...@runawaynet.com>.
On Oct 14, 2010, at 10:37 PM, Eric Evans wrote:
>> sorry to say, your best bet is to upgrade 
> 
> I would actually start with some large test builds, kernels work well
> for this.  Use a high concurrency (> 4).

Whether or not those fail, assuming x86, download memtest86+ and boot it. Symptoms like this tend to be bad RAM, and it's also the easiest thing to test (and, if broken, fix). 

-NK

Re: Silent Crash

Posted by Eric Evans <ee...@rackspace.com>.
On Wed, 2010-10-13 at 22:41 -0700, B. Todd Burruss wrote:
>   that type of error report indicates a bug in the JVM.  something
> that 
> should *never* occur if the JVM is operating properly.  corrupt 
> cassandra data, auto-bootstrapping should never cause that kind of
> crash.
> 
> the SIGSEGV in the report indicates a segmentation fault 
> (http://en.wikipedia.org/wiki/SIGSEGV), which again, should *never* 
> happen if the JVM is operating properly.  the real problem is inside
> the JVM, not with Cassandra

A SIGSEGV can also be an indication of hardware issues, which is the first
place I would start looking if it were only occurring on one machine.

> sorry to say, your best bet is to upgrade 

I would actually start with some large test builds; kernel builds work well
for this.  Use a high concurrency (> 4).

-- 
Eric Evans
eevans@rackspace.com
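
The idea behind the test builds suggested above is that a heavy, repeatable
workload should always produce the same result on healthy hardware.  The
following is only a rough Python sketch of that idea, not a replacement for
kernel builds or memtest86+: it repeats one deterministic hash computation on
several threads and counts results that differ from a reference.  The function
names, buffer size, and round count are all made up for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest_of_pattern(size_bytes, seed):
    """Build a deterministic buffer and return its SHA-256 hex digest."""
    # A 256-byte block derived from the seed, repeated to fill the buffer.
    block = bytes((seed + i) % 256 for i in range(256))
    buf = block * (size_bytes // 256)
    return hashlib.sha256(buf).hexdigest()


def run_consistency_check(rounds=8, workers=5, size_bytes=1 << 20):
    """Return how many rounds produced a digest different from the reference.

    Identical inputs must give identical digests, so any nonzero count
    suggests the machine (or this script) is misbehaving.
    """
    reference = digest_of_pattern(size_bytes, 0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda _: digest_of_pattern(size_bytes, 0), range(rounds))
        )
    return sum(1 for digest in results if digest != reference)


if __name__ == "__main__":
    mismatches = run_consistency_check()
    print("mismatching rounds:", mismatches)
```

Threads are used here only to keep the sketch short; a user-space script like
this touches a small part of RAM and loads the machine far less than a
parallel kernel build, so a clean run proves very little.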


Re: Silent Crash

Posted by "B. Todd Burruss" <bb...@real.com>.
that type of error report indicates a bug in the JVM, something that 
should *never* occur if the JVM is operating properly.  neither corrupt 
cassandra data nor auto-bootstrapping should cause that kind of crash.

the SIGSEGV in the report indicates a segmentation fault 
(http://en.wikipedia.org/wiki/SIGSEGV), which again should *never* 
happen if the JVM is operating properly.  the real problem is inside the 
JVM, not with Cassandra.

sorry to say, your best bet is to upgrade.


On 10/13/2010 10:09 PM, Eric Czech wrote:
> Thank you Todd.  It seems strange though that this is only happening 
> on one node and has never occurred on any others that are using the 
> same JVM version.  This node was just auto-bootstrapped so do you 
> think this might be the result of some sort of data corruption?  I 
> would like to just decommission it but I'm not sure that that would 
> fix the corrupted data (if it is actually corrupted).  Do you know if 
> compact or repair would detect bad data and disregard it?  I'd like to 
> try something like that if possible before just upgrading the JVM and 
> potentially hiding the real problem.

Re: Silent Crash

Posted by Eric Czech <er...@nextbigsound.com>.
Thank you Todd.  It seems strange though that this is only happening on one
node and has never occurred on any others that are using the same JVM
version.  This node was just auto-bootstrapped so do you think this might be
the result of some sort of data corruption?  I would like to just
decommission it but I'm not sure that that would fix the corrupted data (if
it is actually corrupted).  Do you know if compact or repair would detect
bad data and disregard it?  I'd like to try something like that if possible
before just upgrading the JVM and potentially hiding the real problem.

On Wed, Oct 13, 2010 at 9:35 PM, B. Todd Burruss <bb...@real.com> wrote:

>  you should upgrade to the latest version of the JVM, 1.6.0_21
>
> there was a bug around 1.6.0_18 (or there abouts) that affected cassandra

Re: Silent Crash

Posted by "B. Todd Burruss" <bb...@real.com>.
you should upgrade to the latest version of the JVM, 1.6.0_21

there was a bug around 1.6.0_18 (or thereabouts) that affected cassandra
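
To check several nodes against a minimum update level, the version string
printed by "java -version" can be compared numerically.  This is a minimal
Python sketch that assumes the legacy "1.6.0_13" style of version string seen
in this thread; the helper names are hypothetical.

```python
import re


def parse_java_version(text):
    """Return (major, minor, patch, update) from a legacy-style version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)_(\d+)", text)
    if match is None:
        raise ValueError("no legacy-style Java version found in: %r" % text)
    return tuple(int(group) for group in match.groups())


def older_than(text, minimum):
    """True if the version found in `text` is lower than `minimum`."""
    # Tuples compare element by element, so update 13 sorts before update 21.
    return parse_java_version(text) < parse_java_version(minimum)


if __name__ == "__main__":
    reported = 'java version "1.6.0_13"'
    print(older_than(reported, "1.6.0_21"))
```

Comparing integer tuples avoids the string-comparison trap where "1.6.0_9"
would sort after "1.6.0_21".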

On 10/13/2010 07:55 PM, Eric Czech wrote:
> And this is the java version:
>
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)
>
> and it's running on Ubuntu 9.04 (jaunty) linux
> 4 cores
> 4 GB RAM

Re: Silent Crash

Posted by Eric Czech <er...@nextbigsound.com>.
And this is the java version:

java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)

and it's running on Ubuntu 9.04 (jaunty) linux
4 cores
4 GB RAM

On Wed, Oct 13, 2010 at 8:30 PM, Eric Czech <er...@nextbigsound.com> wrote:

> Yea there are several.  All of them have the same head and it looks like
> this:
>
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f140e588b32, pid=2359, tid=139720650078544
> #
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (11.3-b02 mixed mode linux-amd64)
> # Problematic frame:
> # V  [libjvm.so+0x1d3b32]
> #
> # If you would like to submit a bug report, please visit:
>
> Have you ever seen that before?

Re: Silent Crash

Posted by Eric Czech <er...@nextbigsound.com>.
Yeah, there are several.  All of them have the same header, and it looks like
this:

#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f140e588b32, pid=2359, tid=139720650078544
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.3-b02 mixed mode linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x1d3b32]
#
# If you would like to submit a bug report, please visit:

Have you ever seen that before?
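
When there are several of these files, it can help to pull the same fields out
of each header and compare them.  Below is a minimal Python sketch that
assumes the header layout shown above; the regular expressions and the
parse_crash_header name are illustrative, and the frame pattern only handles
"[library+offset]" frames like this one.

```python
import re

# Matches e.g. "#  SIGSEGV (0xb) at pc=0x00007f140e588b32, pid=2359, tid=139720650078544"
SIGNAL_RE = re.compile(
    r"^#[ \t]+(SIG[A-Z0-9]+)[ \t]+\((0x[0-9a-fA-F]+)\)[ \t]+at[ \t]+"
    r"pc=(0x[0-9a-fA-F]+),[ \t]+pid=(\d+),[ \t]+tid=(\d+)",
    re.MULTILINE,
)

# Matches e.g. "# V  [libjvm.so+0x1d3b32]"
FRAME_RE = re.compile(
    r"^#[ \t]+([A-Za-z])[ \t]+\[([^\]\s]+)\+(0x[0-9a-fA-F]+)\]",
    re.MULTILINE,
)


def parse_crash_header(text):
    """Extract signal and problematic-frame fields from a crash log header.

    Returns a dict containing only the fields that were found.
    """
    info = {}
    signal = SIGNAL_RE.search(text)
    if signal:
        info["signal"] = signal.group(1)
        info["signal_number"] = int(signal.group(2), 16)
        info["pc"] = int(signal.group(3), 16)
        info["pid"] = int(signal.group(4))
        info["tid"] = int(signal.group(5))
    frame = FRAME_RE.search(text)
    if frame:
        info["frame_type"] = frame.group(1)
        info["library"] = frame.group(2)
        info["offset"] = int(frame.group(3), 16)
    return info


if __name__ == "__main__":
    sample = (
        "#\n"
        "#  SIGSEGV (0xb) at pc=0x00007f140e588b32, pid=2359, tid=139720650078544\n"
        "#\n"
        "# Problematic frame:\n"
        "# V  [libjvm.so+0x1d3b32]\n"
    )
    print(parse_crash_header(sample))
```

If the library and offset are identical in every file, the crash is hitting
the same spot each time; if they vary from file to file, that is worth
mentioning when asking for help.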



On Wed, Oct 13, 2010 at 7:52 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> is there a jvm crash log file?

Re: Silent Crash

Posted by Jonathan Ellis <jb...@gmail.com>.
is there a jvm crash log file?

On Wed, Oct 13, 2010 at 8:43 PM, Eric Czech <er...@nextbigsound.com> wrote:
> Recently, cassandra has been crashing with no apparent error on one specific
> node in my cluster.  Has anyone else ever had this happen and is there a way
> to possible figure out what is going on other than looking at what is in the
> stdout and system.log files?
>
> Thanks!
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
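
For anyone unsure where to look: HotSpot normally writes its fatal error log
as hs_err_pid<pid>.log in the working directory of the java process, unless
-XX:ErrorFile points elsewhere.  A minimal Python sketch for collecting such
files follows; the directories in the example are assumptions and should be
replaced with wherever Cassandra is actually started from.

```python
import glob
import os


def find_crash_logs(directories):
    """Return hs_err_pid*.log paths found in the given directories, newest first."""
    found = []
    for directory in directories:
        found.extend(glob.glob(os.path.join(directory, "hs_err_pid*.log")))
    # Sort by modification time so the most recent crash comes first.
    return sorted(found, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    # Example locations only; adjust for the actual installation.
    for path in find_crash_logs([os.getcwd(), "/tmp"]):
        print(path)
```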