Posted to user@cassandra.apache.org by Glenn Thompson <ga...@gmail.com> on 2013/06/30 05:39:22 UTC

CorruptBlockException

Hi,

I'm Glenn Thompson, and I'm new to Cassandra.  I have been trying to figure
out how to recover from a CorruptBlockException.  My travels have led me to
numerous email threads and trouble tickets, and I think I did the right
things based on my research.

My basic situation.

I'm running on non-enterprise hardware; it's my personal cloud playground:
8 identical mini-ITX Gigabyte GA-Z77N systems running CentOS 6.4 64-bit.
 Each has 16GB of RAM, two 750GB WD Black laptop drives (RAID 0), and an
i3-3220 processor (2 cores, 4 threads).  Anyone interested in the hardware
can go here:
https://drive.google.com/folderview?id=0B54Jqmw0tKp0c19kYy1kUW54VVE&usp=sharing

I've been loading NOAA ISH data in an effort to learn and evaluate
Cassandra.

One of my nodes must have a hardware problem, although I've been unable to
find anything wrong via the logs, SMART, or MCE.

Cassandra discovered the error during a compaction.  My loading continued
so I let it finish.

Then I:

flushed,
repaired,
scrubbed,
and finally decommissioned the node.
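The steps above map onto the stock nodetool subcommands.  A dry-run sketch
of that runbook (the host name is a placeholder, not from the original
post; drop the echo wrapper to actually execute the commands):

```shell
#!/bin/sh
# Dry-run sketch of the recovery steps above.  NODE is a placeholder;
# replace the echo in run() with "$@" to execute for real.
NODE=badnode.example.com
run() { echo "would run: $*"; }

run nodetool -h "$NODE" flush         # write memtables out to SSTables
run nodetool -h "$NODE" repair        # re-sync data with the other replicas
run nodetool -h "$NODE" scrub         # rewrite SSTables, skipping corrupt rows
run nodetool -h "$NODE" decommission  # stream data off and leave the ring
```

The echo wrapper makes the plan visible and reviewable before anything
touches the cluster.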

At no point did Cassandra declare any of the nodes down.  Other than the
exceptions in the logs, Cassandra was happy.

The repair, scrub, and decommission all produced Exceptions related to the
same few corrupt files.

I plumb this whole thing with SaltStack, so I'm going to start over and
attempt another load with new RAM in the bad node.  I'll save my logs and
configs, and I'll post them on my Google Drive if anyone thinks that would
be useful.

Cheers,
Glenn

Re: CorruptBlockException

Posted by Glenn Thompson <ga...@gmail.com>.
Hi Rob,

It was hardware: memory.  I've been loading data since I originally
posted; no exceptions so far.  I had some issues with OOMs when I first
started playing with Cassandra, so I increased the amount of RAM given to
the JVM and reduced the memtable size.  I'm guessing it's because I'm
using i3s; more cores would most likely improve GC performance.

I put all the logs and my configs on my Google Drive; the link is in the
original post.  I'm running 1.2.4, and there have been two releases since
my original download, so I'm going to attempt an upgrade soon.

I'm also considering using leveled compaction.  I have just two 750GB
drives per node, and I'd like to be able to use more than 50% of the
drives if I can.
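For what it's worth, compaction strategy is set per table: size-tiered
compaction can transiently need as much free space as the SSTables it is
merging, while leveled compaction works in small fixed-size pieces and
needs far less headroom.  A dry-run sketch of the change (the keyspace and
table names here are made up for illustration; drop the echo to apply it):

```shell
#!/bin/sh
# Dry-run sketch: switch a table to LeveledCompactionStrategy.
# noaa.ish_weather is a hypothetical keyspace/table, not from this thread.
CQL="ALTER TABLE noaa.ish_weather
     WITH compaction = {'class': 'LeveledCompactionStrategy'};"
echo "would run: cqlsh -e \"$CQL\""
```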

Thanks,
Glenn


On Mon, Jul 1, 2013 at 11:08 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Sat, Jun 29, 2013 at 8:39 PM, Glenn Thompson <ga...@gmail.com>
> wrote:
> > I'm Glenn Thompson and new to Cassandra.  I have been trying to figure
> out
> > how to recover from a CorruptBlockException.
> > ...
> > One of my nodes must have a hardware problem.  Although I've been unable
> to
> > find anything wrong via logs, smart, or mce.
> > ...
> > The repair, scrub, and decommission all produced Exceptions related to
> the
> > same few corrupt files.
>
> Hardware problem sounds relatively likely, especially if you have not
> crashed your nodes. Only other thing I can think of is an issue with
> the relationship of the compression library and the JVM. What JVM/JDK
> are you using, and what compression method is in use on the Column
> Family?
>
> In general the actions you took were reasonable. Do you have the full
> stack trace?
>
> =Rob
>

Re: CorruptBlockException

Posted by Robert Coli <rc...@eventbrite.com>.
On Sat, Jun 29, 2013 at 8:39 PM, Glenn Thompson <ga...@gmail.com> wrote:
> I'm Glenn Thompson and new to Cassandra.  I have been trying to figure out
> how to recover from a CorruptBlockException.
> ...
> One of my nodes must have a hardware problem.  Although I've been unable to
> find anything wrong via logs, smart, or mce.
> ...
> The repair, scrub, and decommission all produced Exceptions related to the
> same few corrupt files.

Hardware problem sounds relatively likely, especially if you have not
crashed your nodes. Only other thing I can think of is an issue with
the relationship of the compression library and the JVM. What JVM/JDK
are you using, and what compression method is in use on the Column
Family?

In general the actions you took were reasonable. Do you have the full
stack trace?

=Rob