Posted to user@cassandra.apache.org by Alexander Shutyaev <sh...@gmail.com> on 2013/04/01 10:15:49 UTC

CorruptSSTableException in system keyspace

Hi all!

Our app is currently moving from *hibernate* to a *cassandra+solr*
combination (we use our own storage abstraction for this). We have a lot of
junit integration tests that use the database. We've decided to use these
tests to test our *cassandra+solr* storage implementation (given that they
run fine on the *hibernate* implementation). For that purpose we've
installed *cassandra* and *solr* on a dedicated VM and launched the tests.

Three things should be mentioned about these tests and how we use
cassandra.

1. We have 1 *keyspace* and about 200-300 *column families*.
2. The tests are written so that at the beginning of each test we drop the
*keyspace* and then recreate it from scratch (together with all *column
families*).
3. On the client side we use *hector*.

The problem is that each time we launch the tests they start failing
randomly at some point. In the *cassandra* logs we see a lot of
*CorruptSSTableException*-s in the *system keyspace* caused by
*CorruptBlockException*-s. An example of a full stack trace of such an
exception can be found here [1]. The tests start to fail due to thrift
*transport* exceptions. At that point we are also unable to connect to
*cassandra* using *cassandra-cli*. Then, after some time, *cassandra* goes
back to a normal state all by itself.

[1] http://pastebin.com/9SpEJpap
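
To put timestamps on the outage described above (when the node stops accepting connections and when it recovers), a small poller can run alongside the tests. This is only a sketch: it assumes the Thrift port is the Cassandra default of 9160, and the function names are made up for illustration. A successful TCP connect only shows that the port accepts connections, not that the node is healthy.

```python
import socket
import time


def thrift_port_open(host, port=9160, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def wait_until_reachable(host, port=9160, attempts=30, delay=2.0):
    """Poll the port up to `attempts` times.

    Returns the 1-based attempt number on which the port opened,
    or None if it never did.
    """
    for attempt in range(1, attempts + 1):
        if thrift_port_open(host, port):
            return attempt
        time.sleep(delay)
    return None
```

Calling `wait_until_reachable("my-cassandra-vm")` (a placeholder host name) right after the first transport exception gives a rough measure of how long the node stays unavailable.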

P.S. I can provide the full system.log and output.log from a clean start (all
data folders erased) until the errors appear, and later when the system is ok
once again, although I'm not sure which file sharing service to use.

Thanks in advance,
Alexander

Re: CorruptSSTableException in system keyspace

Posted by aaron morton <aa...@thelastpickle.com>.
There is a ticket from an older version, but I doubt that's it: https://issues.apache.org/jira/browse/CASSANDRA-4837

You may be hitting an edge case by quickly creating 200 to 300 CFs. Can you reproduce the problem outside of your test infrastructure? If so, can you raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA
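
One way to attempt a reproduction outside the JUnit/hector setup is to generate a plain cassandra-cli script that repeats the same drop-and-recreate cycle. The sketch below is an assumption about what such a script could look like, not the original test code: the keyspace name, the `cf_NNN` column family names, and the cycle count are all hypothetical, and `create keyspace` is left on server defaults, so it may need explicit replication settings on some setups.

```python
def build_cli_script(keyspace, cf_count, cycles=1):
    """Build cassandra-cli statements that create a keyspace with
    `cf_count` column families, then drop and recreate it for every
    additional cycle."""
    lines = []
    for cycle in range(cycles):
        if cycle > 0:
            # On a clean start there is nothing to drop in the first cycle.
            lines.append("drop keyspace {0};".format(keyspace))
        lines.append("create keyspace {0};".format(keyspace))
        lines.append("use {0};".format(keyspace))
        for i in range(cf_count):
            lines.append("create column family cf_{0:03d};".format(i))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: 250 column families, 20 drop/recreate cycles.
    with open("repro.txt", "w") as out:
        out.write(build_cli_script("repro_ks", 250, cycles=20))
```

The resulting file can then be fed to cassandra-cli, assuming your build accepts a statement file (the `-f` option in the versions I know of). If the errors show up this way too, the script itself is a compact attachment for a ticket.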

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 2/04/2013, at 6:39 PM, Alexander Shutyaev <sh...@gmail.com> wrote:

> Hi, Aaron!
> 
> We were using 1.2.2. Now after your suggestion I've upgraded it to 1.2.3. That CorruptSSTableException is now gone, but the problem is still here - after some time we start getting another exception in cassandra logs (and transport exception on the client side). The new exception is java.lang.IllegalStateException: One row required, 0 found. Full stacktrace available here [1]
> 
> [1] http://pastebin.com/W0nL6hK7
> 
> Thanks in advance,
> Alexander


Re: CorruptSSTableException in system keyspace

Posted by Alexander Shutyaev <sh...@gmail.com>.
Hi, Aaron!

We were using 1.2.2. Following your suggestion, I've upgraded to 1.2.3.
That *CorruptSSTableException* is now gone, but the problem is still here:
after some time we start getting another exception in the cassandra logs
(and a transport exception on the client side). The new exception is
*java.lang.IllegalStateException: One row required, 0 found*. The full
stack trace is available here [1].

[1] http://pastebin.com/W0nL6hK7
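
Since the failure changed from one exception type to another after the upgrade, a quick tally of exception names per log file makes it easier to compare runs. The sketch below assumes nothing about the system.log line format beyond exception class names appearing as text; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches simple class names ending in "Exception", e.g. the
# "IllegalStateException" part of "java.lang.IllegalStateException".
EXCEPTION_NAME = re.compile(r"\b[A-Z]\w*Exception\b")


def count_exceptions(lines):
    """Count how often each exception class name appears in the given lines."""
    counts = Counter()
    for line in lines:
        for name in EXCEPTION_NAME.findall(line):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Path is a placeholder; point it at the node's actual system.log.
    with open("system.log") as log:
        for name, n in count_exceptions(log).most_common():
            print(name, n)
```

Running it on the logs from the 1.2.2 run and the 1.2.3 run should show which exception types disappeared and which are new.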

Thanks in advance,
Alexander


2013/4/2 aaron morton <aa...@thelastpickle.com>

> What version are you using ?
>
> There are two tickets with similar issues fixed in 1.2.X releases.
>
> https://issues.apache.org/jira/browse/CASSANDRA-5225
> https://issues.apache.org/jira/browse/CASSANDRA-5365
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com

Re: CorruptSSTableException in system keyspace

Posted by aaron morton <aa...@thelastpickle.com>.
What version are you using ? 

There are two tickets with similar issues fixed in 1.2.X releases. 

https://issues.apache.org/jira/browse/CASSANDRA-5225
https://issues.apache.org/jira/browse/CASSANDRA-5365

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
