Posted to user@cassandra.apache.org by George Sigletos <si...@textkernel.nl> on 2015/09/15 12:26:14 UTC

Corrupt sstables when upgrading from 2.1.8 to 2.1.9

Hello,

I tried to upgrade two of our clusters from 2.1.8 to 2.1.9. On some, but
not all, nodes I got errors about corrupt sstables when restarting. I
downgraded back to 2.1.8 for now.

Has anybody else faced the same problem? Should sstablescrub fix the
problem? I haven't tried it yet.

Kind regards,
George

ERROR [SSTableBatchOpen:3] 2015-09-14 10:16:03,296 FileUtils.java:447 -
Exiting forcefully due to file system exception on startup, disk failure
policy "stop"
org.apache.cassandra.io.sstable.CorruptSSTableException:
java.io.EOFException
at
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at
org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at
org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at
org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534)
~[apache-cassandra-2.1.9.jar:2.1.9]
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source) [na:1.7.0_75]
        at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.7.0_75]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source) [na:1.7.0_75]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source) [na:1.7.0_75]
        at java.lang.Thread.run(Unknown Source) [na:1.7.0_75]
Caused by: java.io.EOFException: null
        at java.io.DataInputStream.readUnsignedShort(Unknown Source)
~[na:1.7.0_75]
        at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75]
        at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75]
        at
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
~[apache-cassandra-2.1.9.jar:2.1.9]

Re: Corrupt sstables when upgrading from 2.1.8 to 2.1.9

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Sep 30, 2015 at 3:08 AM, George Sigletos <si...@textkernel.nl>
wrote:

> This is way too many manual steps. I was wondering why not just remove
> the entire /var/lib/cassandra/data folder plus the commit logs, restart the
> node, and wait for it to catch up.
>

Briefly, there is a low but non-zero chance that any given piece of data is
only stored on the node that you are wiping. I refer to this concept as the
"unique replica count".

=Rob

Re: Corrupt sstables when upgrading from 2.1.8 to 2.1.9

Posted by George Sigletos <si...@textkernel.nl>.
Hello again and sorry for the late response,

Still having problems with upgrading from 2.1.8 to 2.1.9.

I decided to start the problematic nodes with "disk_failure_policy:
best_effort"

Currently running "nodetool scrub <keyspace> <table>"

Then removing the corrupted sstables and planning to run repair afterwards
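In commands, the sequence looks roughly like this (<keyspace>, <table> and
the sstable paths are placeholders for our actual names):

    # cassandra.yaml on the problematic node:
    disk_failure_policy: best_effort

    # online scrub of the affected table:
    nodetool scrub <keyspace> <table>

    # then delete the sstables scrub could not fix, and repair:
    nodetool repair <keyspace>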

This is way too many manual steps. I was wondering why not just remove
the entire /var/lib/cassandra/data folder plus the commit logs, restart the
node, and wait for it to catch up.

Kind regards,
George


On Fri, Sep 25, 2015 at 12:01 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, Sep 24, 2015 at 3:00 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> A node which has lost a SSTable also needs to be repaired immediately.
>>
>
> Forgot to mention, you can repair via this technique :
>
> https://issues.apache.org/jira/browse/CASSANDRA-6961
>
> =Rob
>
>

Re: Corrupt sstables when upgrading from 2.1.8 to 2.1.9

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Sep 24, 2015 at 3:00 PM, Robert Coli <rc...@eventbrite.com> wrote:

> A node which has lost a SSTable also needs to be repaired immediately.
>

Forgot to mention, you can repair via this technique :

https://issues.apache.org/jira/browse/CASSANDRA-6961
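(As I understand that ticket, the idea is to bring the node up without
joining the ring, repair it while it is not serving reads, and only then
join. Something along these lines, flag names per the ticket:)

    # start the node in hibernate, without joining the ring
    cassandra -Dcassandra.join_ring=false
    # repair it before it serves any reads
    nodetool repair
    # then make it join the ring
    nodetool join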

=Rob

Re: Corrupt sstables when upgrading from 2.1.8 to 2.1.9

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Sep 15, 2015 at 9:42 AM, Nate McCall <na...@thelastpickle.com> wrote:

> Either way, you are going to have to run nodetool scrub. I'm not sure if
> it's better to do this from 2.1.8 or from 2.1.9 with "disk_failure_policy:
> ignore"
>

A node which has lost a SSTable also needs to be repaired immediately. If
it is not repaired before being brought back into the cluster, there are
cases where it can poison consistency on other nodes. For example, perhaps
the SSTable you lost contained the only copy of a tombstone, and the row is
now unmasked.
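In the simplest case that just means running, on the affected node
(<keyspace> being a placeholder for the keyspace that lost the SSTable):

    nodetool repair <keyspace>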

=Rob

Re: Corrupt sstables when upgrading from 2.1.8 to 2.1.9

Posted by Nate McCall <na...@thelastpickle.com>.
You have a/some corrupt SSTables. 2.1.9 is doing strict checking at startup
and reacting based on "disk_failure_policy" per the stack trace.

For details, see:
https://issues.apache.org/jira/browse/CASSANDRA-9686

Either way, you are going to have to run nodetool scrub. I'm not sure if
it's better to do this from 2.1.8 or from 2.1.9 with "disk_failure_policy:
ignore"

It feels like that option got overloaded a bit strangely by the changes
in CASSANDRA-9686, and I have not yet tried it with its new meaning.
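For reference, scrub can be run either online or offline (stop the node
first for the offline tool; <keyspace> and <table> are placeholders):

    # online, against a running node:
    nodetool scrub <keyspace> <table>
    # offline, with the node stopped:
    sstablescrub <keyspace> <table>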

On Tue, Sep 15, 2015 at 5:26 AM, George Sigletos <si...@textkernel.nl>
wrote:

> Hello,
>
> I tried to upgrade two of our clusters from 2.1.8 to 2.1.9. On some, but
> not all, nodes I got errors about corrupt sstables when restarting. I
> downgraded back to 2.1.8 for now.
>
> Has anybody else faced the same problem? Should sstablescrub fix the
> problem? I haven't tried it yet.
>
> Kind regards,
> George
>
> ERROR [SSTableBatchOpen:3] 2015-09-14 10:16:03,296 FileUtils.java:447 -
> Exiting forcefully due to file system exception on startup, disk failure
> policy "stop"
> org.apache.cassandra.io.sstable.CorruptSSTableException:
> java.io.EOFException
> at
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at
> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at
> org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at
> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at
> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at
> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at
> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at
> org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
> Source) [na:1.7.0_75]
>         at java.util.concurrent.FutureTask.run(Unknown Source)
> [na:1.7.0_75]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) [na:1.7.0_75]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) [na:1.7.0_75]
>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_75]
> Caused by: java.io.EOFException: null
>         at java.io.DataInputStream.readUnsignedShort(Unknown Source)
> ~[na:1.7.0_75]
>         at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75]
>         at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75]
>         at
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>



-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com