You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Rustam Aliyev <ru...@code.az> on 2013/04/03 00:39:06 UTC

Cassandra 1.0.10 to 1.2.3 upgrade "post-mortem"

Hi,

I just wanted to share our experience of upgrading 1.0.10 to 1.2.3. It 
happened that first we upgraded both of our two seeds to 1.2.3. And 
basically after that old nodes couldn't communicate with new ones 
anymore. Cluster was down until we upgraded all nodes to 1.2.3. We don't 
have many nodes and that process didn't took long. Yet it caused outage 
for ~10 mins.

Here are some logs:

On the new, freshly upgraded seed node (v1.2.3):
----------------
ERROR [OptionalTasks:1] 2013-03-31 08:48:19,370 CassandraDaemon.java 
(line 164) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
at 
org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:137)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
  WARN [MutationStage:20] 2013-03-31 08:48:23,613 StorageProxy.java 
(line 577) Unable to store hint for host with missing ID, /10.0.1.8 (old 
node?)

....

ERROR [MutationStage:33] 2013-03-31 09:00:02,899 CassandraDaemon.java 
(line 164) Exception in thread Thread[MutationStage:33,5,main]
java.lang.AssertionError: Missing host ID for 10.0.1.8
at 
org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:580)
at 
org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:555)
at 
org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1643)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

----------------

At the same time, old nodes (v1.0.10) were blinded:

----------------
ERROR [RequestResponseStage:441] 2013-03-31 09:04:07,955 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[RequestResponseStage:441,5,main]
java.io.IOError: java.io.EOFException
at 
org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:132)
at 
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at 
org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
at 
org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
at 
org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
... 6 more

.....

  INFO [GossipStage:3] 2013-03-31 09:06:08,885 Gossiper.java (line 804) 
InetAddress /10.0.1.8 is now UP
ERROR [GossipStage:3] 2013-03-31 09:06:08,885 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[GossipStage:3,5,main]
java.lang.UnsupportedOperationException: Not a time-based UUID
at java.util.UUID.timestamp(UUID.java:308)
at 
org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
at 
org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
at 
org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:923)
at 
org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
ERROR [GossipStage:3] 2013-03-31 09:06:08,886 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[GossipStage:3,5,main]
java.lang.UnsupportedOperationException: Not a time-based UUID
at java.util.UUID.timestamp(UUID.java:308)
at 
org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
at 
org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
at 
org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:923)
at 
org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
----------

I've seen other people reporting similar problem with 1.1.x.

Not sure, but maybe we could avoid this by upgrading seeds last.

Best,
Rustam.


Re: Cassandra 1.0.10 to 1.2.3 upgrade "post-mortem"

Posted by Rustam Aliyev <ru...@code.az>.
On 04/04/2013 02:24, aaron morton wrote:
>> I just wanted to share our experience of upgrading 1.0.10 to 1.2.3
> In general it's dangerous to skip a major release when upgrading.
>
True. But in that case it was supposed to be fine.
>> ERROR [MutationStage:33] 2013-03-31 09:00:02,899 CassandraDaemon.java 
>> (line 164) Exception in thread Thread[MutationStage:33,5,main]
>> java.lang.AssertionError: Missing host ID for 10.0.1.8
> Was 10.0.1.8 been updated ?
IIRC not at this stage. 10.0.1.8 was second seed server (at that moment 
1.0.10) and this particular error appeared on the first seed server 
after upgrade to 1.2.3.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/04/2013, at 4:09 AM, Rustam Aliyev <rustam.lists@code.az 
> <ma...@code.az>> wrote:
>
>> Hi,
>>
>> I just wanted to share our experience of upgrading 1.0.10 to 1.2.3. 
>> It happened that first we upgraded both of our two seeds to 1.2.3. 
>> And basically after that old nodes couldn't communicate with new ones 
>> anymore. Cluster was down until we upgraded all nodes to 1.2.3. We 
>> don't have many nodes and that process didn't took long. Yet it 
>> caused outage for ~10 mins.
>>
>> Here are some logs:
>>
>> On the new, freshly upgraded seed node (v1.2.3):
>> ----------------
>> ERROR [OptionalTasks:1] 2013-03-31 08:48:19,370 CassandraDaemon.java 
>> (line 164) Exception in thread Thread[OptionalTasks:1,5,main]
>> java.lang.NullPointerException
>> at 
>> org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:137)
>> at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>> WARN [MutationStage:20] 2013-03-31 08:48:23,613 StorageProxy.java 
>> (line 577) Unable to store hint for host with missing ID, /10.0.1.8 
>> (old node?)
>>
>> ....
>>
>> ERROR [MutationStage:33] 2013-03-31 09:00:02,899 CassandraDaemon.java 
>> (line 164) Exception in thread Thread[MutationStage:33,5,main]
>> java.lang.AssertionError: Missing host ID for 10.0.1.8
>> at 
>> org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:580)
>> at 
>> org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:555)
>> at 
>> org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1643)
>> at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>>
>> ----------------
>>
>> At the same time, old nodes (v1.0.10) were blinded:
>>
>> ----------------
>> ERROR [RequestResponseStage:441] 2013-03-31 09:04:07,955 
>> AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
>> Thread[RequestResponseStage:441,5,main]
>> java.io.IOError: java.io.EOFException
>> at 
>> org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
>> at 
>> org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:132)
>> at 
>> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
>> at 
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.io.EOFException
>> at java.io.DataInputStream.readFully(DataInputStream.java:180)
>> at 
>> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
>> at 
>> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
>> at 
>> org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
>> ... 6 more
>>
>> .....
>>
>> INFO [GossipStage:3] 2013-03-31 09:06:08,885 Gossiper.java (line 804) 
>> InetAddress /10.0.1.8 is now UP
>> ERROR [GossipStage:3] 2013-03-31 09:06:08,885 
>> AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
>> Thread[GossipStage:3,5,main]
>> java.lang.UnsupportedOperationException: Not a time-based UUID
>> at java.util.UUID.timestamp(UUID.java:308)
>> at 
>> org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
>> at 
>> org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
>> at 
>> org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
>> at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
>> at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:923)
>> at 
>> org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>> at 
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>> ERROR [GossipStage:3] 2013-03-31 09:06:08,886 
>> AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
>> Thread[GossipStage:3,5,main]
>> java.lang.UnsupportedOperationException: Not a time-based UUID
>> at java.util.UUID.timestamp(UUID.java:308)
>> at 
>> org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
>> at 
>> org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
>> at 
>> org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
>> at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
>> at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:923)
>> at 
>> org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>> at 
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>> ----------
>>
>> I've seen other people reporting similar problem with 1.1.x.
>>
>> Not sure, but maybe we could avoid this by upgrading seeds last.
>>
>> Best,
>> Rustam.
>>
>


Re: Cassandra 1.0.10 to 1.2.3 upgrade "post-mortem"

Posted by aaron morton <aa...@thelastpickle.com>.
> I just wanted to share our experience of upgrading 1.0.10 to 1.2.3
In general it's dangerous to skip a major release when upgrading. 

> ERROR [MutationStage:33] 2013-03-31 09:00:02,899 CassandraDaemon.java (line 164) Exception in thread Thread[MutationStage:33,5,main]
> java.lang.AssertionError: Missing host ID for 10.0.1.8
Was 10.0.1.8 been updated ?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/04/2013, at 4:09 AM, Rustam Aliyev <ru...@code.az> wrote:

> Hi,
> 
> I just wanted to share our experience of upgrading 1.0.10 to 1.2.3. It happened that first we upgraded both of our two seeds to 1.2.3. And basically after that old nodes couldn't communicate with new ones anymore. Cluster was down until we upgraded all nodes to 1.2.3. We don't have many nodes and that process didn't took long. Yet it caused outage for ~10 mins.
> 
> Here are some logs:
> 
> On the new, freshly upgraded seed node (v1.2.3):
> ----------------
> ERROR [OptionalTasks:1] 2013-03-31 08:48:19,370 CassandraDaemon.java (line 164) Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.NullPointerException
> at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:137)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> WARN [MutationStage:20] 2013-03-31 08:48:23,613 StorageProxy.java (line 577) Unable to store hint for host with missing ID, /10.0.1.8 (old node?)
> 
> ....
> 
> ERROR [MutationStage:33] 2013-03-31 09:00:02,899 CassandraDaemon.java (line 164) Exception in thread Thread[MutationStage:33,5,main]
> java.lang.AssertionError: Missing host ID for 10.0.1.8
> at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:580)
> at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:555)
> at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1643)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 
> ----------------
> 
> At the same time, old nodes (v1.0.10) were blinded:
> 
> ----------------
> ERROR [RequestResponseStage:441] 2013-03-31 09:04:07,955 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[RequestResponseStage:441,5,main]
> java.io.IOError: java.io.EOFException
> at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
> at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:132)
> at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
> at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
> at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
> ... 6 more
> 
> .....
> 
> INFO [GossipStage:3] 2013-03-31 09:06:08,885 Gossiper.java (line 804) InetAddress /10.0.1.8 is now UP
> ERROR [GossipStage:3] 2013-03-31 09:06:08,885 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[GossipStage:3,5,main]
> java.lang.UnsupportedOperationException: Not a time-based UUID
> at java.util.UUID.timestamp(UUID.java:308)
> at org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
> at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
> at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
> at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
> at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:923)
> at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> ERROR [GossipStage:3] 2013-03-31 09:06:08,886 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[GossipStage:3,5,main]
> java.lang.UnsupportedOperationException: Not a time-based UUID
> at java.util.UUID.timestamp(UUID.java:308)
> at org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
> at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
> at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
> at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
> at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:923)
> at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> ----------
> 
> I've seen other people reporting similar problem with 1.1.x.
> 
> Not sure, but maybe we could avoid this by upgrading seeds last.
> 
> Best,
> Rustam.
>