Posted to commits@cassandra.apache.org by "david.pan (JIRA)" <ji...@apache.org> on 2010/01/26 11:06:34 UTC
[jira] Updated: (CASSANDRA-742) write operation will throw internal error if the bootstrapping node is down

     [ https://issues.apache.org/jira/browse/CASSANDRA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

david.pan updated CASSANDRA-742:
--------------------------------

    Attachment: 742-write_failed_when_bootstrapping_down.patch

This patch is not a perfect solution to this issue, but it lets me sleep through the night and deal with the failure the next morning.  :-)

This patch removes the bootstrapping endpoint from tokenMetadata once the other nodes detect that it is down.
Write operations will time out until the other nodes detect that the bootstrapping node is down, but they succeed again once those nodes have removed it from pendingRanges.
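
To make the intended behavior concrete, here is a minimal, self-contained sketch of the idea. It is not the attached patch and does not use the real TokenMetadata API: the class SketchTokenMetadata and the methods addNormal, addBootstrapping, writeTargets, onEndpointDown and addr are hypothetical names chosen for illustration, and tokens are simplified to strings.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for token metadata; not the real Cassandra class.
class SketchTokenMetadata {
    // Ring members: endpoint -> token (token simplified to a String).
    private final Map<InetAddress, String> normalTokens = new ConcurrentHashMap<>();
    // Endpoints that are still bootstrapping: they receive writes for their
    // pending ranges but are not ring members yet.
    private final Set<InetAddress> bootstrapping = ConcurrentHashMap.newKeySet();

    void addNormal(InetAddress ep, String token) { normalTokens.put(ep, token); }

    void addBootstrapping(InetAddress ep) { bootstrapping.add(ep); }

    // Every endpoint a write may be sent to: ring members plus bootstrapping nodes.
    Set<InetAddress> writeTargets() {
        Set<InetAddress> targets = new HashSet<>(normalTokens.keySet());
        targets.addAll(bootstrapping);
        return targets;
    }

    // Assumed hook, called when failure detection reports an endpoint as down.
    // A bootstrapping endpoint is dropped so later writes no longer target it;
    // ring members are left untouched. Returns true if something was removed.
    boolean onEndpointDown(InetAddress ep) {
        return bootstrapping.remove(ep);
    }

    // Helper that builds 10.0.0.<lastOctet> without any DNS lookup.
    static InetAddress addr(int lastOctet) {
        try {
            return InetAddress.getByAddress(new byte[] {10, 0, 0, (byte) lastOctet});
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, writes keep targeting the bootstrapping node until onEndpointDown is called for it, which corresponds to the window in which write operations time out.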

> write operation will throw internal error if the bootstrapping node is down
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-742
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-742
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>         Environment: linux2.6
>            Reporter: david.pan
>             Fix For: 0.6
>
>         Attachments: 742-write_failed_when_bootstrapping_down.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The steps to reproduce are:
> 1) bootstrap a node A;
> 2) keep on inserting data while bootstrapping;
> 3) stop the service of the node A;
> 4) then the following exception was found:
> ERROR [pool-1-thread-9] 2010-01-26 10:32:39,688 Cassandra.java (line 1064) Internal error processing insert
> java.lang.AssertionError
> at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:213)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:142)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:76)
> at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1188)
> at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
> at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
> at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:417)
> at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:1056)
> at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
> I traced the code and found that "org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(Collection<InetAddress>)" selects a hinted endpoint for every dead endpoint, no matter whether it is a normal node or a bootstrapping node. To get the token of the endpoint, this method calls "tokenMetadata_.getToken(ep);", but getToken() asserts that the endpoint is a member of the ring. A bootstrapping endpoint is not yet a member, so the assertion fails and an internal error is thrown.
> This exception keeps being thrown until I bootstrap the node again. This is a serious problem for me, because bootstrapping takes 30 hours and my machines are not very reliable. I have to get out of bed at night to deal with this failure. :-(
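
The failure path described above can be modeled in a few lines. This is only an illustrative sketch, not the Cassandra source: HintFailureSketch, addMember, tokensNeedingHints and addr are hypothetical names, tokens are simplified to strings, and the membership check throws AssertionError explicitly so that it behaves the same whether or not the JVM runs with assertions enabled.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the failure path; names do not match the real classes.
class HintFailureSketch {
    // Only ring members have a token.
    private final Map<InetAddress, String> ringTokens = new HashMap<>();

    void addMember(InetAddress ep, String token) { ringTokens.put(ep, token); }

    // Mirrors the contract described in the report: the endpoint must be a ring member.
    String getToken(InetAddress ep) {
        String token = ringTokens.get(ep);
        if (token == null) {
            throw new AssertionError("not a ring member: " + ep);
        }
        return token;
    }

    // For every dead write target, look up its token in order to choose a hint
    // destination. A dead bootstrapping endpoint has no token, so the lookup fails.
    List<String> tokensNeedingHints(Collection<InetAddress> targets,
                                    Collection<InetAddress> dead) {
        List<String> tokens = new ArrayList<>();
        for (InetAddress ep : targets) {
            if (dead.contains(ep)) {
                tokens.add(getToken(ep));
            }
        }
        return tokens;
    }

    // Helper that builds 10.0.0.<lastOctet> without any DNS lookup.
    static InetAddress addr(int lastOctet) {
        try {
            return InetAddress.getByAddress(new byte[] {10, 0, 0, (byte) lastOctet});
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With one ring member and one bootstrapping endpoint as write targets, the lookup succeeds when the ring member is the dead node and throws when the bootstrapping endpoint is the dead node, which matches the stack trace in the report.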

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.