You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@cassandra.apache.org by "david.pan (JIRA)" <ji...@apache.org> on 2010/01/26 11:00:34 UTC

[jira] Created: (CASSANDRA-742) write operation will throw internal error if the bootstrapping node is down

write operation will throw internal error if the bootstrapping node is down
---------------------------------------------------------------------------

                 Key: CASSANDRA-742
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-742
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.5
         Environment: linux2.6
            Reporter: david.pan
             Fix For: 0.6


the opertions are that :
1) bootstrap a node A;
2) keep on inserting data while bootstrapping;
3) stop the service of the node A;
4) then the following exception was found:
ERROR [pool-1-thread-9] 2010-01-26 10:32:39,688 Cassandra.java (line 1064) Internal error processing insert
java.lang.AssertionError
at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:213)
at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:142)
at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:76)
at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1188)
at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:417)
at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:1056)
at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)

I traced the code and found that "org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(Collection<InetAddress>)" will select a hinted endpoint for a dead endpoint, no mater whether it's a normal node or a bootstrapping node. To get the tokenID of the endpoint, this method will call "tokenMetadata_.getToken(ep);", but getToken() asserts that the endpoint should be  a member of the ring only. Of course, the bootstrapping endpoint is not a member and a internal exception is throwed out.
This exception will always be throwed out until I re-boostrapping. This is really a big prolem for me, because the bootstrapping will last  30 hours and my machines are not very durable. I have to get up from bed at night to deal with this accident. :-(



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (CASSANDRA-742) write operation will throw internal error if the bootstrapping node is down

Posted by "david.pan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

david.pan updated CASSANDRA-742:
--------------------------------

    Attachment: 742-write_failed_when_bootstrapping_down.patch

This patch is not a perfect solution for this issue, but I can have a sweet dream at night and I can deal with this accident the next morning.  :-)

This patch will remove the bootstrapping endpoint from the tokenMetadata if other nodes find this node is down.
The write opertion will be timeout before other nodes find the bootstrapping node is down, but it will be OK after other nodes remove the bootstrapping node from the pendingRanges.

> write operation will throw internal error if the bootstrapping node is down
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-742
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-742
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>         Environment: linux2.6
>            Reporter: david.pan
>             Fix For: 0.6
>
>         Attachments: 742-write_failed_when_bootstrapping_down.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> the opertions are that :
> 1) bootstrap a node A;
> 2) keep on inserting data while bootstrapping;
> 3) stop the service of the node A;
> 4) then the following exception was found:
> ERROR [pool-1-thread-9] 2010-01-26 10:32:39,688 Cassandra.java (line 1064) Internal error processing insert
> java.lang.AssertionError
> at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:213)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:142)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:76)
> at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1188)
> at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
> at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
> at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:417)
> at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:1056)
> at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
> I traced the code and found that "org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(Collection<InetAddress>)" will select a hinted endpoint for a dead endpoint, no mater whether it's a normal node or a bootstrapping node. To get the tokenID of the endpoint, this method will call "tokenMetadata_.getToken(ep);", but getToken() asserts that the endpoint should be  a member of the ring only. Of course, the bootstrapping endpoint is not a member and a internal exception is throwed out.
> This exception will always be throwed out until I re-boostrapping. This is really a big prolem for me, because the bootstrapping will last  30 hours and my machines are not very durable. I have to get up from bed at night to deal with this accident. :-(

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (CASSANDRA-742) write operation will throw internal error if the bootstrapping node is down

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-742.
--------------------------------------

    Resolution: Duplicate

fixed in CASSANDRA-722 for 0.5.1

> write operation will throw internal error if the bootstrapping node is down
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-742
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-742
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>         Environment: linux2.6
>            Reporter: david.pan
>             Fix For: 0.6
>
>         Attachments: 742-write_failed_when_bootstrapping_down.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> the opertions are that :
> 1) bootstrap a node A;
> 2) keep on inserting data while bootstrapping;
> 3) stop the service of the node A;
> 4) then the following exception was found:
> ERROR [pool-1-thread-9] 2010-01-26 10:32:39,688 Cassandra.java (line 1064) Internal error processing insert
> java.lang.AssertionError
> at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:213)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:142)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:76)
> at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1188)
> at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
> at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
> at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:417)
> at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:1056)
> at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
> I traced the code and found that "org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(Collection<InetAddress>)" will select a hinted endpoint for a dead endpoint, no mater whether it's a normal node or a bootstrapping node. To get the tokenID of the endpoint, this method will call "tokenMetadata_.getToken(ep);", but getToken() asserts that the endpoint should be  a member of the ring only. Of course, the bootstrapping endpoint is not a member and a internal exception is throwed out.
> This exception will always be throwed out until I re-boostrapping. This is really a big prolem for me, because the bootstrapping will last  30 hours and my machines are not very durable. I have to get up from bed at night to deal with this accident. :-(

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.