Posted to commits@cassandra.apache.org by "Yuki Morishita (JIRA)" <ji...@apache.org> on 2012/08/20 19:53:38 UTC
[jira] [Commented] (CASSANDRA-3730) If some streaming sessions fail on decommission, decommission hangs

    [ https://issues.apache.org/jira/browse/CASSANDRA-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438031#comment-13438031 ] 

Yuki Morishita commented on CASSANDRA-3730:
-------------------------------------------

Sorry for the late reply.
In CASSANDRA-4051, we added the IStreamCallback interface to handle streaming failures, similar to the attached patch. Decommission no longer hangs when a streaming failure occurs; the node is removed from the ring with a notice.
However, restoring the node's state after a streaming failure is somewhat difficult. StorageService#setMode is not enough; we also have to recalculate token ranges and bring back the gossip state. I'm not sure whether we can make gossip state "transactional" so that we can roll the cluster state back to its original.
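
To illustrate why a failure-aware callback keeps decommission from hanging, here is a minimal sketch. It is not the actual Cassandra code: StreamCallback, finishSession, and streamAll are hypothetical names made up for this example, and the real IStreamCallback signature may differ. The point is only that the waiter is released on both outcomes and learns whether anything failed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class DecommissionSketch {
    /** Hypothetical callback with separate success and failure paths. */
    public interface StreamCallback {
        void onSuccess();
        void onFailure();
    }

    /** Hypothetical stand-in for a streaming session reporting its outcome. */
    public static void finishSession(boolean success, StreamCallback callback) {
        if (callback == null)
            return;
        // Unlike "if (callback != null && success)", the failure path is reported too.
        if (success)
            callback.onSuccess();
        else
            callback.onFailure();
    }

    /** Waits for every session to finish; returns true only if all succeeded. */
    public static boolean streamAll(boolean[] outcomes) throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(outcomes.length);
        final AtomicBoolean failed = new AtomicBoolean(false);
        StreamCallback callback = new StreamCallback() {
            public void onSuccess() { latch.countDown(); }
            public void onFailure() { failed.set(true); latch.countDown(); }
        };
        for (boolean outcome : outcomes)
            finishSession(outcome, callback);
        // Cannot block forever: each session counts down exactly once, success or not.
        latch.await();
        return !failed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(streamAll(new boolean[] { true, true }));   // prints "true"
        System.out.println(streamAll(new boolean[] { true, false }));  // prints "false"
    }
}
```

With a success-only callback, the second call would never return, because the failed session would never count the latch down. What the caller does after a false result (for example, restoring the previous mode and gossip state) is exactly the open question above and is deliberately left out of the sketch.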
                
> If some streaming sessions fail on decommission, decommission hangs
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-3730
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3730
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: FreeBSD
>            Reporter: Vitalii Tymchyshyn
>              Labels: streaming
>
> Currently Cassandra does not handle StreamOutSession failures, e.g.:
>         // Instead of just not calling the callback on failure, we could have
>         // allow to register a specific callback for failures, but we leave
>         // that to a future ticket (likely CASSANDRA-3112)
>         if (callback != null && success)
>             callback.run();
> This means that if, during decommission, a node that receives decommission data fails or (my case) the node that tries to decommission becomes overloaded, the streaming session fails and decommission doesn't know anything about it. This makes it hard to decommission overloaded nodes, because I need to restart the node to restart decommission.
> I also see the following errors, because file streaming tasks try to get a streaming session that has already been closed by gossip:
> ERROR [Streaming to /10.112.0.216:1] 2012-01-11 15:57:28,882 AbstractCassandraDaemon.java (line 138) Fatal exception in thread Thread[Streaming to /10.112.0.216:1,5,main]
> java.lang.NullPointerException
>         at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:97)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)
