Posted to commits@cassandra.apache.org by "Vitalii Tymchyshyn (Commented) (JIRA)" <ji...@apache.org> on 2012/01/27 13:33:38 UTC

[jira] [Commented] (CASSANDRA-3730) If some streaming sessions fail on decommission, decommission hangs

    [ https://issues.apache.org/jira/browse/CASSANDRA-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194630#comment-13194630 ] 

Vitalii Tymchyshyn commented on CASSANDRA-3730:
-----------------------------------------------

I've introduced simplistic handling that should at least abort a decommission or move command whose streaming sessions fail: https://github.com/apache/cassandra/pull/6
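
For illustration only, here is a minimal sketch of the general idea of reporting a failure to the waiting operation instead of silently skipping the callback. It is not the code from the pull request: SessionCallback, SketchSession and runSessions are hypothetical names, and the sessions here complete synchronously on the calling thread.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class StreamFailureSketch {

    /** Hypothetical callback with a separate failure path. */
    interface SessionCallback {
        void onSuccess();
        void onFailure(Throwable cause);
    }

    /** Hypothetical stand-in for a streaming session; not the real StreamOutSession. */
    static final class SketchSession {
        private final SessionCallback callback;

        SketchSession(SessionCallback callback) {
            this.callback = callback;
        }

        /** Always notifies the callback, whether or not streaming succeeded. */
        void close(boolean success) {
            if (callback == null)
                return;
            if (success)
                callback.onSuccess();
            else
                callback.onFailure(new IOException("streaming session failed"));
        }
    }

    /**
     * Closes one session per outcome, waits for all of them to report,
     * and returns the first failure, or null if every session succeeded.
     */
    static Throwable runSessions(boolean... outcomes) {
        final CountDownLatch latch = new CountDownLatch(outcomes.length);
        final AtomicReference<Throwable> firstFailure = new AtomicReference<>();

        SessionCallback callback = new SessionCallback() {
            public void onSuccess() {
                latch.countDown();
            }

            public void onFailure(Throwable cause) {
                // Remember only the first failure, but still release the waiter.
                firstFailure.compareAndSet(null, cause);
                latch.countDown();
            }
        };

        for (boolean outcome : outcomes)
            new SketchSession(callback).close(outcome);

        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
        return firstFailure.get();
    }

    public static void main(String[] args) {
        Throwable failure = runSessions(true, false, true);
        System.out.println(failure == null
                ? "decommission can proceed"
                : "decommission aborted: " + failure.getMessage());
    }
}
```

Because the latch is counted down on both paths, the waiting operation always wakes up and can decide to abort rather than hang.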
                
> If some streaming sessions fail on decommission, decommission hangs
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-3730
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3730
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1
>         Environment: FreeBSD
>            Reporter: Vitalii Tymchyshyn
>
> Currently Cassandra does not handle StreamOutSession failures, e.g.:
>         // Instead of just not calling the callback on failure, we could have
>         // allow to register a specific callback for failures, but we leave
>         // that to a future ticket (likely CASSANDRA-3112)
>         if (callback != null && success)
>             callback.run();
> This means that if, during decommission, a node receiving the decommission data fails or (in my case) the decommissioning node becomes overloaded, the streaming session fails and the decommission is never notified. This makes it hard to decommission overloaded nodes, because I have to restart the node in order to restart the decommission.
> I also see the following errors, because file streaming tasks try to get a streaming session that has already been closed by gossip:
> ERROR [Streaming to /10.112.0.216:1] 2012-01-11 15:57:28,882 AbstractCassandraDaemon.java (line 138) Fatal exception in thread Thread[Streaming to /10.112.0.216:1,5,main]
> java.lang.NullPointerException
>         at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:97)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)
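
The NullPointerException above is described as coming from a streaming task looking up a session that gossip has already closed. As a hedged sketch of one way to guard such a lookup (assuming the lookup returns null for a closed session; openSessions and streamFile are hypothetical names, not Cassandra APIs):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionLookupSketch {

    /** Hypothetical registry of open sessions, keyed by session id. */
    static final Map<Long, String> openSessions = new ConcurrentHashMap<>();

    /**
     * Streams a file only if its session is still registered.
     * Returns a status message instead of dereferencing a missing session.
     */
    static String streamFile(long sessionId, String file) {
        String session = openSessions.get(sessionId);
        if (session == null)
            return "skipped " + file + ": session " + sessionId + " already closed";
        return "streamed " + file + " via " + session;
    }

    public static void main(String[] args) {
        openSessions.put(1L, "session-to-10.112.0.216");
        System.out.println(streamFile(1L, "a.db"));

        // Simulate the session being closed elsewhere, e.g. by a gossip event.
        openSessions.remove(1L);
        System.out.println(streamFile(1L, "b.db"));
    }
}
```

A real fix would also need to report the skipped file as a failure to whoever is waiting on the session; this sketch only shows the null check.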
