You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Ryan McGuire (JIRA)" <ji...@apache.org> on 2013/06/19 04:43:20 UTC

[jira] [Comment Edited] (CASSANDRA-5658) TracingStage frequently times out

    [ https://issues.apache.org/jira/browse/CASSANDRA-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687525#comment-13687525 ] 

Ryan McGuire edited comment on CASSANDRA-5658 at 6/19/13 2:42 AM:
------------------------------------------------------------------

cluster.start() does wait for all nodes to be up. It checks for the text "[ip address] is now UP" in the logs.

To your point though, I added a 30 second sleep after the cluster.start() and I was still able to reproduce it. Here's some greps from the logs to show that each node was "UP" before the error occurred:

{code}
$ grep -R "is now UP" *
node1.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:51,855 Gossiper.java (line 771) InetAddress /127.0.0.3 is now UP
node1.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:52,202 Gossiper.java (line 771) InetAddress /127.0.0.2 is now UP
node2.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:52,730 Gossiper.java (line 771) InetAddress /127.0.0.1 is now UP
node2.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:52,748 Gossiper.java (line 771) InetAddress /127.0.0.3 is now UP
node3.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:52,020 Gossiper.java (line 771) InetAddress /127.0.0.1 is now UP
node3.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:53,013 Gossiper.java (line 771) InetAddress /127.0.0.2 is now UP

$ grep -R "ERROR" *
node1.log:ERROR [TracingStage:1] 2013-06-18 22:36:33,336 CassandraDaemon.java (line 196) Exception in thread Thread[TracingStage:1,5,main]
{code}

There shouldn't be much load going on here, it's just a single INSERT. I've uploaded the full logs from each node for reference.
                
      was (Author: enigmacurry):
    cluster.start() does wait for all nodes to be up. It's checks for the text "[ip address] is now UP" in the logs.

To your point though, I added a 30 second sleep after the cluster.start() and I was still able to reproduce it. Here's some greps from the logs to show that each node was "UP" before the error occurred:

{code}
$ grep -R "is now UP" *
node1.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:51,855 Gossiper.java (line 771) InetAddress /127.0.0.3 is now UP
node1.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:52,202 Gossiper.java (line 771) InetAddress /127.0.0.2 is now UP
node2.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:52,730 Gossiper.java (line 771) InetAddress /127.0.0.1 is now UP
node2.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:52,748 Gossiper.java (line 771) InetAddress /127.0.0.3 is now UP
node3.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:52,020 Gossiper.java (line 771) InetAddress /127.0.0.1 is now UP
node3.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:53,013 Gossiper.java (line 771) InetAddress /127.0.0.2 is now UP

$ grep -R "ERROR" *
node1.log:ERROR [TracingStage:1] 2013-06-18 22:36:33,336 CassandraDaemon.java (line 196) Exception in thread Thread[TracingStage:1,5,main]
{code}

There shouldn't be much load going on here, it's just a single INSERT. I've uploaded the full logs from each node for reference.
                  
> TracingStage frequently times out
> ---------------------------------
>
>                 Key: CASSANDRA-5658
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5658
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 2.0
>            Reporter: Ryan McGuire
>         Attachments: 5658-logs.tar.gz, trace_bug.py
>
>
> I am seeing frequent timeout errors when doing programmatic traces via trace_next_query()
> {code}
> ERROR [TracingStage:1] 2013-06-18 19:10:20,669 CassandraDaemon.java (line 196) Exception in thread Thread[TracingStage:1,5,main]
> java.lang.RuntimeException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
>         at com.google.common.base.Throwables.propagate(Throwables.java:160)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
>         at org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:81)
>         at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:454)
>         at org.apache.cassandra.tracing.TraceState$1.runMayThrow(TraceState.java:100)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         ... 3 more
> {code}
> Attached is the sample code which produced this error and the logs. The error occurs directly after the INSERT statement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira