You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Ryan McGuire (JIRA)" <ji...@apache.org> on 2013/06/19 04:43:20 UTC
[jira] [Comment Edited] (CASSANDRA-5658) TracingStage frequently
times out
[ https://issues.apache.org/jira/browse/CASSANDRA-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687525#comment-13687525 ]
Ryan McGuire edited comment on CASSANDRA-5658 at 6/19/13 2:42 AM:
------------------------------------------------------------------
cluster.start() does wait for all nodes to be up. It checks for the text "[ip address] is now UP" in the logs.
To your point though, I added a 30 second sleep after the cluster.start() and I was still able to reproduce it. Here's some greps from the logs to show that each node was "UP" before the error occurred:
{code}
$ grep -R "is now UP" *
node1.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:51,855 Gossiper.java (line 771) InetAddress /127.0.0.3 is now UP
node1.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:52,202 Gossiper.java (line 771) InetAddress /127.0.0.2 is now UP
node2.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:52,730 Gossiper.java (line 771) InetAddress /127.0.0.1 is now UP
node2.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:52,748 Gossiper.java (line 771) InetAddress /127.0.0.3 is now UP
node3.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:52,020 Gossiper.java (line 771) InetAddress /127.0.0.1 is now UP
node3.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:53,013 Gossiper.java (line 771) InetAddress /127.0.0.2 is now UP
$ grep -R "ERROR" *
node1.log:ERROR [TracingStage:1] 2013-06-18 22:36:33,336 CassandraDaemon.java (line 196) Exception in thread Thread[TracingStage:1,5,main]
{code}
There shouldn't be much load going on here, it's just a single INSERT. I've uploaded the full logs from each node for reference.
was (Author: enigmacurry):
cluster.start() does wait for all nodes to be up. It's checks for the text "[ip address] is now UP" in the logs.
To your point though, I added a 30 second sleep after the cluster.start() and I was still able to reproduce it. Here's some greps from the logs to show that each node was "UP" before the error occurred:
{code}
$ grep -R "is now UP" *
node1.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:51,855 Gossiper.java (line 771) InetAddress /127.0.0.3 is now UP
node1.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:52,202 Gossiper.java (line 771) InetAddress /127.0.0.2 is now UP
node2.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:52,730 Gossiper.java (line 771) InetAddress /127.0.0.1 is now UP
node2.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:52,748 Gossiper.java (line 771) InetAddress /127.0.0.3 is now UP
node3.log: INFO [RequestResponseStage:1] 2013-06-18 22:35:52,020 Gossiper.java (line 771) InetAddress /127.0.0.1 is now UP
node3.log: INFO [RequestResponseStage:2] 2013-06-18 22:35:53,013 Gossiper.java (line 771) InetAddress /127.0.0.2 is now UP
$ grep -R "ERROR" *
node1.log:ERROR [TracingStage:1] 2013-06-18 22:36:33,336 CassandraDaemon.java (line 196) Exception in thread Thread[TracingStage:1,5,main]
{code}
There shouldn't be much load going on here, it's just a single INSERT. I've uploaded the full logs from each node for reference.
> TracingStage frequently times out
> ---------------------------------
>
> Key: CASSANDRA-5658
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5658
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 2.0
> Reporter: Ryan McGuire
> Attachments: 5658-logs.tar.gz, trace_bug.py
>
>
> I am seeing frequent timeout errors when doing programmatic traces via trace_next_query()
> {code}
> ERROR [TracingStage:1] 2013-06-18 19:10:20,669 CassandraDaemon.java (line 196) Exception in thread Thread[TracingStage:1,5,main]
> java.lang.RuntimeException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
> at com.google.common.base.Throwables.propagate(Throwables.java:160)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
> at org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:81)
> at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:454)
> at org.apache.cassandra.tracing.TraceState$1.runMayThrow(TraceState.java:100)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ... 3 more
> {code}
> Attached is the sample code which produced this error and the logs. The error occurs directly after the INSERT statement.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira