You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Vincent White (JIRA)" <ji...@apache.org> on 2017/11/28 13:26:00 UTC

[jira] [Commented] (CASSANDRA-14047) test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: Missing: ['127.0.0.3.* now UP']:

    [ https://issues.apache.org/jira/browse/CASSANDRA-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268714#comment-16268714 ] 

Vincent White commented on CASSANDRA-14047:
-------------------------------------------

While looking into this the only error I came across was related to joining nodes begin sent read messages for column families they didn't know about yet. Which look like ...


{noformat}
WARN  [MessagingService-NettyInbound-Thread-3-1] 2017-11-28 12:40:41,123 MessageInHandler.java:273 - Got message from unknown table while reading from socket; closing
org.apache.cassandra.exceptions.UnknownTableException: Couldn't find table with id 5bc52802-de25-35ed-aeab-188eecebb090. If a table was just created, this is likely due to the sch
emanot being fully propagated.  Please wait for schema agreement on table creation.
        at org.apache.cassandra.schema.Schema.getExistingTableMetadata(Schema.java:474) ~[main/:na]
        at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:670) ~[main/:na]
        at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:606) ~[main/:na]
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:139) ~[main/:na]
        at org.apache.cassandra.net.async.MessageInHandler.decode(MessageInHandler.java:178) ~[main/:na]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) [netty-all-4.1.14.Final.jar:4.1.14.Final]

{noformat}

...followed by 
{noformat}
WARN  [MessagingService-NettyInbound-Thread-3-1] 2017-11-28 12:40:41,193 MessageInHandler.java:277 - Unexpected exception caught in inbound channel pipeline from /127.0.0.1:41522
java.lang.ArrayIndexOutOfBoundsException: 90
        at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:657) ~[main/:na]
        at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:606) ~[main/:na]
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:139) ~[main/:na]
        at org.apache.cassandra.net.async.MessageInHandler.decode(MessageInHandler.java:178) ~[main/:na]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) [netty-all-4.1.14.Final.jar:4.1.14.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) [netty-all-4.1.14.Final.jar:4.1.14.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) [netty-all-4.1.14.Final.jar:4.1.14.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-all-4.1.14.Final.jar:4.1.14.Final]
{noformat}

...as C* appears to get stuck (at least for a good amount of time) in a loop where decode() is called (after the initial exception), ReadCommand() throws an exception(due to the read index into the buffer being left in an incorrect state following the initial exception) each time which again calls ctx.close() on the channel before this loop continues. 

I probably don't have enough netty knowledge to propose a good fix for this but [this|https://github.com/apache/cassandra/compare/trunk...vincewhite:debug-1] appeared to help, though I didn't get time to look too deeply into the new behaviour. Maybe someone like [~jasobrown] would have a more correct fix. 

I suspect this is probably affects most of these consistency tests. 

> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: Missing: ['127.0.0.3.* now UP']:
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14047
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14047
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Michael Kjellman
>            Assignee: Vincent White
>
> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: Missing: ['127.0.0.3.* now UP']:
> 15 Nov 2017 11:23:37 [node1] Missing: ['127.0.0.3.* now UP']:
> INFO  [main] 2017-11-15 11:21:32,452 YamlConfigura.....
> See system.log for remainder
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-v3VgyS
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 10000,
>     'read_request_timeout_in_ms': 10000,
>     'request_timeout_in_ms': 10000,
>     'truncate_request_timeout_in_ms': 10000,
>     'write_request_timeout_in_ms': 10000}
> dtest: DEBUG: Testing single dc, users, each quorum reads
> --------------------- >> end captured logging << ---------------------
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/cassandra/cassandra-dtest/tools/decorators.py", line 48, in wrapped
>     f(obj)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 621, in test_simple_strategy_each_quorum_users
>     self._run_test_function_in_parallel(TestAccuracy.Validation.validate_users, [self.nodes], [self.rf], combinations)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in _run_test_function_in_parallel
>     self._start_cluster(save_sessions=True, requires_local_reads=requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 141, in _start_cluster
>     cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
>   File "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", line 428, in start
>     node.watch_log_for_alive(other_node, from_mark=mark)
>   File "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 520, in watch_log_for_alive
>     self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, filename=filename)
>   File "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 488, in watch_log_for
>     raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + reads[:50] + ".....\nSee {} for remainder".format(filename))
> "15 Nov 2017 11:23:37 [node1] Missing: ['127.0.0.3.* now UP']:\nINFO  [main] 2017-11-15 11:21:32,452 YamlConfigura.....\nSee system.log for remainder\n-------------------- >> begin captured logging << --------------------\ndtest: DEBUG: cluster ccm directory: /tmp/dtest-v3VgyS\ndtest: DEBUG: Done setting configuration options:\n{   'initial_token': None,\n    'num_tokens': '32',\n    'phi_convict_threshold': 5,\n    'range_request_timeout_in_ms': 10000,\n    'read_request_timeout_in_ms': 10000,\n    'request_timeout_in_ms': 10000,\n    'truncate_request_timeout_in_ms': 10000,\n    'write_request_timeout_in_ms': 10000}\ndtest: DEBUG: Testing single dc, users, each quorum reads\n--------------------- >> end captured logging << ---------------------"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org