Posted to commits@cassandra.apache.org by "Dinesh Joshi (Jira)" <ji...@apache.org> on 2021/11/17 23:08:00 UTC

[jira] [Comment Edited] (CASSANDRA-17116) When zero-copy-streaming sees a channel close this triggers the disk failure policy

    [ https://issues.apache.org/jira/browse/CASSANDRA-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445541#comment-17445541 ] 

Dinesh Joshi edited comment on CASSANDRA-17116 at 11/17/21, 11:07 PM:
----------------------------------------------------------------------

I think that, with the introduction of {{COMPLETE_ACK}}, we would also need to introduce a timeout after the {{COMPLETE}} message is transmitted but before the sender closes the Channel. This would give the receiving peer time to consume all the data and the {{COMPLETE}} message. It would not only solve the issue but also be backward compatible, since we could send {{COMPLETE_ACK}} only to peers that support the message.
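
A rough sketch of that handshake, to make the proposal concrete. Everything here is hypothetical: the class, the method names, and the {{ACK_MIN_VERSION}} constant are assumptions for illustration and are not Cassandra APIs. It assumes the sender's receive path completes a future when {{COMPLETE_ACK}} arrives, and that the peer's messaging version is known.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch only; none of these names exist in Cassandra. */
final class CompleteAckSketch {
    /** Assumed first messaging version that understands COMPLETE_ACK (made-up value). */
    static final int ACK_MIN_VERSION = 13;

    /** Receiver side: only send COMPLETE_ACK to peers that understand it. */
    static boolean shouldSendAck(int peerVersion) {
        return peerVersion >= ACK_MIN_VERSION;
    }

    /**
     * Sender side: call after COMPLETE has been written and before closing the channel.
     * Returns true if the ack arrived in time, false if the wait expired or failed.
     * Either way the caller closes the channel afterwards; a peer that never
     * sends the ack simply costs the sender the full timeout.
     */
    static boolean awaitAckOrTimeout(CompletableFuture<Void> ack, long timeoutMillis) {
        try {
            ack.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException | CancellationException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

The timeout bounds how long the sender keeps the channel open, so a receiver that never acks cannot hold the stream session open indefinitely.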


was (Author: djoshi3):
I think with the introduction of {{COMPLETE_ACK}}, we would need to also introduce a timeout after the {{COMPLETE}} message is transmitted but before the sender closes the Channel. This would give the receiving peer time to consume all the data and the `COMPLETE` message. This would also be backward compatible as we could send {{COMPLETE_ACK}} to peers that support the message and not to other peers.

> When zero-copy-streaming sees a channel close this triggers the disk failure policy
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17116
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17116
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Streaming
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.x
>
>
> Found in CASSANDRA-17085.
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1069/workflows/26b7b83a-686f-4516-a56a-0709d428d4f2/jobs/7264
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1069/workflows/26b7b83a-686f-4516-a56a-0709d428d4f2/jobs/7256
> {code}
> ERROR [Stream-Deserializer-/127.0.0.1:7000-f2eb1a15] 2021-11-02 21:35:40,983 DefaultFSErrorHandler.java:104 - Exiting forcefully due to file system exception on startup, disk failure policy "stop"
> org.apache.cassandra.io.FSWriteError: java.nio.channels.ClosedChannelException
> 	at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.write(BigTableZeroCopyWriter.java:227)
> 	at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.writeComponent(BigTableZeroCopyWriter.java:206)
> 	at org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:125)
> 	at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:84)
> 	at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:51)
> 	at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:37)
> 	at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
> 	at org.apache.cassandra.streaming.StreamDeserializingTask.run(StreamDeserializingTask.java:62)
> 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.channels.ClosedChannelException: null
> 	at org.apache.cassandra.net.AsyncStreamingInputPlus.reBuffer(AsyncStreamingInputPlus.java:136)
> 	at org.apache.cassandra.net.AsyncStreamingInputPlus.consume(AsyncStreamingInputPlus.java:155)
> 	at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.write(BigTableZeroCopyWriter.java:217)
> 	... 9 common frames omitted
> {code}
> When bootstrap fails and streaming is closed, this triggers the disk failure policy, which halts the JVM by default (if this happens outside of bootstrap, we stop transports and keep the JVM up).
> org.apache.cassandra.streaming.StreamDeserializingTask attempts to handle this by ignoring the exception, but the call to org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize does its own try/catch and inspects the exception, which triggers this condition.
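
The failure mode in the quoted description comes down to a write error whose root cause is a closed network channel being treated like a disk problem. The sketch below shows one way such a case could be told apart by walking the cause chain. It is an illustration under assumptions, not the fix applied for this ticket: {{StreamErrorSketch}}, {{WriteError}}, and {{Action}} are hypothetical stand-ins, and only {{java.nio.channels.ClosedChannelException}} is taken from the stack trace.

```java
import java.io.IOException;
import java.nio.channels.ClosedChannelException;

/** Hypothetical sketch only; these types are stand-ins, not Cassandra classes. */
final class StreamErrorSketch {
    /** Stand-in for a file-system write error that wraps the underlying failure. */
    static final class WriteError extends RuntimeException {
        WriteError(Throwable cause) { super(cause); }
    }

    enum Action { FAIL_STREAM_ONLY, APPLY_DISK_FAILURE_POLICY }

    /** True if any throwable in the cause chain is a ClosedChannelException. */
    static boolean causedByClosedChannel(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ClosedChannelException)
                return true;
        }
        return false;
    }

    /** A closed channel is a network event, so it should not count as a disk failure. */
    static Action classify(Throwable t) {
        return causedByClosedChannel(t) ? Action.FAIL_STREAM_ONLY
                                        : Action.APPLY_DISK_FAILURE_POLICY;
    }
}
```

With a check like this, the error in the trace above (a write error caused by {{ClosedChannelException}}) would fail only the stream, while a write error caused by a plain {{IOException}} would still reach the disk failure policy.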



