Posted to commits@cassandra.apache.org by "Jeff Jirsa (JIRA)" <ji...@apache.org> on 2017/03/17 21:26:41 UTC

[jira] [Updated] (CASSANDRA-11995) Commitlog replaced with all NULs

     [ https://issues.apache.org/jira/browse/CASSANDRA-11995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-11995:
-----------------------------------
    Status: Patch Available  (was: In Progress)

|| branch || utests || dtests ||
| [3.0|https://github.com/jeffjirsa/cassandra/tree/cassandra-3.0-11995] | [testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.0-11995-testall/] | [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.0-11995-dtest/] |
| [3.11|https://github.com/jeffjirsa/cassandra/tree/cassandra-3.11-11995] | [testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.11-11995-testall/] | [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.11-11995-dtest/] |
| [trunk|https://github.com/jeffjirsa/cassandra/tree/cassandra-11995] | [testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-11995-testall/] | [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-11995-dtest/] |

Note to reviewer: [~aweisberg] and I talked about this offline a bit, and one of the things worth questioning is "how do we even get into this position?" It seems there may be a window after [CommitLogDescriptor.writeHeader|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/commitlog/CommitLogSegment.java#L168-L171] is called where we don't actually sync. Short of a system reboot, we should still have the data in memory and the kernel should keep it consistent; however, if we're crashing for some other reason, we could certainly end up with an all-0 file, which will fail to replay. We may want to open a subsequent JIRA to address that particular problem, but we see it as distinct from the replay problem.
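
To make the suspected window concrete, here is a minimal sketch, not taken from the patch or from Cassandra's code. It assumes a segment file that is preallocated with zeros and then has a header written at offset 0. The class {{HeaderSyncSketch}} and both method names are hypothetical. The only point is that until {{FileChannel.force}} returns, nothing guarantees the header bytes have replaced the zeros on disk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; these names do not exist in Cassandra.
class HeaderSyncSketch {

    // Fill the segment with 'size' zero bytes, mimicking a preallocated commitlog file.
    static void preallocate(Path segment, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(segment,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer zeros = ByteBuffer.allocate(size);
            while (zeros.hasRemaining()) {
                ch.write(zeros, zeros.position());
            }
            ch.force(true);
        }
    }

    // Write the header at offset 0 and force it to disk before returning.
    // Without the force() call, the write may only live in the page cache.
    static void writeHeaderAndSync(Path segment, byte[] header) throws IOException {
        try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(header);
            while (buf.hasRemaining()) {
                ch.write(buf, buf.position());
            }
            ch.force(true);
        }
    }
}
```

Whether an eager sync like this is acceptable for Cassandra's write path is exactly the kind of question a follow-up JIRA would have to answer; the sketch only shows where the gap sits.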

This patch, then, deals only with the problem of replaying the final all-0 file, which we consider to be a change in behavior from 2.x. Continuing to replay a "corrupt" all-null file is the 2.x behavior, and presumably should only be allowed if it is the last segment, where we already explicitly tolerate truncation in the rest of the segment via the {{tolerateTruncation}} flag. This patch just makes {{tolerateTruncation}} also tolerate truncation of the header without interrupting replay and startup.
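
The rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the linked branches. {{ZeroHeaderCheck}}, {{HEADER_BYTES}}, {{headerIsAllZero}}, {{decide}} and the {{Action}} enum are hypothetical names, and the 16-byte header length is an arbitrary stand-in for the real descriptor size.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; these names do not exist in Cassandra.
class ZeroHeaderCheck {

    // Assumed header length for the sketch, not Cassandra's real descriptor size.
    static final int HEADER_BYTES = 16;

    enum Action { REPLAY, SKIP, FAIL }

    // True if the first HEADER_BYTES bytes (or the whole file, if shorter) are all zero.
    static boolean headerIsAllZero(Path segment) throws IOException {
        try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES);
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            buf.flip();
            while (buf.hasRemaining()) {
                if (buf.get() != 0) {
                    return false;
                }
            }
            return true;
        }
    }

    // A readable header replays normally. An unreadable, all-zero header is
    // skipped only when truncation is tolerated (i.e. for the last segment).
    // Anything else is still treated as a replay error.
    static Action decide(boolean headerReadable, boolean headerAllZero, boolean tolerateTruncation) {
        if (headerReadable) {
            return Action.REPLAY;
        }
        if (headerAllZero && tolerateTruncation) {
            return Action.SKIP;
        }
        return Action.FAIL;
    }
}
```

Under this sketch, a 32 MiB segment of NUL bytes that is the last segment yields {{SKIP}}, so startup continues, while the same file earlier in the sequence yields {{FAIL}}.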


> Commitlog replaced with all NULs
> --------------------------------
>
>                 Key: CASSANDRA-11995
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11995
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Windows 10 Enterprise 1511
> DataStax Cassandra Community Server 2.2.3
>            Reporter: James Howe
>            Assignee: Jeff Jirsa
>
> I noticed this morning that Cassandra was failing to start, after being shut down on Friday.
> {code}
> ERROR 09:13:37 Exiting due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Could not read commit log descriptor in file C:\Program Files\DataStax Community\data\commitlog\CommitLog-5-1465571056722.log
> 	at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:622) [apache-cassandra-2.2.3.jar:2.2.3]
> 	at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:302) [apache-cassandra-2.2.3.jar:2.2.3]
> 	at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:147) [apache-cassandra-2.2.3.jar:2.2.3]
> 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-2.2.3.jar:2.2.3]
> 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-2.2.3.jar:2.2.3]
> 	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:273) [apache-cassandra-2.2.3.jar:2.2.3]
> 	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:513) [apache-cassandra-2.2.3.jar:2.2.3]
> 	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) [apache-cassandra-2.2.3.jar:2.2.3]
> {code}
> Checking the referenced file reveals it comprises 33,554,432 (32 * 1024 * 1024) NUL bytes.
> No logs (stdout, stderr, prunsrv) from the shutdown show any other issues and appear exactly as normal.
> Cassandra is installed as a service via DataStax's distribution.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)