Posted to commits@cassandra.apache.org by "Joel Knighton (JIRA)" <ji...@apache.org> on 2016/05/19 23:07:13 UTC

[jira] [Issue Comment Deleted] (CASSANDRA-11742) Failed bootstrap results in exception when node is restarted

     [ https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Knighton updated CASSANDRA-11742:
--------------------------------------
    Comment: was deleted

(was: I think this second patch is an improvement - I traced this issue to determine exactly why it worked on 2.1. This behavior was introduced by [CASSANDRA-8049], which centralized Cassandra's startup checks. Prior to that change, we inserted the cluster name directly after checking the health of the system keyspace, so if an sstable for the system keyspace had been flushed, we could guarantee that some sstable contained the cluster name. After [CASSANDRA-8049], we insert the cluster name with the rest of the local metadata in {{SystemKeyspace.finishStartup()}}.

[~beobal] - I couldn't find a reason for changing when the cluster name is inserted, other than that it didn't seem like a good idea to mutate anything in a startup check. Can you think of any reason we can't just call {{SystemKeyspace.persistLocalMetadata}} immediately after snapshotting the system keyspace in {{CassandraDaemon}}? The root cause of this problem is that we need that data persisted before any truncate/schema logic runs: those steps write to the system keyspace, so we can end up with flushed sstables containing their data but no sstable containing the cluster name, which breaks the system keyspace health check. I ran full unit tests/dtests on a branch that moved {{SystemKeyspace.persistLocalMetadata}} to immediately after the snapshot of the system keyspace, and the results looked good.)
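
To make the suggested ordering concrete, here is a rough sketch of the relevant part of {{CassandraDaemon.setup()}}. This is an illustration of the idea only, not the attached patch: {{runStartupChecks()}} and {{snapshotSystemKeyspaceIfVersionChanged()}} are placeholder names for the existing steps, while {{persistLocalMetadata()}} and {{finishStartup()}} are the methods discussed above.

{code:java}
// Sketch only - placeholder method names, not the actual CassandraDaemon code.
void setup()
{
    // centralized startup checks from CASSANDRA-8049, including the system
    // keyspace health check (it reads only sstables, since the commitlog
    // has not been replayed yet)
    runStartupChecks();

    // snapshot the system keyspace (placeholder for the existing call)
    snapshotSystemKeyspaceIfVersionChanged();

    // proposed: persist the cluster name and the rest of the local metadata
    // right here, before any truncate/schema logic can flush other system
    // keyspace sstables
    SystemKeyspace.persistLocalMetadata();

    // ... truncate records, schema load, commitlog replay, etc. ...

    // the remaining per-startup bookkeeping stays where it is today
    SystemKeyspace.finishStartup();
}
{code}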

> Failed bootstrap results in exception when node is restarted
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-11742
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11742
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Minor
>             Fix For: 2.2.x, 3.0.x, 3.x
>
>         Attachments: 11742-2.txt, 11742.txt
>
>
> Since 2.2, a failed bootstrap results in a {{org.apache.cassandra.exceptions.ConfigurationException: Found system keyspace files, but they couldn't be loaded!}} exception when the node is restarted. This did not happen in 2.1; it just tried to bootstrap again. I know the workaround is relatively easy: just delete the system keyspace in the data folder on disk and try again. But it's a bit annoying that you have to do that.
> The problem seems to be that the creation of the {{system.local}} table has been moved to just before the bootstrap begins (in 2.1 it was done much earlier), and as a result it is still only in the memtable and commitlog if the bootstrap fails. However, a few values are inserted into the {{system.local}} table at an earlier point in the startup, and those have already been flushed from the memtable to an sstable. When the node is restarted, {{SystemKeyspace.checkHealth()}} is executed before the commitlog is replayed; it therefore only sees the sstable with an incomplete {{system.local}} table and throws an exception.
> I think we could fix this very easily by calling {{forceFlush}} on the system keyspace in the {{StorageServiceShutdownHook}}; I have included a patch that does this.
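
A minimal sketch of that idea, based on the description above rather than the attached patch. It assumes the 2.2-era APIs ({{Keyspace.open}}, {{SystemKeyspace.NAME}}, {{ColumnFamilyStore.forceBlockingFlush}}); a blocking flush is used here on the assumption that the flush should complete before the JVM exits.

{code:java}
// Inside the shutdown path (StorageServiceShutdownHook): flush every table of
// the system keyspace so that a partially written system.local does not live
// only in the memtable/commitlog. On restart, SystemKeyspace.checkHealth()
// then sees a complete system.local sstable.
for (ColumnFamilyStore cfs : Keyspace.open(SystemKeyspace.NAME).getColumnFamilyStores())
    cfs.forceBlockingFlush();
{code}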



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)