Posted to commits@cassandra.apache.org by "Michael Shuler (JIRA)" <ji...@apache.org> on 2014/07/29 19:14:39 UTC

[jira] [Resolved] (CASSANDRA-5924) If migration (upgrade) failed mid-way, some data will be "lost" on the upgraded instance

     [ https://issues.apache.org/jira/browse/CASSANDRA-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Shuler resolved CASSANDRA-5924.
---------------------------------------

    Resolution: Not a Problem

Closing as not a problem: the failure was caused by unexpected files in a directory where Cassandra expects only its own data. Feel free to re-open with concrete reproduction steps on the latest version of 1.2.x or 2.0.x if you would like to pursue this further. Thanks!

> If migration (upgrade) failed mid-way, some data will be "lost" on the upgraded instance
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5924
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5924
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jackson Chung
>
> When upgrading from 1.0 to 1.1, C* checks the system keyspace (schema_keyspaces) to see whether a migration is needed.
> When one is needed, it proceeds with migrateSSTables.
> But this process does not follow any particular order (File.listFiles() does not guarantee an order), and an IOException can be thrown (e.g. on failure to create a directory).
> In some of our upgrades, the system keyspace was migrated first, followed by some KSs/CFs, but before it finished all the KSs/CFs, it failed on a custom directory containing files whose names resemble the sstable file naming convention (they contain "-").
> They really shouldn't be there and we are removing them. But because of them C* tried to create a directory for such a file and failed with an IOException due to ownership/permissions. As a result, C* failed to start.
> Without anyone knowing why C* had failed to start in the first place, C* was restarted. Only this time C* did not think it needed to migrate any more (system was already migrated, so schema_keyspaces existed). As a result, the remaining KSs/CFs were never migrated.
> Our root cause is the custom directory and its ownership/permissions, and again we are removing them to re-upgrade. But the point of this jira is that an IOException (or any other exception) can still be thrown for various reasons during this process, and can result in the same problem: some CFs fail to be migrated.
> 1.2 seems to have some handling code, but it looks like a RuntimeException would still be thrown, and that would still be caught by AbstractCassandraDaemon (or CassandraDaemon in 1.2):
> {code}
>         catch (Throwable e)
>         {
>             logger.error("Exception encountered during startup", e);
>             // try to warn user on stdout too, if we haven't already detached
>             e.printStackTrace();
>             System.out.println("Exception encountered during startup: " + e.getMessage());
>             System.exit(3);
>         }
> {code}
> And so I think this problem still exists in 1.2.
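
For illustration only, the failure mode described above (unordered directory listing, plus a "migration needed" check that turns false after a partial run) could be avoided along the following lines. This is a minimal sketch and not Cassandra's actual migration code: the class ResumableMigration, the Step callback, and the marker file name .migration-complete are all hypothetical, and it assumes Java 8 or later and that each per-directory step is safe to repeat.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch; none of these names exist in Cassandra. */
final class ResumableMigration {
    // Hypothetical marker file, written only after every entry has been migrated.
    static final String DONE_MARKER = ".migration-complete";

    /** Hypothetical callback standing in for the per-directory migration work. */
    interface Step {
        void migrate(File entry) throws IOException;
    }

    /** Lists dir sorted by name, because File.listFiles() guarantees no order. */
    static File[] sortedEntries(File dir) throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // listFiles() returns null if dir is not a directory or an I/O error occurs.
            throw new IOException("Cannot list " + dir);
        }
        Arrays.sort(entries, Comparator.comparing(File::getName));
        return entries;
    }

    /**
     * Runs step on every entry unless the marker already exists.
     * If a step throws, the marker is not written, so the next start retries
     * every entry; the step must therefore be safe to run more than once.
     * Returns true if the migration ran to completion, false if it was skipped.
     */
    static boolean migrateIfNeeded(File dataDir, Step step) throws IOException {
        Path marker = dataDir.toPath().resolve(DONE_MARKER);
        if (Files.exists(marker)) {
            return false;
        }
        for (File entry : sortedEntries(dataDir)) {
            step.migrate(entry);
        }
        Files.createFile(marker);
        return true;
    }
}
```

The design point is that "migration is complete" is recorded separately and last, instead of being inferred from the first keyspace that happened to be migrated, so a restart after a mid-way failure retries the remaining work.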



--
This message was sent by Atlassian JIRA
(v6.2#6252)