You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2020/07/10 12:09:58 UTC

[GitHub] [cassandra] blerer opened a new pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

blerer opened a new pull request #675:
URL: https://github.com/apache/cassandra/pull/675


   The patch ensure that the data of the local system keyspaces is stored by default within the first data directory (at the exception of the paxos table) to allow the node to tolerate a failure of the disks associated with the other directories.
   The patch also allow users to configure a different directory for the system keyspaces. This would allow people to use a disk providing redundancy to support the lost of any of the disks used to store the data.
   On startup existing system keyspace data will be automatically migrated to support 4.0 upgrades or configuration changes (use of a separate disk for the system keyspaces). 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r564437724



##########
File path: NEWS.txt
##########
@@ -38,6 +38,14 @@ using the provided 'sstableupgrade' tool.
 
 New features
 ------------
+    - The data of the system keyspaces using a local strategy (at the exception of the system.batches,
+      system.paxos, system.compaction_history, system.prepared_statements and system.repair tables)
+      is now stored by default in the first data directory. This approach will allow the server

Review comment:
       Perhaps `is now stored by default in the first data directory, instead of being distributed among all the data directories.`, to be even more clear?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r565440824



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForLocalSystemData()
+    {
+        return conf.local_system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code local_system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getLocalSystemKeyspacesDataFileLocations()

Review comment:
       I see the problem with tests, `DatabaseDescriptor#getLocalSystemKeyspacesDataFileLocations` is using an empty array to represent the fact that `data_file_directories` hasn't been set. [Here](https://github.com/adelapena/cassandra/commit/93bbe2e3a983811cce91c3ded989346ee8ea7c4d) is what I tried, returning a single `String` that can be null if `data_file_directories` is empty. Managing this situation inside the constructor of `DataDirectories` makes those tests to pass ([here](https://app.circleci.com/pipelines/github/adelapena/cassandra/167/workflows/aed26434-3657-47fa-a3a5-c5b20eced2aa)), while having a `DatabaseDescriptor#getLocalSystemKeyspacesDataFileLocation` method returning a nullable string (it could also be an optional). Nevertheless I think this is not a big deal and I guess we can just return an array in `getLocalSystemKeyspacesDataFileLocations`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on pull request #675:
URL: https://github.com/apache/cassandra/pull/675#issuecomment-748185013


   Regarding the testing of the migration part. The migration part is automatically tested in all the upgrade tests that create a keyspace or/and table before upgrading and used it after.
   One part that we do not test is the import in the same version. I will look into how to test that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] krummas commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
krummas commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r553354189



##########
File path: doc/source/configuration/cass_env_sh_file.rst
##########
@@ -28,6 +28,10 @@ Setting this property to true causes the dynamic snitch to ignore the severity i
 
 **Default:** false
 
+``cassandra.import_system_data_files_from``
+-------------------------------------------
+The directory location of the ``local system keyspaces`` data to import. This property can be used to move back to the first data directory the ``local system keyspaces`` data that were stored in a specific directory using the ``system_data_file_directory`` cassandra.yaml property.

Review comment:
       Could you expand on this? Not sure I understand what the use case would be for this? Is it if a user changes the configured `system_data_file_directory`? If so I think we could expect them to manually move the files as well

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -632,6 +642,106 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the local system keyspaces.
+         */
+        private final DataDirectory[] systemKeyspaceDataDirectories;
+
+        /**
+         * The directories where the data of the non local system keyspaces should be stored.
+         */
+        private final DataDirectory[] nonSystemKeyspacesDirectories;
+
+
+        public DataDirectories(String[] locationsForNonSystemKeyspaces, String[] locationsForSystemKeyspace)
+        {
+            nonSystemKeyspacesDirectories = toDataDirectories(locationsForNonSystemKeyspaces);
+            systemKeyspaceDataDirectories = toDataDirectories(locationsForSystemKeyspace);
+        }
+
+        private static DataDirectory[] toDataDirectories(String... locations)
+        {
+            DataDirectory[] directories = new DataDirectory[locations.length];
+            for (int i = 0; i < locations.length; ++i)
+                directories[i] = new DataDirectory(new File(locations[i]));
+            return directories;
+        }
+
+        /**
+         * Returns the data directories used to store the data of the specified keyspace.
+         *
+         * @param keyspace the keyspace name
+         * @return the data directories used to store the data of the specified keyspace
+         */
+        public DataDirectory[] getDataDirectoriesUsedBy(String keyspace)

Review comment:
       this is confusing - we return all directories when its a system keyspace? (guess its because of the paxos exception)
   
   looking at the uses of this method it seems to all be cleanup/check related so maybe it would make sense to remove it and just use `getAllDirectories()` ?

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};
+    }
+
+    /**
+     * Returns the locations where should be stored the non local system keyspaces data.
+     *
+     * @return the locations where should be stored the non local system keyspaces data
+     */
+    public static String[] getNonSystemKeyspacesDataFileLocations()

Review comment:
       Yeah keeping `system` in the name is probably a good idea

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};

Review comment:
       Feels slightly wrong to rely on the order of the directories in the configuration - maybe better to sort and grab the first one. My concern is that if a user auto-generates the config which changes the order of the directories in the config file and we have to move the files every startup (not the end of the world, but anyway)

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)

Review comment:
       nit: maybe `if (useSpecificLocationForSystemData())` instead of null check

##########
File path: src/java/org/apache/cassandra/io/util/FileUtils.java
##########
@@ -969,4 +973,70 @@ public Object getAttribute(String attribute) throws IOException
             return fileStore.getAttribute(attribute);
         }
     }
+
+    /**
+     * Moves the contents of a directory to another directory.
+     * <p>Once a file has been copied to the target directory it will be deleted from the source directory.
+     * If a file already exists in the target directory a warning will be logged and the file will not
+     * be deleted.</p>
+     *
+     * @param source the directory containing the files to move
+     * @param target the directory where the files must be moved
+     */
+    public static void moveRecursively(Path source, Path target) throws IOException
+    {
+        logger.info("Moving {} to {}" , source, target);
+
+        if (Files.isDirectory(source))
+        {
+            Files.createDirectories(target);
+
+            try (Stream<Path> paths = Files.list(source))

Review comment:
       I found this part quite hard to follow (the resolving/relativizing) 
   
   Would be a bit simpler if we could do something like this?:
   ```
   for (File f : source.toFile().listFiles())
   {
       String fileName = f.getName();
       moveRecursively(source.resolve(fileName), target.resolve(fileName));
   }
   ```
   

##########
File path: src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java
##########
@@ -67,6 +65,18 @@ public void handleFSError(FSError e)
                 StorageService.instance.stopTransports();
                 break;
             case best_effort:
+
+                // There are a few scenarios where we know that the node will not be able to operate properly.
+                // For those scenarios we want to stop the transports and let the administrators handle the problem.
+                // Those scenarios are:
+                // * All the disks are full
+                // * All the disks for a given keyspace have been marked as unwriteable

Review comment:
       can't we continue serving reads here?

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -472,6 +492,78 @@ public void runStartupChecks()
         }
 
     }
+
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    public void migrateSystemDataIfNeeded() throws IOException
+    {
+        String importSystemDataFrom = System.getProperty("cassandra.import_system_data_files_from");
+
+        // If there is only one directory and no system keyspace directory has been specified we do not need to do
+        // anything. If it is not the case we want to try to migrate the data.
+        if (!DatabaseDescriptor.useSpecificLocationForSystemData()
+                && DatabaseDescriptor.getNonSystemKeyspacesDataFileLocations().length <= 1
+                && importSystemDataFrom == null)
+            return;
+
+        // We can face several cases:
+        //  1) The system data are spread accross the data file locations and need to be moved to
+        //     the first data location (upgrade to 4.0)
+        //  2) The system data are spread accross the data file locations and need to be moved to
+        //     the system keyspace location configured by the user (upgrade to 4.0)
+        //  3) The system data are stored in the first data location and need to be moved to
+        //     the system keyspace location configured by the user (system_data_file_directory has been configured)
+        //  4) The system data have been stored in the system keyspace location configured by the user
+        //     and need to be moved to the first data location (the import of the data has been requested)
+        Path target = Paths.get(DatabaseDescriptor.getSystemKeyspacesDataFileLocations()[0]);
+
+        String[] nonSystemKeyspacesFileLocations = DatabaseDescriptor.getNonSystemKeyspacesDataFileLocations();
+        String[] sources = importSystemDataFrom != null
+            ? new String[] {importSystemDataFrom}
+            : DatabaseDescriptor.useSpecificLocationForSystemData() ? nonSystemKeyspacesFileLocations
+                                                                    : Arrays.copyOfRange(nonSystemKeyspacesFileLocations, 1, nonSystemKeyspacesFileLocations.length);
+
+        for (String source : sources)
+        {
+            Path dataFileLocation = Paths.get(source);
+
+            if (!Files.exists(dataFileLocation))
+                continue;
+
+            try (Stream<Path> locationChildren = Files.list(dataFileLocation))
+            {
+                Path[] keyspaceDirectories = locationChildren.filter(p -> SchemaConstants.isLocalSystemKeyspace(p.getFileName().toString()))
+                        .toArray(Path[]::new);
+
+                for (Path keyspaceDirectory : keyspaceDirectories)
+                {
+                    try (Stream<Path> keyspaceChildren = Files.list(keyspaceDirectory))
+                    {
+                        Path[] tableDirectories = keyspaceChildren.filter(Files::isDirectory)

Review comment:
       nit: indentation is a bit broken here - `.filter` should align with the first `.filter`

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -472,6 +492,78 @@ public void runStartupChecks()
         }
 
     }
+
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    public void migrateSystemDataIfNeeded() throws IOException
+    {
+        String importSystemDataFrom = System.getProperty("cassandra.import_system_data_files_from");

Review comment:
       As mentioned above, I think we can remove this property which would simplify this method a bit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] smiklosovic closed pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
smiklosovic closed pull request #675:
URL: https://github.com/apache/cassandra/pull/675


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r545863755



##########
File path: src/java/org/apache/cassandra/io/util/FileUtils.java
##########
@@ -920,4 +922,69 @@ public Object getAttribute(String attribute) throws IOException
             return fileStore.getAttribute(attribute);
         }
     }
+
+
+    /**
+     * Moves the files from a directory to another directory.
+     * <p>Once a file has been copied to the target directory it will be deleted from the source directory.
+     * If a file already exist in the target directory a warning will be logged and the file will not
+     * be deleted.</p>
+     *
+     * @param source the directory containing the files to move
+     * @param target the directory where the files must be moved
+     */
+    public static void moveRecursively(Path source, Path target) throws IOException
+    {
+        logger.info("Moving {} to {}" , source, target);
+
+        if (Files.isDirectory(source))
+        {
+            Files.createDirectories(target);
+
+            try (Stream<Path> paths = Files.list(source))
+            {
+                Path[] children = paths.toArray(Path[]::new);
+
+                for (Path child : children)
+                    moveRecursively(child, target.resolve(source.relativize(child)));
+            }
+
+            deleteDirectoryIfEmpty(source);
+        }
+        else
+        {
+            if (Files.exists(target))
+            {
+                logger.warn("Cannot move the file {} to {} as the target file already exists." , source, target);
+            }
+            else
+            {
+                Files.copy(source, target, StandardCopyOption.COPY_ATTRIBUTES);
+                Files.delete(source);
+            }
+        }
+    }
+
+    /**
+     * Deletes the specified directory if it is empty
+     *
+     * @param path the path to the directory
+     */
+    public static void deleteDirectoryIfEmpty(Path path) throws IOException

Review comment:
       Done :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r564446973



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForLocalSystemData()
+    {
+        return conf.local_system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code local_system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getLocalSystemKeyspacesDataFileLocations()

Review comment:
       We could return a single `String` instead of an array, since it is always a single directory. I gave it a go and doing so doesn't seem problematic with the current callers of the method, and `CassandraDaemon#migrateSystemDataIfNeeded()` is explicitly extracting the first element of the array.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] krummas commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
krummas commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r553354189



##########
File path: doc/source/configuration/cass_env_sh_file.rst
##########
@@ -28,6 +28,10 @@ Setting this property to true causes the dynamic snitch to ignore the severity i
 
 **Default:** false
 
+``cassandra.import_system_data_files_from``
+-------------------------------------------
+The directory location of the ``local system keyspaces`` data to import. This property can be used to move back to the first data directory the ``local system keyspaces`` data that were stored in a specific directory using the ``system_data_file_directory`` cassandra.yaml property.

Review comment:
       Could you expand on this? Not sure I understand what the use case would be for this? Is it if a user changes the configured `system_data_file_directory`? If so I think we could expect them to manually move the files as well

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -632,6 +642,106 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the local system keyspaces.
+         */
+        private final DataDirectory[] systemKeyspaceDataDirectories;
+
+        /**
+         * The directories where the data of the non local system keyspaces should be stored.
+         */
+        private final DataDirectory[] nonSystemKeyspacesDirectories;
+
+
+        public DataDirectories(String[] locationsForNonSystemKeyspaces, String[] locationsForSystemKeyspace)
+        {
+            nonSystemKeyspacesDirectories = toDataDirectories(locationsForNonSystemKeyspaces);
+            systemKeyspaceDataDirectories = toDataDirectories(locationsForSystemKeyspace);
+        }
+
+        private static DataDirectory[] toDataDirectories(String... locations)
+        {
+            DataDirectory[] directories = new DataDirectory[locations.length];
+            for (int i = 0; i < locations.length; ++i)
+                directories[i] = new DataDirectory(new File(locations[i]));
+            return directories;
+        }
+
+        /**
+         * Returns the data directories used to store the data of the specified keyspace.
+         *
+         * @param keyspace the keyspace name
+         * @return the data directories used to store the data of the specified keyspace
+         */
+        public DataDirectory[] getDataDirectoriesUsedBy(String keyspace)

Review comment:
       this is confusing - we return all directories when its a system keyspace? (guess its because of the paxos exception)
   
   looking at the uses of this method it seems to all be cleanup/check related so maybe it would make sense to remove it and just use `getAllDirectories()` ?

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};
+    }
+
+    /**
+     * Returns the locations where should be stored the non local system keyspaces data.
+     *
+     * @return the locations where should be stored the non local system keyspaces data
+     */
+    public static String[] getNonSystemKeyspacesDataFileLocations()

Review comment:
       Yeah keeping `system` in the name is probably a good idea

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};

Review comment:
       Feels slightly wrong to rely on the order of the directories in the configuration - maybe better to sort and grab the first one. My concern is that if a user auto-generates the config which changes the order of the directories in the config file and we have to move the files every startup (not the end of the world, but anyway)

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)

Review comment:
       nit: maybe `if (useSpecificLocationForSystemData())` instead of null check

##########
File path: src/java/org/apache/cassandra/io/util/FileUtils.java
##########
@@ -969,4 +973,70 @@ public Object getAttribute(String attribute) throws IOException
             return fileStore.getAttribute(attribute);
         }
     }
+
+    /**
+     * Moves the contents of a directory to another directory.
+     * <p>Once a file has been copied to the target directory it will be deleted from the source directory.
+     * If a file already exists in the target directory a warning will be logged and the file will not
+     * be deleted.</p>
+     *
+     * @param source the directory containing the files to move
+     * @param target the directory where the files must be moved
+     */
+    public static void moveRecursively(Path source, Path target) throws IOException
+    {
+        logger.info("Moving {} to {}" , source, target);
+
+        if (Files.isDirectory(source))
+        {
+            Files.createDirectories(target);
+
+            try (Stream<Path> paths = Files.list(source))

Review comment:
       I found this part quite hard to follow (the resolving/relativizing) 
   
   Would be a bit simpler if we could do something like this?:
   ```
   for (File f : source.toFile().listFiles())
   {
       String fileName = f.getName();
       moveRecursively(source.resolve(fileName), target.resolve(fileName));
   }
   ```
   

##########
File path: src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java
##########
@@ -67,6 +65,18 @@ public void handleFSError(FSError e)
                 StorageService.instance.stopTransports();
                 break;
             case best_effort:
+
+                // There are a few scenarios where we know that the node will not be able to operate properly.
+                // For those scenarios we want to stop the transports and let the administrators handle the problem.
+                // Those scenarios are:
+                // * All the disks are full
+                // * All the disks for a given keyspace have been marked as unwriteable

Review comment:
       can't we continue serving reads here?

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -472,6 +492,78 @@ public void runStartupChecks()
         }
 
     }
+
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    public void migrateSystemDataIfNeeded() throws IOException
+    {
+        String importSystemDataFrom = System.getProperty("cassandra.import_system_data_files_from");
+
+        // If there is only one directory and no system keyspace directory has been specified we do not need to do
+        // anything. If it is not the case we want to try to migrate the data.
+        if (!DatabaseDescriptor.useSpecificLocationForSystemData()
+                && DatabaseDescriptor.getNonSystemKeyspacesDataFileLocations().length <= 1
+                && importSystemDataFrom == null)
+            return;
+
+        // We can face several cases:
+        //  1) The system data are spread accross the data file locations and need to be moved to
+        //     the first data location (upgrade to 4.0)
+        //  2) The system data are spread accross the data file locations and need to be moved to
+        //     the system keyspace location configured by the user (upgrade to 4.0)
+        //  3) The system data are stored in the first data location and need to be moved to
+        //     the system keyspace location configured by the user (system_data_file_directory has been configured)
+        //  4) The system data have been stored in the system keyspace location configured by the user
+        //     and need to be moved to the first data location (the import of the data has been requested)
+        Path target = Paths.get(DatabaseDescriptor.getSystemKeyspacesDataFileLocations()[0]);
+
+        String[] nonSystemKeyspacesFileLocations = DatabaseDescriptor.getNonSystemKeyspacesDataFileLocations();
+        String[] sources = importSystemDataFrom != null
+            ? new String[] {importSystemDataFrom}
+            : DatabaseDescriptor.useSpecificLocationForSystemData() ? nonSystemKeyspacesFileLocations
+                                                                    : Arrays.copyOfRange(nonSystemKeyspacesFileLocations, 1, nonSystemKeyspacesFileLocations.length);
+
+        for (String source : sources)
+        {
+            Path dataFileLocation = Paths.get(source);
+
+            if (!Files.exists(dataFileLocation))
+                continue;
+
+            try (Stream<Path> locationChildren = Files.list(dataFileLocation))
+            {
+                Path[] keyspaceDirectories = locationChildren.filter(p -> SchemaConstants.isLocalSystemKeyspace(p.getFileName().toString()))
+                        .toArray(Path[]::new);
+
+                for (Path keyspaceDirectory : keyspaceDirectories)
+                {
+                    try (Stream<Path> keyspaceChildren = Files.list(keyspaceDirectory))
+                    {
+                        Path[] tableDirectories = keyspaceChildren.filter(Files::isDirectory)

Review comment:
       nit: indentation is a bit broken here - `.filter` should align with the first `.filter`

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -472,6 +492,78 @@ public void runStartupChecks()
         }
 
     }
+
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    public void migrateSystemDataIfNeeded() throws IOException
+    {
+        String importSystemDataFrom = System.getProperty("cassandra.import_system_data_files_from");

Review comment:
       As mentioned above, I think we can remove this property which would simplify this method a bit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r563725819



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};

Review comment:
       To me using the first one as they are specified in `data_file_directories` sounds easier to understand than an alphabetical selection. The phrase `Cassandra will store the data of the local system keyspaces the first of the data directories specified by data_file_directories` in the documentation seems pretty clear about how the selection is done. We could also add a note in the documentation for `data_file_directories` mentioning that the data of the local system keyspaces will be stored in the first directory, and perhaps warning about the consequences of changing the order of the specified data directories, if we are concerned about accidental order changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r539426402



##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -631,6 +649,106 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the system keyspaces.
+         */
+        private final DataDirectory[] systemKeyspaceDataDirectories;
+
+        /**
+         * The directories where should be stored the data of the non system keyspaces.
+         */
+        private final DataDirectory[] nonSystemKeyspacesDirectories;
+
+
+        public DataDirectories(String[] locationsForNonSystemKeyspaces, String[] locationsForSystemKeyspace)
+        {
+            nonSystemKeyspacesDirectories = toDataDirectories(locationsForNonSystemKeyspaces);
+            systemKeyspaceDataDirectories = toDataDirectories(locationsForSystemKeyspace);
+        }
+
+        private static DataDirectory[] toDataDirectories(String... locations)
+        {
+            DataDirectory[] directories = new DataDirectory[locations.length];
+            for (int i = 0; i < locations.length; ++i)
+                directories[i] = new DataDirectory(new File(locations[i]));
+            return directories;
+        }
+
+        /**
+         * Returns the data directories used to store the data of the specified keyspace.
+         *
+         * @param keyspace the keyspace name
+         * @return the data directories used to store the data of the specified keyspace
+         */
+        public DataDirectory[] getDataDirectoriesUsedBy(String keyspace)
+        {
+            if (SchemaConstants.SYSTEM_KEYSPACE_NAME.equals(keyspace)
+                    && !ArrayUtils.isEmpty(systemKeyspaceDataDirectories)
+                    && !ArrayUtils.contains(nonSystemKeyspacesDirectories, systemKeyspaceDataDirectories[0]))
+            {
+                DataDirectory[] directories = Arrays.copyOf(nonSystemKeyspacesDirectories, nonSystemKeyspacesDirectories.length + 1);
+                directories[directories.length - 1] = systemKeyspaceDataDirectories[0];
+                return directories;
+            }
+            return SchemaConstants.isLocalSystemKeyspace(keyspace) ? systemKeyspaceDataDirectories
+                                                                   : nonSystemKeyspacesDirectories;
+        }
+
+        /**
+         * Returns the data directories for the specified keyspace.
+         *
+         * @param table the table metadata
+         * @return the data directories for the specified keyspace
+         */
+        public DataDirectory[] getDataDirectoriesFor(TableMetadata table)
+        {
+            return isStoredInSystemKeyspacesDataLocation(table.keyspace, table.name) ? systemKeyspaceDataDirectories
+                                                                                     : nonSystemKeyspacesDirectories;
+        }
+
+        @Override
+        public Iterator<DataDirectory> iterator()
+        {
+            Iterator<DataDirectory> iter = Iterators.forArray(nonSystemKeyspacesDirectories);
+
+            if (nonSystemKeyspacesDirectories == systemKeyspaceDataDirectories)

Review comment:
       This one is also wrong. Also some leftover from my initial approach.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r569405822



##########
File path: .circleci/config.yml.HIGHRES
##########
@@ -2183,6 +2183,98 @@ jobs:
     - CCM_HEAP_NEWSIZE: 256M
     - JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64
     - JDK_HOME: /usr/lib/jvm/java-8-openjdk-amd64
+  utests_system_keyspace_directory:
+    docker:
+    - image: nastra/cassandra-testing-ubuntu1910-java11-w-dependencies:20200603

Review comment:
       Good catch! 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r545899976



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};
+    }
+
+    /**
+     * Returns the locations where should be stored the non local system keyspaces data.
+     *
+     * @return the locations where should be stored the non local system keyspaces data
+     */
+    public static String[] getNonSystemKeyspacesDataFileLocations()

Review comment:
        I understand your logic. It just seems to me that the `system` part is important because it implies that your server relies on it. I would be more in favor of `getLocalSystemKeyspacesDataFileLocations`.
   Let's wait for the opinion of the other reviewer then I will change it to what we agree upon at that point.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r545856409



##########
File path: conf/cassandra.yaml
##########
@@ -203,6 +203,12 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
 # data_file_directories:
 #     - /var/lib/cassandra/data
 
+# Directory were Cassandra should store the data of the local system keyspaces.
+# By default Cassandra will store the data of the local system keyspaces in the first of the data directories.
+# This approach ensure that if one of the other disk is lost Cassandra can continue to operate. For extra security
+# this setting allow to store those data on a different directory that provide redundancy.
+# system_data_file_directory: 

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r535494746



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};
+    }
+
+    /**
+     * Returns the locations where should be stored the non local system keyspaces data.
+     *
+     * @return the locations where should be stored the non local system keyspaces data
+     */
+    public static String[] getNonSystemKeyspacesDataFileLocations()

Review comment:
       I was thinking that what determines if a table goes into the sepearate directory is the fact that it uses `LocalStrategy`, so we want it safer in a single directory, possibly with redundancy. AFAIK this strategy is only used by some system tables. If somehow there were not-system keyspaces with that strategy we would also want them in the safer directory, I guess. So, perhaps we could call this `getNonLocalKeyspacesDataFileLocations`, and the property in cassandra.yaml `local_data_file_directory`, etc., if that makes any sense? I find local vs. non-local terminology easier than local-system vs. non-local-system-or-non-system. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r563801852



##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -592,10 +581,34 @@ public static File getBackupsDirectory(File location)
         }
     }
 
+    /**
+     * Checks if the specified table should be stored with local system data.
+     *
+     * <p> To minimize the risk of failures, SSTables for local system keyspaces must be stored in a single data
+     * directory. The only exception to this is the system paxos table as it can be a high traffic table.</p>

Review comment:
       I think that after the last changes paxos is not the only exception

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -632,6 +645,89 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the local system keyspaces.
+         */
+        private final DataDirectory[] localSystemKeyspaceDataDirectories;
+
+        /**
+         * The directories where the data of the non local system keyspaces should be stored.
+         */
+        private final DataDirectory[] nonLocalSystemKeyspacesDirectories;
+
+
+        public DataDirectories(String[] locationsForNonSystemKeyspaces, String[] locationsForSystemKeyspace)
+        {
+            nonLocalSystemKeyspacesDirectories = toDataDirectories(locationsForNonSystemKeyspaces);
+            localSystemKeyspaceDataDirectories = toDataDirectories(locationsForSystemKeyspace);
+        }
+
+        private static DataDirectory[] toDataDirectories(String... locations)
+        {
+            DataDirectory[] directories = new DataDirectory[locations.length];
+            for (int i = 0; i < locations.length; ++i)
+                directories[i] = new DataDirectory(new File(locations[i]));
+            return directories;
+        }
+
+        /**
+         * Returns the data directories for the specified table.
+         *
+         * @param table the table metadata
+         * @return the data directories for the specified table
+         */
+        public DataDirectory[] getDataDirectoriesFor(TableMetadata table)
+        {
+            return isStoredInLocalSystemKeyspacesDataLocation(table.keyspace, table.name) ? localSystemKeyspaceDataDirectories
+                                                                                          : nonLocalSystemKeyspacesDirectories;
+        }
+
+        @Override
+        public Iterator<DataDirectory> iterator()
+        {
+            return getAllDirectories().iterator();
+        }
+
+        public Set<DataDirectory> getAllDirectories()
+        {
+            Set<DataDirectory> directories = new LinkedHashSet<>(nonLocalSystemKeyspacesDirectories.length + localSystemKeyspaceDataDirectories.length);
+            Collections.addAll(directories, nonLocalSystemKeyspacesDirectories);
+            Collections.addAll(directories, localSystemKeyspaceDataDirectories);
+            return directories;
+        }
+
+        @Override
+        public boolean equals(Object o)
+        {
+            if (this == o) return true;
+            if (o == null || getClass() != o.getClass()) return false;
+
+            DataDirectories that = (DataDirectories) o;
+
+            return Arrays.equals(this.localSystemKeyspaceDataDirectories, that.localSystemKeyspaceDataDirectories)
+                && Arrays.equals(this.nonLocalSystemKeyspacesDirectories, that.nonLocalSystemKeyspacesDirectories);
+        }
+
+        @Override
+        public int hashCode()
+        {
+            return Objects.hash(localSystemKeyspaceDataDirectories, nonLocalSystemKeyspacesDirectories);
+        }
+
+        public String toString()
+        {
+            return "DataDirectories {" +
+                   "systemKeyspaceDataDirectories=" + localSystemKeyspaceDataDirectories +
+                   ", nonSystemKeyspacesDirectories=" + nonLocalSystemKeyspacesDirectories +

Review comment:
       Nit: we could use something like `Arrays.toString` to better print the directories. Also, the method could use an `@Override` annotation.

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -472,6 +492,73 @@ public void runStartupChecks()
         }
 
     }
+
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    public void migrateSystemDataIfNeeded() throws IOException
+    {
+        // If there is only one directory and no system keyspace directory has been specified we do not need to do
+        // anything. If it is not the case we want to try to migrate the data.
+        if (!DatabaseDescriptor.useSpecificLocationForLocalSystemData()
+                && DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations().length <= 1)
+            return;
+
+        // We can face several cases:
+        //  1) The system data are spread accross the data file locations and need to be moved to
+        //     the first data location (upgrade to 4.0)
+        //  2) The system data are spread accross the data file locations and need to be moved to
+        //     the system keyspace location configured by the user (upgrade to 4.0)
+        //  3) The system data are stored in the first data location and need to be moved to
+        //     the system keyspace location configured by the user (system_data_file_directory has been configured)

Review comment:
       I understand that we can't have the opposite case, that the system data was originally stored in a separate directory but the user wants to move it to the first data location. In such case `system_data_file_directory` wouldn't be set so we can't know where the local system data is. That configuration change would require the user to manually move the data to the first data directory. Is this right? 

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2806,4 +2796,88 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
-}
\ No newline at end of file
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. Local system keyspaces can have their own disk
+     * to allow for special redundancy mechanism. If it is the case the executor services returned for
+     * local system keyspaces will be different from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non local system keyspaces.
+         */
+        private final ExecutorService[] nonLocalSystemflushExecutors;
+
+        /**
+         * The flush executors for the local system keyspaces.
+         */
+        private final ExecutorService[] localSystemDiskFlushExecutors;
+
+        /**
+         * {@code true} if local system keyspaces are stored in their own directory and use an extra flush executor,
+         * {@code false} otherwise.
+         */
+        private boolean useSpecificExecutorForSystemKeyspaces;
+
+        public PerDiskFlushExecutors(int flushWriters,
+                                     String[] locationsForNonSystemKeyspaces,
+                                     boolean useSpecificLocationForSystemKeyspaces)
+        {
+            ExecutorService[] flushExecutors = createPerDiskFlushWriters(locationsForNonSystemKeyspaces.length, flushWriters);
+            nonLocalSystemflushExecutors = flushExecutors;
+            useSpecificExecutorForSystemKeyspaces = useSpecificLocationForSystemKeyspaces;
+            localSystemDiskFlushExecutors = useSpecificLocationForSystemKeyspaces ? new ExecutorService[] {newThreadPool("LocalSystemKeyspacesDiskMemtableFlushWriter", flushWriters)}
+                                                                             : new ExecutorService[] {flushExecutors[0]};
+        }
+
+        private ExecutorService[] createPerDiskFlushWriters(int numberOfExecutors, int flushWriters)
+        {
+            ExecutorService[] flushExecutors = new ExecutorService[numberOfExecutors];
+
+            for (int i = 0; i < numberOfExecutors; i++)
+            {
+                flushExecutors[i] = newThreadPool("PerDiskMemtableFlushWriter_" + i, flushWriters);
+            }
+            return flushExecutors;
+        }
+
+        private static JMXEnabledThreadPoolExecutor newThreadPool(String poolName, int size)
+        {
+            return new JMXEnabledThreadPoolExecutor(size,
+                                                    Stage.KEEP_ALIVE_SECONDS,
+                                                    TimeUnit.SECONDS,
+                                                    new LinkedBlockingQueue<>(),
+                                                    new NamedThreadFactory(poolName),
+                                                    "internal");
+        }
+
+        /**
+         * Returns the flush executors for the specified keyspace.
+         *
+         * @param keyspaceName the keyspace name
+         * @param tableName the table name
+         * @return the flush executors that should be used for flushing the memtables of the specified keyspace.
+         */
+        public ExecutorService[] getExecutorsFor(String keyspaceName, String tableName)
+        {
+            return Directories.isStoredInLocalSystemKeyspacesDataLocation(keyspaceName, tableName) ? localSystemDiskFlushExecutors
+                                                                                              : nonLocalSystemflushExecutors;

Review comment:
       Nit: misaligned
   ```suggestion
                                                                                                      : nonLocalSystemflushExecutors;
   ```

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2806,4 +2796,88 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
-}
\ No newline at end of file
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. Local system keyspaces can have their own disk
+     * to allow for special redundancy mechanism. If it is the case the executor services returned for
+     * local system keyspaces will be different from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non local system keyspaces.
+         */
+        private final ExecutorService[] nonLocalSystemflushExecutors;
+
+        /**
+         * The flush executors for the local system keyspaces.
+         */
+        private final ExecutorService[] localSystemDiskFlushExecutors;
+
+        /**
+         * {@code true} if local system keyspaces are stored in their own directory and use an extra flush executor,
+         * {@code false} otherwise.
+         */
+        private boolean useSpecificExecutorForSystemKeyspaces;

Review comment:
       Nit: might be final

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2806,4 +2796,88 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
-}
\ No newline at end of file
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. Local system keyspaces can have their own disk
+     * to allow for special redundancy mechanism. If it is the case the executor services returned for
+     * local system keyspaces will be different from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non local system keyspaces.
+         */
+        private final ExecutorService[] nonLocalSystemflushExecutors;
+
+        /**
+         * The flush executors for the local system keyspaces.
+         */
+        private final ExecutorService[] localSystemDiskFlushExecutors;
+
+        /**
+         * {@code true} if local system keyspaces are stored in their own directory and use an extra flush executor,
+         * {@code false} otherwise.
+         */
+        private boolean useSpecificExecutorForSystemKeyspaces;
+
+        public PerDiskFlushExecutors(int flushWriters,
+                                     String[] locationsForNonSystemKeyspaces,
+                                     boolean useSpecificLocationForSystemKeyspaces)
+        {
+            ExecutorService[] flushExecutors = createPerDiskFlushWriters(locationsForNonSystemKeyspaces.length, flushWriters);
+            nonLocalSystemflushExecutors = flushExecutors;
+            useSpecificExecutorForSystemKeyspaces = useSpecificLocationForSystemKeyspaces;
+            localSystemDiskFlushExecutors = useSpecificLocationForSystemKeyspaces ? new ExecutorService[] {newThreadPool("LocalSystemKeyspacesDiskMemtableFlushWriter", flushWriters)}
+                                                                             : new ExecutorService[] {flushExecutors[0]};

Review comment:
       Nit: misaligned
   ```suggestion
                                                                                     : new ExecutorService[] {flushExecutors[0]};
   ```

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2806,4 +2796,88 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
-}
\ No newline at end of file
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. Local system keyspaces can have their own disk
+     * to allow for special redundancy mechanism. If it is the case the executor services returned for
+     * local system keyspaces will be different from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non local system keyspaces.
+         */
+        private final ExecutorService[] nonLocalSystemflushExecutors;
+
+        /**
+         * The flush executors for the local system keyspaces.
+         */
+        private final ExecutorService[] localSystemDiskFlushExecutors;
+
+        /**
+         * {@code true} if local system keyspaces are stored in their own directory and use an extra flush executor,
+         * {@code false} otherwise.
+         */
+        private boolean useSpecificExecutorForSystemKeyspaces;
+
+        public PerDiskFlushExecutors(int flushWriters,
+                                     String[] locationsForNonSystemKeyspaces,
+                                     boolean useSpecificLocationForSystemKeyspaces)
+        {
+            ExecutorService[] flushExecutors = createPerDiskFlushWriters(locationsForNonSystemKeyspaces.length, flushWriters);
+            nonLocalSystemflushExecutors = flushExecutors;
+            useSpecificExecutorForSystemKeyspaces = useSpecificLocationForSystemKeyspaces;
+            localSystemDiskFlushExecutors = useSpecificLocationForSystemKeyspaces ? new ExecutorService[] {newThreadPool("LocalSystemKeyspacesDiskMemtableFlushWriter", flushWriters)}
+                                                                             : new ExecutorService[] {flushExecutors[0]};
+        }
+
+        private ExecutorService[] createPerDiskFlushWriters(int numberOfExecutors, int flushWriters)

Review comment:
       Might be static:
   ```suggestion
           private static ExecutorService[] createPerDiskFlushWriters(int numberOfExecutors, int flushWriters)
   ```

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForLocalSystemData()
+    {
+        return conf.local_system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code local_system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>

Review comment:
       Rather than saying than `the server can tolerate the lost of n - 1 disks`, I think I'd say `the server can tolerate the lost of all the disks but the first one`, to avoid the risk of readers getting the wrong idea that they can lost *any* n - 1 disks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r562638634



##########
File path: src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java
##########
@@ -67,6 +65,18 @@ public void handleFSError(FSError e)
                 StorageService.instance.stopTransports();
                 break;
             case best_effort:
+
+                // There are a few scenarios where we know that the node will not be able to operate properly.
+                // For those scenarios we want to stop the transports and let the administrators handle the problem.
+                // Those scenarios are:
+                // * All the disks are full
+                // * All the disks for a given keyspace have been marked as unwriteable

Review comment:
       The problem I see if we keep serving reads but cannot serve writes is that the dataset will slowly become outdated. I am also not sure of how to do that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r555774406



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};

Review comment:
       I do not know how people name their directories so I imagine that if we rely on alphabetical order the place were the system files will end up will be a bit of a random one from the user perspective. It sounded easier to me to control the directory ordering than the directory names as it could be handled at the config level.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r537676403



##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2740,4 +2730,81 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. System keyspaces can have their own disk
+     * to allow for special redundency mechanism. If it is the case the executor services returned for
+     * system keyspace will be differents from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non system keyspaces.
+         */
+        private final ExecutorService[] nonSystemflushExecutors;
+
+        /**
+         * The flush executors for system keyspaces.
+         */
+        private final ExecutorService[] systemDiskFlushExecutors;
+
+        public PerDiskFlushExecutors(int flushWriters,
+                                     String[] locationsForNonSystemKeyspaces,
+                                     boolean useSpecificLocationForSystemKeyspaces)
+        {
+            ExecutorService[] flushExecutors = createPerDiskFlushWriters(locationsForNonSystemKeyspaces.length, flushWriters);
+            nonSystemflushExecutors = flushExecutors;
+            systemDiskFlushExecutors = useSpecificLocationForSystemKeyspaces ? new ExecutorService[] {newThreadPool("SystemKeyspacesDiskMemtableFlushWriter", flushWriters)}
+                                                                             : new ExecutorService[] {flushExecutors[0]};
+        }
+
+        private ExecutorService[] createPerDiskFlushWriters(int numberOfExecutors, int flushWriters)
+        {
+            ExecutorService[] flushExecutors = new ExecutorService[numberOfExecutors];
+
+            for (int i = 0; i < numberOfExecutors; i++)
+            {
+                flushExecutors[i] = newThreadPool("PerDiskMemtableFlushWriter_" + i, flushWriters);
+            }
+            return flushExecutors;
+        }
+
+        private static JMXEnabledThreadPoolExecutor newThreadPool(String poolName, int size)
+        {
+            return new JMXEnabledThreadPoolExecutor(size,
+                                                    Stage.KEEP_ALIVE_SECONDS,
+                                                    TimeUnit.SECONDS,
+                                                    new LinkedBlockingQueue<Runnable>(),
+                                                    new NamedThreadFactory(poolName),
+                                                    "internal");
+        }
+
+        /**
+         * Returns the flush executors for the specified keyspace.
+         *
+         * @param keyspaceName the keyspace name
+         * @param tableName the table name
+         * @return the flush executors that should be used for flushing the memtables of the specified keyspace.
+         */
+        public ExecutorService[] getExecutorsFor(String keyspaceName, String tableName)
+        {
+            return Directories.isStoredInSystemKeyspacesDataLocation(keyspaceName, tableName) ? systemDiskFlushExecutors
+                                                                  : nonSystemflushExecutors;
+        }
+
+        /**
+         * Appends all the {@code ExecutorService} used for flushes to the colection.
+         *
+         * @param collection the colection to append to.
+         */
+        public void appendAllExecutors(Collection<ExecutorService> collection)
+        {
+            Collections.addAll(collection, nonSystemflushExecutors);
+            if (nonSystemflushExecutors != systemDiskFlushExecutors)

Review comment:
       After looking into it, I believe that this code is some left over from a previous version and is actually wrong. I will fix it. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r564564996



##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -472,6 +492,73 @@ public void runStartupChecks()
         }
 
     }
+
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    public void migrateSystemDataIfNeeded() throws IOException
+    {
+        // If there is only one directory and no system keyspace directory has been specified we do not need to do
+        // anything. If it is not the case we want to try to migrate the data.
+        if (!DatabaseDescriptor.useSpecificLocationForLocalSystemData()
+                && DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations().length <= 1)
+            return;
+
+        // We can face several cases:
+        //  1) The system data are spread accross the data file locations and need to be moved to
+        //     the first data location (upgrade to 4.0)
+        //  2) The system data are spread accross the data file locations and need to be moved to
+        //     the system keyspace location configured by the user (upgrade to 4.0)
+        //  3) The system data are stored in the first data location and need to be moved to
+        //     the system keyspace location configured by the user (system_data_file_directory has been configured)

Review comment:
       Yes




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r564579905



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForLocalSystemData()
+    {
+        return conf.local_system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code local_system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getLocalSystemKeyspacesDataFileLocations()

Review comment:
       I hit some issues with that when writing the initial patch when the data directories where empty. Some tests run with such a config. I came up with that approach because it was allowing the code to behave in the same way as before the patch.
   Unfortunately, I do not remeber all the details. I would have to try again if you think it is a better approach to use a String.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] krummas commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
krummas commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r569394141



##########
File path: .circleci/config.yml.HIGHRES
##########
@@ -2183,6 +2183,98 @@ jobs:
     - CCM_HEAP_NEWSIZE: 256M
     - JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64
     - JDK_HOME: /usr/lib/jvm/java-8-openjdk-amd64
+  utests_system_keyspace_directory:
+    docker:
+    - image: nastra/cassandra-testing-ubuntu1910-java11-w-dependencies:20200603

Review comment:
       this should use the same docker image as the other test targets (`apache/cassandra-testing-ubuntu2004-java11-w-dependencies:20210105`)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r545858300



##########
File path: test/unit/org/apache/cassandra/io/util/FileUtilsTest.java
##########
@@ -128,6 +128,75 @@ public void testIsContained()
         assertFalse(FileUtils.isContained(new File("/tmp/abc/../abc"), new File("/tmp/abcc")));
     }
 
+    @Test
+    public void testMoveFiles() throws IOException

Review comment:
       Thanks. :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r563725819



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};

Review comment:
       To me using the first one as they are specified in `data_file_directories` sounds easier to understand than an alphabetical selection. The phrase `Cassandra will store the data of the local system keyspaces the first of the data directories specified by data_file_directories` in the documentation seems pretty clear about how the selection is done. We could also add a note in the documentation for `data_file_directories` mentioning that the data of the local system keyspaces will be stored in the first directory, and perhaps warning about the consequences of changing the order of the specified data directories, if we are concerned about accidental order changes.

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -592,10 +581,34 @@ public static File getBackupsDirectory(File location)
         }
     }
 
+    /**
+     * Checks if the specified table should be stored with local system data.
+     *
+     * <p> To minimize the risk of failures, SSTables for local system keyspaces must be stored in a single data
+     * directory. The only exception to this is the system paxos table as it can be a high traffic table.</p>

Review comment:
       I think that after the last changes paxos is not the only exception

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -632,6 +645,89 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the local system keyspaces.
+         */
+        private final DataDirectory[] localSystemKeyspaceDataDirectories;
+
+        /**
+         * The directories where the data of the non local system keyspaces should be stored.
+         */
+        private final DataDirectory[] nonLocalSystemKeyspacesDirectories;
+
+
+        public DataDirectories(String[] locationsForNonSystemKeyspaces, String[] locationsForSystemKeyspace)
+        {
+            nonLocalSystemKeyspacesDirectories = toDataDirectories(locationsForNonSystemKeyspaces);
+            localSystemKeyspaceDataDirectories = toDataDirectories(locationsForSystemKeyspace);
+        }
+
+        private static DataDirectory[] toDataDirectories(String... locations)
+        {
+            DataDirectory[] directories = new DataDirectory[locations.length];
+            for (int i = 0; i < locations.length; ++i)
+                directories[i] = new DataDirectory(new File(locations[i]));
+            return directories;
+        }
+
+        /**
+         * Returns the data directories for the specified table.
+         *
+         * @param table the table metadata
+         * @return the data directories for the specified table
+         */
+        public DataDirectory[] getDataDirectoriesFor(TableMetadata table)
+        {
+            return isStoredInLocalSystemKeyspacesDataLocation(table.keyspace, table.name) ? localSystemKeyspaceDataDirectories
+                                                                                          : nonLocalSystemKeyspacesDirectories;
+        }
+
+        @Override
+        public Iterator<DataDirectory> iterator()
+        {
+            return getAllDirectories().iterator();
+        }
+
+        public Set<DataDirectory> getAllDirectories()
+        {
+            Set<DataDirectory> directories = new LinkedHashSet<>(nonLocalSystemKeyspacesDirectories.length + localSystemKeyspaceDataDirectories.length);
+            Collections.addAll(directories, nonLocalSystemKeyspacesDirectories);
+            Collections.addAll(directories, localSystemKeyspaceDataDirectories);
+            return directories;
+        }
+
+        @Override
+        public boolean equals(Object o)
+        {
+            if (this == o) return true;
+            if (o == null || getClass() != o.getClass()) return false;
+
+            DataDirectories that = (DataDirectories) o;
+
+            return Arrays.equals(this.localSystemKeyspaceDataDirectories, that.localSystemKeyspaceDataDirectories)
+                && Arrays.equals(this.nonLocalSystemKeyspacesDirectories, that.nonLocalSystemKeyspacesDirectories);
+        }
+
+        @Override
+        public int hashCode()
+        {
+            return Objects.hash(localSystemKeyspaceDataDirectories, nonLocalSystemKeyspacesDirectories);
+        }
+
+        public String toString()
+        {
+            return "DataDirectories {" +
+                   "systemKeyspaceDataDirectories=" + localSystemKeyspaceDataDirectories +
+                   ", nonSystemKeyspacesDirectories=" + nonLocalSystemKeyspacesDirectories +

Review comment:
       Nit: we could use something like `Arrays.toString` to better print the directories. Also, the method could use an `@Override` annotation.

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -472,6 +492,73 @@ public void runStartupChecks()
         }
 
     }
+
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    public void migrateSystemDataIfNeeded() throws IOException
+    {
+        // If there is only one directory and no system keyspace directory has been specified we do not need to do
+        // anything. If it is not the case we want to try to migrate the data.
+        if (!DatabaseDescriptor.useSpecificLocationForLocalSystemData()
+                && DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations().length <= 1)
+            return;
+
+        // We can face several cases:
+        //  1) The system data are spread accross the data file locations and need to be moved to
+        //     the first data location (upgrade to 4.0)
+        //  2) The system data are spread accross the data file locations and need to be moved to
+        //     the system keyspace location configured by the user (upgrade to 4.0)
+        //  3) The system data are stored in the first data location and need to be moved to
+        //     the system keyspace location configured by the user (system_data_file_directory has been configured)

Review comment:
       I understand that we can't have the opposite case, that the system data was originally stored in a separate directory but the user wants to move it to the first data location. In such case `system_data_file_directory` wouldn't be set so we can't know where the local system data is. That configuration change would require the user to manually move the data to the first data directory. Is this right? 

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2806,4 +2796,88 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
-}
\ No newline at end of file
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. Local system keyspaces can have their own disk
+     * to allow for special redundancy mechanism. If it is the case the executor services returned for
+     * local system keyspaces will be different from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non local system keyspaces.
+         */
+        private final ExecutorService[] nonLocalSystemflushExecutors;
+
+        /**
+         * The flush executors for the local system keyspaces.
+         */
+        private final ExecutorService[] localSystemDiskFlushExecutors;
+
+        /**
+         * {@code true} if local system keyspaces are stored in their own directory and use an extra flush executor,
+         * {@code false} otherwise.
+         */
+        private boolean useSpecificExecutorForSystemKeyspaces;
+
+        public PerDiskFlushExecutors(int flushWriters,
+                                     String[] locationsForNonSystemKeyspaces,
+                                     boolean useSpecificLocationForSystemKeyspaces)
+        {
+            ExecutorService[] flushExecutors = createPerDiskFlushWriters(locationsForNonSystemKeyspaces.length, flushWriters);
+            nonLocalSystemflushExecutors = flushExecutors;
+            useSpecificExecutorForSystemKeyspaces = useSpecificLocationForSystemKeyspaces;
+            localSystemDiskFlushExecutors = useSpecificLocationForSystemKeyspaces ? new ExecutorService[] {newThreadPool("LocalSystemKeyspacesDiskMemtableFlushWriter", flushWriters)}
+                                                                             : new ExecutorService[] {flushExecutors[0]};
+        }
+
+        private ExecutorService[] createPerDiskFlushWriters(int numberOfExecutors, int flushWriters)
+        {
+            ExecutorService[] flushExecutors = new ExecutorService[numberOfExecutors];
+
+            for (int i = 0; i < numberOfExecutors; i++)
+            {
+                flushExecutors[i] = newThreadPool("PerDiskMemtableFlushWriter_" + i, flushWriters);
+            }
+            return flushExecutors;
+        }
+
+        private static JMXEnabledThreadPoolExecutor newThreadPool(String poolName, int size)
+        {
+            return new JMXEnabledThreadPoolExecutor(size,
+                                                    Stage.KEEP_ALIVE_SECONDS,
+                                                    TimeUnit.SECONDS,
+                                                    new LinkedBlockingQueue<>(),
+                                                    new NamedThreadFactory(poolName),
+                                                    "internal");
+        }
+
+        /**
+         * Returns the flush executors for the specified keyspace.
+         *
+         * @param keyspaceName the keyspace name
+         * @param tableName the table name
+         * @return the flush executors that should be used for flushing the memtables of the specified keyspace.
+         */
+        public ExecutorService[] getExecutorsFor(String keyspaceName, String tableName)
+        {
+            return Directories.isStoredInLocalSystemKeyspacesDataLocation(keyspaceName, tableName) ? localSystemDiskFlushExecutors
+                                                                                              : nonLocalSystemflushExecutors;

Review comment:
       Nit: misaligned
   ```suggestion
                                                                                                      : nonLocalSystemflushExecutors;
   ```

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2806,4 +2796,88 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
-}
\ No newline at end of file
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. Local system keyspaces can have their own disk
+     * to allow for special redundancy mechanism. If it is the case the executor services returned for
+     * local system keyspaces will be different from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non local system keyspaces.
+         */
+        private final ExecutorService[] nonLocalSystemflushExecutors;
+
+        /**
+         * The flush executors for the local system keyspaces.
+         */
+        private final ExecutorService[] localSystemDiskFlushExecutors;
+
+        /**
+         * {@code true} if local system keyspaces are stored in their own directory and use an extra flush executor,
+         * {@code false} otherwise.
+         */
+        private boolean useSpecificExecutorForSystemKeyspaces;

Review comment:
       Nit: might be final

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2806,4 +2796,88 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
-}
\ No newline at end of file
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. Local system keyspaces can have their own disk
+     * to allow for special redundancy mechanism. If it is the case the executor services returned for
+     * local system keyspaces will be different from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non local system keyspaces.
+         */
+        private final ExecutorService[] nonLocalSystemflushExecutors;
+
+        /**
+         * The flush executors for the local system keyspaces.
+         */
+        private final ExecutorService[] localSystemDiskFlushExecutors;
+
+        /**
+         * {@code true} if local system keyspaces are stored in their own directory and use an extra flush executor,
+         * {@code false} otherwise.
+         */
+        private boolean useSpecificExecutorForSystemKeyspaces;
+
+        public PerDiskFlushExecutors(int flushWriters,
+                                     String[] locationsForNonSystemKeyspaces,
+                                     boolean useSpecificLocationForSystemKeyspaces)
+        {
+            ExecutorService[] flushExecutors = createPerDiskFlushWriters(locationsForNonSystemKeyspaces.length, flushWriters);
+            nonLocalSystemflushExecutors = flushExecutors;
+            useSpecificExecutorForSystemKeyspaces = useSpecificLocationForSystemKeyspaces;
+            localSystemDiskFlushExecutors = useSpecificLocationForSystemKeyspaces ? new ExecutorService[] {newThreadPool("LocalSystemKeyspacesDiskMemtableFlushWriter", flushWriters)}
+                                                                             : new ExecutorService[] {flushExecutors[0]};

Review comment:
       Nit: misaligned
   ```suggestion
                                                                                     : new ExecutorService[] {flushExecutors[0]};
   ```

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2806,4 +2796,88 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
-}
\ No newline at end of file
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. Local system keyspaces can have their own disk
+     * to allow for special redundancy mechanism. If it is the case the executor services returned for
+     * local system keyspaces will be different from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non local system keyspaces.
+         */
+        private final ExecutorService[] nonLocalSystemflushExecutors;
+
+        /**
+         * The flush executors for the local system keyspaces.
+         */
+        private final ExecutorService[] localSystemDiskFlushExecutors;
+
+        /**
+         * {@code true} if local system keyspaces are stored in their own directory and use an extra flush executor,
+         * {@code false} otherwise.
+         */
+        private boolean useSpecificExecutorForSystemKeyspaces;
+
+        public PerDiskFlushExecutors(int flushWriters,
+                                     String[] locationsForNonSystemKeyspaces,
+                                     boolean useSpecificLocationForSystemKeyspaces)
+        {
+            ExecutorService[] flushExecutors = createPerDiskFlushWriters(locationsForNonSystemKeyspaces.length, flushWriters);
+            nonLocalSystemflushExecutors = flushExecutors;
+            useSpecificExecutorForSystemKeyspaces = useSpecificLocationForSystemKeyspaces;
+            localSystemDiskFlushExecutors = useSpecificLocationForSystemKeyspaces ? new ExecutorService[] {newThreadPool("LocalSystemKeyspacesDiskMemtableFlushWriter", flushWriters)}
+                                                                             : new ExecutorService[] {flushExecutors[0]};
+        }
+
+        private ExecutorService[] createPerDiskFlushWriters(int numberOfExecutors, int flushWriters)

Review comment:
       Might be static:
   ```suggestion
           private static ExecutorService[] createPerDiskFlushWriters(int numberOfExecutors, int flushWriters)
   ```

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1830,11 +1857,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which supports redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForLocalSystemData()
+    {
+        return conf.local_system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where the local system keyspaces data should be stored.
+     *
+     * <p>If the {@code local_system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>

Review comment:
       Rather than saying than `the server can tolerate the lost of n - 1 disks`, I think I'd say `the server can tolerate the lost of all the disks but the first one`, to avoid the risk of readers getting the wrong idea that they can lost *any* n - 1 disks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r545929464



##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -456,6 +476,78 @@ protected void setup()
         completeSetup();
     }
 
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    private void migrateSystemDataIfNeeded() throws IOException
+    {
+        String importSystemDataFrom = System.getProperty("cassandra.importSystemDataFilesFrom");

Review comment:
       I added the property to the documentation. I am not sure if we should add it to the JVM options file as the property is a bit of a corner case. It is only usefull if you have used a dedicated directory for your system table and want to go back to using your data directories. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r535359116



##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -456,6 +476,78 @@ protected void setup()
         completeSetup();
     }
 
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    private void migrateSystemDataIfNeeded() throws IOException
+    {
+        String importSystemDataFrom = System.getProperty("cassandra.importSystemDataFilesFrom");

Review comment:
       I think that snake case is more common for system properties:
   ```suggestion
           String importSystemDataFrom = System.getProperty("cassandra.import_system_data_files_from");
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r535362575



##########
File path: conf/cassandra.yaml
##########
@@ -203,6 +203,12 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
 # data_file_directories:
 #     - /var/lib/cassandra/data
 
+# Directory were Cassandra should store the data of the local system keyspaces.
+# By default Cassandra will store the data of the local system keyspaces in the first of the data directories.
+# This approach ensure that if one of the other disk is lost Cassandra can continue to operate. For extra security
+# this setting allow to store those data on a different directory that provide redundancy.
+# system_data_file_directory: 

Review comment:
       We should add this to [the documentation for `cassandra.yaml`](https://github.com/apache/cassandra/blob/trunk/doc/source/configuration/cass_yaml_file.rst).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r535359436



##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -456,6 +476,78 @@ protected void setup()
         completeSetup();
     }
 
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    private void migrateSystemDataIfNeeded() throws IOException
+    {
+        String importSystemDataFrom = System.getProperty("cassandra.importSystemDataFilesFrom");

Review comment:
       Probably we should comment the new property in [the documentation for cassandra-ev](https://github.com/apache/cassandra/blob/trunk/doc/source/configuration/cass_env_sh_file.rst) and in the [JVM options file](jvm-server.options).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r535494746



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};
+    }
+
+    /**
+     * Returns the locations where should be stored the non local system keyspaces data.
+     *
+     * @return the locations where should be stored the non local system keyspaces data
+     */
+    public static String[] getNonSystemKeyspacesDataFileLocations()

Review comment:
       I was thinking that what determines if a table goes into the sepearate directory is the fact that it uses `LocalStrategy`, so we want it safer in a single directory, possibly with redundancy. AFAIK this strategy is only used by system tables, but if somehow there were not-system keyspaces with that strategy we would want them in the safer directory, I guess. So, perhaps we could call this `getLocalKeyspacesDataFileLocations`, and the property in cassandra.yaml `local_data_file_directory`, etc.? Does it make any sense?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r545864002



##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -243,7 +263,7 @@ protected void setup()
         }
         catch (IOException e)
         {
-            exitOrFail(3, e.getMessage(), e.getCause());
+            exitOrFail(StartupException.ERR_WRONG_DISK_STATE, e.getMessage(), e.getCause());

Review comment:
       Thanks :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r545855986



##########
File path: src/java/org/apache/cassandra/service/StorageService.java
##########
@@ -3291,12 +3291,29 @@ public String getKeyspaceReplicationInfo(String keyspaceName)
         return stringify(Gossiper.instance.getUnreachableMembers(), true);
     }
 
+    @Override
     public String[] getAllDataFileLocations()
     {
-        String[] locations = DatabaseDescriptor.getAllDataFileLocations();
-        for (int i = 0; i < locations.length; i++)
-            locations[i] = FileUtils.getCanonicalPath(locations[i]);
-        return locations;
+        return getCanonicalPaths(DatabaseDescriptor.getAllDataFileLocations());
+    }
+
+    private String[] getCanonicalPaths(String[] paths)
+    {
+        for (int i = 0; i < paths.length; i++)
+            paths[i] = FileUtils.getCanonicalPath(paths[i]);
+        return paths;
+    }

Review comment:
       Good catch. :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r535494746



##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};
+    }
+
+    /**
+     * Returns the locations where should be stored the non local system keyspaces data.
+     *
+     * @return the locations where should be stored the non local system keyspaces data
+     */
+    public static String[] getNonSystemKeyspacesDataFileLocations()

Review comment:
       I was thinking that what determines if a table goes into the sepearate directory is the fact that it uses `LocalStrategy`, so we want it safer in a single directory, possibly with redundancy. AFAIK this strategy is only used by system tables, but if somehow there were not-system keyspaces with that strategy we would want them in the safer directory, I guess. So, perhaps we could call this `getNonLocalKeyspacesDataFileLocations`, and the property in cassandra.yaml `local_data_file_directory`, etc.? Does it make any sense?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] blerer commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
blerer commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r564564848



##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -632,6 +645,89 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the local system keyspaces.
+         */
+        private final DataDirectory[] localSystemKeyspaceDataDirectories;
+
+        /**
+         * The directories where the data of the non local system keyspaces should be stored.
+         */
+        private final DataDirectory[] nonLocalSystemKeyspacesDirectories;
+
+
+        public DataDirectories(String[] locationsForNonSystemKeyspaces, String[] locationsForSystemKeyspace)
+        {
+            nonLocalSystemKeyspacesDirectories = toDataDirectories(locationsForNonSystemKeyspaces);
+            localSystemKeyspaceDataDirectories = toDataDirectories(locationsForSystemKeyspace);
+        }
+
+        private static DataDirectory[] toDataDirectories(String... locations)
+        {
+            DataDirectory[] directories = new DataDirectory[locations.length];
+            for (int i = 0; i < locations.length; ++i)
+                directories[i] = new DataDirectory(new File(locations[i]));
+            return directories;
+        }
+
+        /**
+         * Returns the data directories for the specified table.
+         *
+         * @param table the table metadata
+         * @return the data directories for the specified table
+         */
+        public DataDirectory[] getDataDirectoriesFor(TableMetadata table)
+        {
+            return isStoredInLocalSystemKeyspacesDataLocation(table.keyspace, table.name) ? localSystemKeyspaceDataDirectories
+                                                                                          : nonLocalSystemKeyspacesDirectories;
+        }
+
+        @Override
+        public Iterator<DataDirectory> iterator()
+        {
+            return getAllDirectories().iterator();
+        }
+
+        public Set<DataDirectory> getAllDirectories()
+        {
+            Set<DataDirectory> directories = new LinkedHashSet<>(nonLocalSystemKeyspacesDirectories.length + localSystemKeyspaceDataDirectories.length);
+            Collections.addAll(directories, nonLocalSystemKeyspacesDirectories);
+            Collections.addAll(directories, localSystemKeyspaceDataDirectories);
+            return directories;
+        }
+
+        @Override
+        public boolean equals(Object o)
+        {
+            if (this == o) return true;
+            if (o == null || getClass() != o.getClass()) return false;
+
+            DataDirectories that = (DataDirectories) o;
+
+            return Arrays.equals(this.localSystemKeyspaceDataDirectories, that.localSystemKeyspaceDataDirectories)
+                && Arrays.equals(this.nonLocalSystemKeyspacesDirectories, that.nonLocalSystemKeyspacesDirectories);
+        }
+
+        @Override
+        public int hashCode()
+        {
+            return Objects.hash(localSystemKeyspaceDataDirectories, nonLocalSystemKeyspacesDirectories);
+        }
+
+        public String toString()
+        {
+            return "DataDirectories {" +
+                   "systemKeyspaceDataDirectories=" + localSystemKeyspaceDataDirectories +
+                   ", nonSystemKeyspacesDirectories=" + nonLocalSystemKeyspacesDirectories +

Review comment:
       Good catch




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] adelapena commented on a change in pull request #675: CASSANDRA-14793: Improve system table handling when losing a disk when using JBOD

Posted by GitBox <gi...@apache.org>.
adelapena commented on a change in pull request #675:
URL: https://github.com/apache/cassandra/pull/675#discussion_r535239383



##########
File path: conf/cassandra.yaml
##########
@@ -203,6 +203,12 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
 # data_file_directories:
 #     - /var/lib/cassandra/data
 
+# Directory were Cassandra should store the data of the local system keyspaces.
+# By default Cassandra will store the data of the local system keyspaces in the first of the data directories.
+# This approach ensure that if one of the other disk is lost Cassandra can continue to operate. For extra security
+# this setting allow to store those data on a different directory that provide redundancy.

Review comment:
       ```suggestion
   By default Cassandra will store the data of the local system keyspaces in the first of the data directories specified
   # by data_file_directories.
   # This approach ensures that if one of the other disks is lost Cassandra can continue to operate. For extra security
   # this setting allows to store those data on a different directory that provides redundancy.
   ```

##########
File path: NEWS.txt
##########
@@ -38,6 +38,15 @@ using the provided 'sstableupgrade' tool.
 
 New features
 ------------
+    - The data of the system keyspaces using a local strategy (at the exception of the system.paxos table)
+      is now stored by default in the first data directory. This approach will allow the server
+      to tolerate the failure of the other disks. To ensure that a disk failure will not bring
+      a node down, it is possible to use the system_data_file_directory yaml property to store
+      the system keyspaces data on a disk that provide redundancy.
+      On node startup the system keyspace data will be automatically migrated if needed to the
+      correct location. If a specific disk has been used for some time and the system keyspaces

Review comment:
       ```suggestion
         the local system keyspaces data on a directory that provides redundancy.
         On node startup the local system keyspaces data will be automatically migrated if needed to the
         correct location. If a specific disk has been used for some time and the local system keyspaces
   ```

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};
+    }
+
+    /**
+     * Returns the locations where should be stored the non local system keyspaces data.
+     *
+     * @return the locations where should be stored the non local system keyspaces data

Review comment:
       ```suggestion
        * Returns the locations where the non local system keyspaces data should be stored.
        *
        * @return the locations where the non local system keyspaces data should be stored.
   ```

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -591,10 +589,30 @@ public static File getBackupsDirectory(File location)
         }
     }
 
+    /**
+     * Checks if the specified table should be stored with locale system data.
+     *
+     * <p> To minimize the risk of failures, SSTables for local system keyspaces must be stored in a single data
+     * directory. The only exception to this is the system paxos table as it can be a high traffic table.</p>
+     *
+     * @param keyspace the keyspace name
+     * @param table the table name
+     * @return {@code true} if the specified table should be stored with locale system data, {@code false} otherwise.

Review comment:
       ```suggestion
        * Checks if the specified table should be stored with local system data.
        *
        * <p> To minimize the risk of failures, SSTables for local system keyspaces must be stored in a single data
        * directory. The only exception to this is the system paxos table as it can be a high traffic table.</p>
        *
        * @param keyspace the keyspace name
        * @param table the table name
        * @return {@code true} if the specified table should be stored with local system data, {@code false} otherwise.
   ```

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -631,6 +649,106 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the system keyspaces.
+         */
+        private final DataDirectory[] systemKeyspaceDataDirectories;
+
+        /**
+         * The directories where should be stored the data of the non system keyspaces.
+         */
+        private final DataDirectory[] nonSystemKeyspacesDirectories;
+
+
+        public DataDirectories(String[] locationsForNonSystemKeyspaces, String[] locationsForSystemKeyspace)
+        {
+            nonSystemKeyspacesDirectories = toDataDirectories(locationsForNonSystemKeyspaces);
+            systemKeyspaceDataDirectories = toDataDirectories(locationsForSystemKeyspace);
+        }
+
+        private static DataDirectory[] toDataDirectories(String... locations)
+        {
+            DataDirectory[] directories = new DataDirectory[locations.length];
+            for (int i = 0; i < locations.length; ++i)
+                directories[i] = new DataDirectory(new File(locations[i]));
+            return directories;
+        }
+
+        /**
+         * Returns the data directories used to store the data of the specified keyspace.
+         *
+         * @param keyspace the keyspace name
+         * @return the data directories used to store the data of the specified keyspace
+         */
+        public DataDirectory[] getDataDirectoriesUsedBy(String keyspace)
+        {
+            if (SchemaConstants.SYSTEM_KEYSPACE_NAME.equals(keyspace)
+                    && !ArrayUtils.isEmpty(systemKeyspaceDataDirectories)
+                    && !ArrayUtils.contains(nonSystemKeyspacesDirectories, systemKeyspaceDataDirectories[0]))
+            {
+                DataDirectory[] directories = Arrays.copyOf(nonSystemKeyspacesDirectories, nonSystemKeyspacesDirectories.length + 1);
+                directories[directories.length - 1] = systemKeyspaceDataDirectories[0];
+                return directories;
+            }
+            return SchemaConstants.isLocalSystemKeyspace(keyspace) ? systemKeyspaceDataDirectories
+                                                                   : nonSystemKeyspacesDirectories;
+        }
+
+        /**
+         * Returns the data directories for the specified keyspace.
+         *
+         * @param table the table metadata
+         * @return the data directories for the specified keyspace

Review comment:
       ```suggestion
            * Returns the data directories for the specified table.
            *
            * @param table the table metadata
            * @return the data directories for the specified table
   ```

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -631,6 +649,106 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the system keyspaces.
+         */
+        private final DataDirectory[] systemKeyspaceDataDirectories;
+
+        /**
+         * The directories where should be stored the data of the non system keyspaces.

Review comment:
       ```suggestion
            * The directories where the data of the non local system keyspaces should be stored.
   ```

##########
File path: src/java/org/apache/cassandra/service/StorageService.java
##########
@@ -3291,12 +3291,29 @@ public String getKeyspaceReplicationInfo(String keyspaceName)
         return stringify(Gossiper.instance.getUnreachableMembers(), true);
     }
 
+    @Override
     public String[] getAllDataFileLocations()
     {
-        String[] locations = DatabaseDescriptor.getAllDataFileLocations();
-        for (int i = 0; i < locations.length; i++)
-            locations[i] = FileUtils.getCanonicalPath(locations[i]);
-        return locations;
+        return getCanonicalPaths(DatabaseDescriptor.getAllDataFileLocations());
+    }
+
+    private String[] getCanonicalPaths(String[] paths)
+    {
+        for (int i = 0; i < paths.length; i++)
+            paths[i] = FileUtils.getCanonicalPath(paths[i]);
+        return paths;
+    }

Review comment:
       Overwriting the input array seems like it could have risky side effects, like overriding the contents of `DatabaseDescriptor.conf.data_file_directories`. I think we should return a new array.

##########
File path: src/java/org/apache/cassandra/io/FSDiskFullWriteError.java
##########
@@ -18,16 +18,22 @@
 
 package org.apache.cassandra.io;
 
+import java.io.File;
+import java.io.IOException;
+
 public class FSDiskFullWriteError extends FSWriteError
 {
-    public FSDiskFullWriteError(Throwable cause, String path)
+    public FSDiskFullWriteError(String keyspace, long mutationSize)
     {
-        super(cause, path);
+        super(new IOException(String.format("Insufficient disk space to write %s bytes into the %s keyspace",
+                                            mutationSize,
+                                            keyspace)),
+              new File(""));

Review comment:
       Nit: Both `FSDiskFullWriteError` and `FSNoDiskAvailableForWriteError` call the super constructor of `FSWriteError` with `new File("")`. Maybe we could add a new `FSWriteError` constructor without the file argument.

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>

Review comment:
       Nit: Although strictly correct, the `can tolerate the lost of n - 1 disks` might sound like the descriptions of RAID levels where they can tolerate the loss of any n - 1 disks, where here we can only tolerate the loss of the last n - 1 disks, which is worst (but better than before). I'd rephrase to something like `can tolerate the loss of any disk but the first one`, wdyt?

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.

Review comment:
       ```suggestion
        * Returns the locations where the local system keyspaces data should be stored.
   ```

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>

Review comment:
       ```suggestion
        * in the first data directory. This approach guarantees that the server can tolerate the lost of n - 1 disks.</p>
   ```

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.

Review comment:
       ```suggestion
        * Checks if the local system data must be stored in a specific location which supports redundancy.
   ```

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -512,6 +507,9 @@ public boolean hasAvailableDiskSpace(long estimatedSSTables, long expectedTotalW
                 allowedDirs.add(dir);
         }
 
+        if (allowedDirs.isEmpty())
+            throw new FSNoDiskAvailableForWriteError(metadata.keyspace);
+

Review comment:
       Not related to the patch, but we could simplify the call to `Collections.sort` right below with:
   ```java
   allowedDirs.sort(Comparator.comparing(o -> o.location));
   ``` 

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2740,4 +2730,81 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. System keyspaces can have their own disk
+     * to allow for special redundency mechanism. If it is the case the executor services returned for
+     * system keyspace will be differents from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non system keyspaces.
+         */
+        private final ExecutorService[] nonSystemflushExecutors;
+
+        /**
+         * The flush executors for system keyspaces.
+         */
+        private final ExecutorService[] systemDiskFlushExecutors;
+
+        public PerDiskFlushExecutors(int flushWriters,
+                                     String[] locationsForNonSystemKeyspaces,
+                                     boolean useSpecificLocationForSystemKeyspaces)
+        {
+            ExecutorService[] flushExecutors = createPerDiskFlushWriters(locationsForNonSystemKeyspaces.length, flushWriters);
+            nonSystemflushExecutors = flushExecutors;
+            systemDiskFlushExecutors = useSpecificLocationForSystemKeyspaces ? new ExecutorService[] {newThreadPool("SystemKeyspacesDiskMemtableFlushWriter", flushWriters)}
+                                                                             : new ExecutorService[] {flushExecutors[0]};
+        }
+
+        private ExecutorService[] createPerDiskFlushWriters(int numberOfExecutors, int flushWriters)
+        {
+            ExecutorService[] flushExecutors = new ExecutorService[numberOfExecutors];
+
+            for (int i = 0; i < numberOfExecutors; i++)
+            {
+                flushExecutors[i] = newThreadPool("PerDiskMemtableFlushWriter_" + i, flushWriters);
+            }
+            return flushExecutors;
+        }
+
+        private static JMXEnabledThreadPoolExecutor newThreadPool(String poolName, int size)
+        {
+            return new JMXEnabledThreadPoolExecutor(size,
+                                                    Stage.KEEP_ALIVE_SECONDS,
+                                                    TimeUnit.SECONDS,
+                                                    new LinkedBlockingQueue<Runnable>(),
+                                                    new NamedThreadFactory(poolName),
+                                                    "internal");
+        }
+
+        /**
+         * Returns the flush executors for the specified keyspace.
+         *
+         * @param keyspaceName the keyspace name
+         * @param tableName the table name
+         * @return the flush executors that should be used for flushing the memtables of the specified keyspace.
+         */
+        public ExecutorService[] getExecutorsFor(String keyspaceName, String tableName)
+        {
+            return Directories.isStoredInSystemKeyspacesDataLocation(keyspaceName, tableName) ? systemDiskFlushExecutors
+                                                                  : nonSystemflushExecutors;
+        }
+
+        /**
+         * Appends all the {@code ExecutorService} used for flushes to the colection.
+         *
+         * @param collection the colection to append to.
+         */
+        public void appendAllExecutors(Collection<ExecutorService> collection)
+        {
+            Collections.addAll(collection, nonSystemflushExecutors);
+            if (nonSystemflushExecutors != systemDiskFlushExecutors)

Review comment:
       Nothing wrong with how the `!=` operator is used here but a comment about its intentionality and/or a `@SuppressWarnings("ArrayEquality")` annotation could be helpful for suspicious readers.

##########
File path: src/java/org/apache/cassandra/config/DatabaseDescriptor.java
##########
@@ -1787,11 +1814,57 @@ public static void setInterDCStreamThroughputOutboundMegabitsPerSec(int value)
         conf.inter_dc_stream_throughput_outbound_megabits_per_sec = value;
     }
 
-    public static String[] getAllDataFileLocations()
+    /**
+     * Checks if the local system data must be stored in a specific location which support redundancy.
+     *
+     * @return {@code true} if the local system keyspaces data must be stored in a different location,
+     * {@code false} otherwise.
+     */
+    public static boolean useSpecificLocationForSystemData()
+    {
+        return conf.system_data_file_directory != null;
+    }
+
+    /**
+     * Returns the locations where should be stored the local system keyspaces data.
+     *
+     * <p>If the {@code system_data_file_directory} was unspecified, the local system keyspaces data should be stored
+     * in the first data directory. This approach guaranty that the server can tolerate the lost of n - 1 disks.</p>
+     *
+     * @return the locations where should be stored the local system keyspaces data
+     */
+    public static String[] getSystemKeyspacesDataFileLocations()
+    {
+        if (conf.system_data_file_directory != null)
+            return new String[] {conf.system_data_file_directory};
+
+        return conf.data_file_directories.length == 0  ? conf.data_file_directories
+                                                       : new String[] {conf.data_file_directories[0]};
+    }
+
+    /**
+     * Returns the locations where should be stored the non local system keyspaces data.
+     *
+     * @return the locations where should be stored the non local system keyspaces data
+     */
+    public static String[] getNonSystemKeyspacesDataFileLocations()

Review comment:
       Naming this is tricky, since these locations actually contain data of system keyspaces that are not local. Maybe `getSystemKeyspacesDataFileLocations` can be changed to the even longer `getLocalSystemKeyspacesDataFileLocations`, but I can't really think of a better name for this one.

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2740,4 +2730,81 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. System keyspaces can have their own disk
+     * to allow for special redundency mechanism. If it is the case the executor services returned for
+     * system keyspace will be differents from the ones for the other keyspaces.</p>

Review comment:
       ```suggestion
        * to allow for special redundancy mechanism. If it is the case the executor services returned for
        * local system keyspaces will be different from the ones for the other keyspaces.</p>
   ```

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -631,6 +649,106 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the system keyspaces.

Review comment:
       ```suggestion
            * The directories for storing the local system keyspaces.
   ```

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -631,6 +649,106 @@ public String toString()
         }
     }
 
+    /**
+     * Data directories used to store keyspace data.
+     */
+    public static final class DataDirectories implements Iterable<DataDirectory>
+    {
+        /**
+         * The directories for storing the system keyspaces.
+         */
+        private final DataDirectory[] systemKeyspaceDataDirectories;
+
+        /**
+         * The directories where should be stored the data of the non system keyspaces.
+         */
+        private final DataDirectory[] nonSystemKeyspacesDirectories;
+
+
+        public DataDirectories(String[] locationsForNonSystemKeyspaces, String[] locationsForSystemKeyspace)
+        {
+            nonSystemKeyspacesDirectories = toDataDirectories(locationsForNonSystemKeyspaces);
+            systemKeyspaceDataDirectories = toDataDirectories(locationsForSystemKeyspace);
+        }
+
+        private static DataDirectory[] toDataDirectories(String... locations)
+        {
+            DataDirectory[] directories = new DataDirectory[locations.length];
+            for (int i = 0; i < locations.length; ++i)
+                directories[i] = new DataDirectory(new File(locations[i]));
+            return directories;
+        }
+
+        /**
+         * Returns the data directories used to store the data of the specified keyspace.
+         *
+         * @param keyspace the keyspace name
+         * @return the data directories used to store the data of the specified keyspace
+         */
+        public DataDirectory[] getDataDirectoriesUsedBy(String keyspace)
+        {
+            if (SchemaConstants.SYSTEM_KEYSPACE_NAME.equals(keyspace)
+                    && !ArrayUtils.isEmpty(systemKeyspaceDataDirectories)
+                    && !ArrayUtils.contains(nonSystemKeyspacesDirectories, systemKeyspaceDataDirectories[0]))
+            {
+                DataDirectory[] directories = Arrays.copyOf(nonSystemKeyspacesDirectories, nonSystemKeyspacesDirectories.length + 1);
+                directories[directories.length - 1] = systemKeyspaceDataDirectories[0];
+                return directories;
+            }
+            return SchemaConstants.isLocalSystemKeyspace(keyspace) ? systemKeyspaceDataDirectories
+                                                                   : nonSystemKeyspacesDirectories;
+        }
+
+        /**
+         * Returns the data directories for the specified keyspace.
+         *
+         * @param table the table metadata
+         * @return the data directories for the specified keyspace
+         */
+        public DataDirectory[] getDataDirectoriesFor(TableMetadata table)
+        {
+            return isStoredInSystemKeyspacesDataLocation(table.keyspace, table.name) ? systemKeyspaceDataDirectories
+                                                                                     : nonSystemKeyspacesDirectories;
+        }
+
+        @Override
+        public Iterator<DataDirectory> iterator()
+        {
+            Iterator<DataDirectory> iter = Iterators.forArray(nonSystemKeyspacesDirectories);
+
+            if (nonSystemKeyspacesDirectories == systemKeyspaceDataDirectories)

Review comment:
       Same as before, we could add comment about the intentional use of `==` and/or a `@SuppressWarnings("ArrayEquality")` annotation.

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -456,6 +476,78 @@ protected void setup()
         completeSetup();
     }
 
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    private void migrateSystemDataIfNeeded() throws IOException
+    {
+        String importSystemDataFrom = System.getProperty("cassandra.importSystemDataFilesFrom");
+
+        // If there is only one directory and no system keyspace directory has been specified we do not need to do
+        // anything. If it is not the case we want to try to migrate the data.
+        if (DatabaseDescriptor.useSpecificLocationForSystemData()
+                || DatabaseDescriptor.getNonSystemKeyspacesDataFileLocations().length > 1
+                || importSystemDataFrom != null)

Review comment:
       We could invert this condition and return directly to reduce nesting:
   ```java
   if (!DatabaseDescriptor.useSpecificLocationForSystemData()
       && DatabaseDescriptor.getNonSystemKeyspacesDataFileLocations().length <= 1
       && importSystemDataFrom == null)
       return;
   ```

##########
File path: src/java/org/apache/cassandra/db/Directories.java
##########
@@ -1002,8 +1120,7 @@ public long getTrueAllocatedSizeIn(File input)
 
     public static List<File> getKSChildDirectories(String ksName)
     {
-        return getKSChildDirectories(ksName, dataDirectories);
-
+        return getKSChildDirectories(ksName, dataDirectories.getDataDirectoriesUsedBy(ksName));

Review comment:
       Maybe the two versions of `getKSChildDirectories` can be merged, given that one is the only caller of the other?

##########
File path: src/java/org/apache/cassandra/db/ColumnFamilyStore.java
##########
@@ -2740,4 +2730,81 @@ public boolean getNeverPurgeTombstones()
     {
         return neverPurgeTombstones;
     }
+
+    /**
+     * The thread pools used to flush memtables.
+     *
+     * <p>Each disk has its own set of thread pools to perform memtable flushes.</p>
+     * <p>Based on the configuration. System keyspaces can have their own disk
+     * to allow for special redundency mechanism. If it is the case the executor services returned for
+     * system keyspace will be differents from the ones for the other keyspaces.</p>
+     */
+    private static final class PerDiskFlushExecutors
+    {
+        /**
+         * The flush executors for non system keyspaces.
+         */
+        private final ExecutorService[] nonSystemflushExecutors;
+
+        /**
+         * The flush executors for system keyspaces.
+         */
+        private final ExecutorService[] systemDiskFlushExecutors;
+
+        public PerDiskFlushExecutors(int flushWriters,
+                                     String[] locationsForNonSystemKeyspaces,
+                                     boolean useSpecificLocationForSystemKeyspaces)
+        {
+            ExecutorService[] flushExecutors = createPerDiskFlushWriters(locationsForNonSystemKeyspaces.length, flushWriters);
+            nonSystemflushExecutors = flushExecutors;
+            systemDiskFlushExecutors = useSpecificLocationForSystemKeyspaces ? new ExecutorService[] {newThreadPool("SystemKeyspacesDiskMemtableFlushWriter", flushWriters)}
+                                                                             : new ExecutorService[] {flushExecutors[0]};
+        }
+
+        private ExecutorService[] createPerDiskFlushWriters(int numberOfExecutors, int flushWriters)
+        {
+            ExecutorService[] flushExecutors = new ExecutorService[numberOfExecutors];
+
+            for (int i = 0; i < numberOfExecutors; i++)
+            {
+                flushExecutors[i] = newThreadPool("PerDiskMemtableFlushWriter_" + i, flushWriters);
+            }
+            return flushExecutors;
+        }
+
+        private static JMXEnabledThreadPoolExecutor newThreadPool(String poolName, int size)
+        {
+            return new JMXEnabledThreadPoolExecutor(size,
+                                                    Stage.KEEP_ALIVE_SECONDS,
+                                                    TimeUnit.SECONDS,
+                                                    new LinkedBlockingQueue<Runnable>(),

Review comment:
       ```suggestion
                                                       new LinkedBlockingQueue<>(),
   ```

##########
File path: src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java
##########
@@ -26,10 +26,9 @@
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.DisallowedDirectories;
 import org.apache.cassandra.db.Keyspace;
-import org.apache.cassandra.io.FSError;
-import org.apache.cassandra.io.FSErrorHandler;
-import org.apache.cassandra.io.FSReadError;
+import org.apache.cassandra.io.*;
 import org.apache.cassandra.io.sstable.CorruptSSTableException;
+import org.apache.cassandra.schema.SchemaConstants;

Review comment:
       Nit: unused

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -456,6 +476,78 @@ protected void setup()
         completeSetup();
     }
 
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    private void migrateSystemDataIfNeeded() throws IOException

Review comment:
       It would be nice to have some upgrade dtests for this, provided that we have the machinery to do so in either Java or Python dtests.

##########
File path: src/java/org/apache/cassandra/io/util/FileUtils.java
##########
@@ -920,4 +922,69 @@ public Object getAttribute(String attribute) throws IOException
             return fileStore.getAttribute(attribute);
         }
     }
+
+
+    /**
+     * Moves the files from a directory to another directory.
+     * <p>Once a file has been copied to the target directory it will be deleted from the source directory.
+     * If a file already exist in the target directory a warning will be logged and the file will not
+     * be deleted.</p>

Review comment:
       Nit: it also moves subdirectories.
   ```suggestion
   
       /**
        * Moves the contents of a directory to another directory.
        * <p>Once a file has been copied to the target directory it will be deleted from the source directory.
        * If a file already exists in the target directory a warning will be logged and the file will not
        * be deleted.</p>
   ```

##########
File path: src/java/org/apache/cassandra/service/StorageServiceMBean.java
##########
@@ -118,6 +118,20 @@
      */
     public String[] getAllDataFileLocations();
 
+    /**
+     * Returns the locations where should be stored the system keyspaces data.
+     *
+     * @return the locations where should be stored the system keyspaces data

Review comment:
       ```suggestion
        * Returns the locations where the local system keyspaces data should be stored.
        *
        * @return the locations where the local system keyspaces data should be stored
   ```

##########
File path: test/unit/org/apache/cassandra/db/DirectoriesTest.java
##########
@@ -88,18 +86,21 @@ public static void beforeClass() throws IOException
         tempDataDir.delete(); // hack to create a temp dir
         tempDataDir.mkdir();
 
-        Directories.overrideDataDirectoriesForTest(tempDataDir.getPath());
-        // Create two fake data dir for tests, one using CF directories, one that do not.
+       // Create two fake data dir for tests, one using CF directories, one that do not.

Review comment:
       ```suggestion
           // Create two fake data dir for tests, one using CF directories, one that do not.
   ```

##########
File path: src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java
##########
@@ -67,6 +66,18 @@ public void handleFSError(FSError e)
                 StorageService.instance.stopTransports();
                 break;
             case best_effort:
+
+                // There are a few scenarios where we know that the node will not be able to operate properly

Review comment:
       ```suggestion
                   // There are a few scenarios where we know that the node will not be able to operate properly.
   ```

##########
File path: src/java/org/apache/cassandra/io/FSDiskFullWriteError.java
##########
@@ -18,16 +18,22 @@
 
 package org.apache.cassandra.io;
 
+import java.io.File;
+import java.io.IOException;
+
 public class FSDiskFullWriteError extends FSWriteError
 {
-    public FSDiskFullWriteError(Throwable cause, String path)
+    public FSDiskFullWriteError(String keyspace, long mutationSize)
     {
-        super(cause, path);
+        super(new IOException(String.format("Insufficient disk space to write %s bytes into the %s keyspace",

Review comment:
       ```suggestion
           super(new IOException(String.format("Insufficient disk space to write %d bytes into the %s keyspace",
   ```

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -212,6 +219,19 @@ protected void setup()
     {
         FileUtils.setFSErrorHandler(new DefaultFSErrorHandler());
 
+        // Since CASSANDRA-14793 the local system file data are not dispatched accross the data directories
+        // anymore to reduce the risks in case of disk failures. By consequence, the system need to ensure in case of
+        // upgrade that the old data files have been migrated to the new directories before we start deleting
+        // snapshot and upgrading system tables.

Review comment:
       ```suggestion
           // Since CASSANDRA-14793 the local system keyspaces data are not dispatched across the data directories
           // anymore to reduce the risks in case of disk failures. By consequence, the system need to ensure in case of
           // upgrade that the old data files have been migrated to the new directories before we start deleting
           // snapshots and upgrading system tables.
   ```

##########
File path: src/java/org/apache/cassandra/io/util/FileUtils.java
##########
@@ -920,4 +922,69 @@ public Object getAttribute(String attribute) throws IOException
             return fileStore.getAttribute(attribute);
         }
     }
+
+
+    /**
+     * Moves the files from a directory to another directory.
+     * <p>Once a file has been copied to the target directory it will be deleted from the source directory.
+     * If a file already exist in the target directory a warning will be logged and the file will not
+     * be deleted.</p>
+     *
+     * @param source the directory containing the files to move
+     * @param target the directory where the files must be moved
+     */
+    public static void moveRecursively(Path source, Path target) throws IOException
+    {
+        logger.info("Moving {} to {}" , source, target);
+
+        if (Files.isDirectory(source))
+        {
+            Files.createDirectories(target);
+
+            try (Stream<Path> paths = Files.list(source))
+            {
+                Path[] children = paths.toArray(Path[]::new);
+
+                for (Path child : children)
+                    moveRecursively(child, target.resolve(source.relativize(child)));
+            }
+
+            deleteDirectoryIfEmpty(source);
+        }
+        else
+        {
+            if (Files.exists(target))
+            {
+                logger.warn("Cannot move the file {} to {} as the target file already exists." , source, target);
+            }
+            else
+            {
+                Files.copy(source, target, StandardCopyOption.COPY_ATTRIBUTES);
+                Files.delete(source);
+            }
+        }
+    }
+
+    /**
+     * Deletes the specified directory if it is empty
+     *
+     * @param path the path to the directory
+     */
+    public static void deleteDirectoryIfEmpty(Path path) throws IOException

Review comment:
       I think this could be called with the path of a file, we should probably check that the path belongs to a directory. It would be nice to have a simple unit test for this method in `FileUtilsTest`.

##########
File path: test/unit/org/apache/cassandra/io/util/FileUtilsTest.java
##########
@@ -128,6 +128,75 @@ public void testIsContained()
         assertFalse(FileUtils.isContained(new File("/tmp/abc/../abc"), new File("/tmp/abcc")));
     }
 
+    @Test
+    public void testMoveFiles() throws IOException

Review comment:
       Nice test

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -243,7 +263,7 @@ protected void setup()
         }
         catch (IOException e)
         {
-            exitOrFail(3, e.getMessage(), e.getCause());
+            exitOrFail(StartupException.ERR_WRONG_DISK_STATE, e.getMessage(), e.getCause());

Review comment:
       Good catch

##########
File path: src/java/org/apache/cassandra/service/CassandraDaemon.java
##########
@@ -456,6 +476,78 @@ protected void setup()
         completeSetup();
     }
 
+    /**
+     * Checks if the data of the local system keyspaces need to be migrated to a different location.
+     *
+     * @throws IOException
+     */
+    private void migrateSystemDataIfNeeded() throws IOException
+    {
+        String importSystemDataFrom = System.getProperty("cassandra.importSystemDataFilesFrom");
+
+        // If there is only one directory and no system keyspace directory has been specified we do not need to do
+        // anything. If it is not the case we want to try to migrate the data.
+        if (DatabaseDescriptor.useSpecificLocationForSystemData()
+                || DatabaseDescriptor.getNonSystemKeyspacesDataFileLocations().length > 1
+                || importSystemDataFrom != null)
+        {
+            // We can face several cases:
+            //  1) The system data are spread accross the data file locations and need to be moved to
+            //     the first data location (upgrade to 4.0)
+            //  2) The system data are spread accross the data file locations and need to be moved to
+            //     the system keyspace location configured by the user (upgrade to 4.0)
+            //  3) The system data are stored in the first data location and need to be moved to
+            //     the system keyspace location configured by the user (system_data_file_directory has been configured)
+            //  4) The system data have been stored in the system keyspace location configured by the user
+            //     and need to be moved to the first data location (the import of the data has been requested)

Review comment:
       Nice comment!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org