You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/06/08 15:39:59 UTC

[jira] [Created] (CASSANDRA-2749) fine-grained control over data directories

fine-grained control over data directories
------------------------------------------

                 Key: CASSANDRA-2749
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Jonathan Ellis
            Priority: Minor


Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.

Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-2749:
---------------------------------------

    Fix Version/s: 1.0

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 1.0
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170467#comment-13170467 ] 

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

sounds great (both just supporting new-style-layout and limiting names to 32chars)

guess we need to supply a tool to rename sstables files if anyone is on longer names? and rolling upgrades are out of the question then right? (maybe the already are?)
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155218#comment-13155218 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

+1 first few additions:
 - You forgot to set `chmod +x ./bin/sstablemover`.
 - Good idea would be to use `mv` instead of `cp` when -d option is specified.

                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment:     (was: 0001-2749-v2.patch)
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244746#comment-13244746 ] 

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

Did some more research on path limitations:

NTFS is technically okay with paths up to 32K long[1], but the windows api is limited to 256[2].  Common Linux filesystems have a limit of 255 bytes per path *component* (i.e. directory or filename) but no total path limit.  However, Linux defines PATH_MAX and FILENAME_MAX, both 4096. [3]

[1] http://en.wikipedia.org/wiki/Comparison_of_file_systems
[2] http://msdn.microsoft.com/en-us/library/aa365247.aspx
[3] http://serverfault.com/questions/9546/filename-length-limits-on-linux

In short: restricting KS and CF names to 32 characters is a good idea for the benefit of Windows portability.  However, we may want to exempt Linux systems from the startup length check to allow easier upgrades.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.1.0
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163571#comment-13163571 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
---------------------------------------------

bq. i had on algorithm for detecting what kind of file it is in the old backwards compatible patch, it iterates over all data directories and figures out which data directory the file is in, then it knows the keyspace is the next part of the filename, and can check if there are files or directories in that directory.

I tried that too, but this doesn't work when say you're opening a sstable by the sstableloader, because the files are not in a data directories then. 
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Héctor Izquierdo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046045#comment-13046045 ] 

Héctor Izquierdo commented on CASSANDRA-2749:
---------------------------------------------

What about being configurable in a separate file like the network topology? Could that work as a first approximation?

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment: 0003-Fixes.patch
                0002-fix-unit-tests.patch
                0001-2749.patch

Attaching rebased patches, with a 3rd patches (0003-Fixes.patch) addressing the Pavel's remarks. More specifically:

bq. o.a.c.db.Directories comment should be updated because it still uses SSTable file name without keyspace.

Fixed, thanks

bq. o.a.c.io.sstable.SSTableReaderTest won't compile

Sorry, I forgot to check the test after a last rebase, fixed too (this involved renaming a number of sstables from test/data/legacy-sstables/hb to include the keyspace name, so that specific change is in the 2nd 'fix unit tests' patch to avoid polluting the 3rd one).

bq. if you start with empty data directory you get following exception and process exits

Fixed. I've actually made two modifications: the migration checks the existence of the directory to avoid the NPE during listFiles(), but I've also modified the 'should we migrate' check to detect new nodes (checking if the system keyspace directory exists) and thus not print the migration message at all.

bq. on snapshot doesn't create or move (from older schema) index SSTables related to CF

I'm not sure I see what this one is. Are we talking of the migration process?

In any case, you made me think about secondary indexes. Maybe it is more "natural" to have secondary indexes sstables be in the same directory than the base cfs? Since the indexes name is not really something exposed (granted you don't have to be a genius to figure it out), it feels like it would slightly simplify administration to not put them in a separate directory.

I've updated the patch to implement this last idea (so indexes are in the same directory than their base cf), but it would be nice to have multiple opinions on that move since we don't want to have to do a new migration in 6 month because "we've changed our mind".

bq. shouldn't old "snapshots" directory be removed after move?

Your right, fixed (for backups too).

                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment:     (was: 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz)
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment:     (was: 0002-fix-unit-tests.patch)
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170364#comment-13170364 ] 

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

bq. we cannot stream between two nodes, one using separate cf directory

I don't see any reason to continue to support the old-style directory layout.  That adds complexity (operationally as well as in the code) for no benefit that I can think of.  I think we should migrate from old layout to new on the first startup under 1.1.

bq. regarding keyspaces in file names, sure, why not, guess having a header with this info in the file is out of the question, then the only meta data we have is the file name, right? A problem could be if we want to do CASSANDRA-1983 later, that would increase the file name length even more

I'm on the fence here -- on the one hand having ks + cf in the filename simplifies some things.  On the other hand, we allow arbitrary-length KS + CF names (up to 64K iirc) so UUID aside we're already in trouble on ext3/ext4, xfs, and ntfs, which all support max filename length of ~256.  I'm starting to think we should move these into the metadata component instead of the filename.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment:     (was: 0003-Fixes.patch)
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment: 0002-fix-unit-tests.patch
                0001-add-new-directory-layout.patch

I've only look at the last posted patchset (2749_proper.tar.gz) but I've found the following few problems:
* The disk space free space checks doesn't take the new layout into account.  In other words, it always checks the keyspace directory for free space. But in case we use CF directories, the keyspace directory could likely be on a disk with almost no freespace (and in any case, this won't always be the right disk to check).
* The check that a snapshot is existing and the clearing of snapshot is broken for the CF directories layout (in Table.java)
* The leveled manifest is not put into it's CF directory (it follows that its snapshot is not in the right place either).

I'm also not a fan of having the directory argument given to the Descriptor constructors not be the actual directory of the sstable it describes. I think it's error prone, making Descriptor harder to use (the fact that Descriptor.getDirectory() doesn't necessarily returns the directory you've fed to the constructor is confusing).

More generally, the current code dealing with the paths to the data directories is currently very badly encapsulated (not this patch fault at all). Which means that adding a new "parallel" data directory layout leaks everywhere and make the code hard to read/maintain. I strongly believe we should use the opportunity to improve that code encapsulation.

So attaching a patch that refactor a bit more the code to improve encapsulation. It essentially consists in adding a new Directories class that abstracts the actual layout of the data directory (including the fact that we can use multiple locations, which imho greatly clean the code). Aside from that, the patch reuse the sstablemover of the preceding patch (with an added verbose option). The patch also includes fairly extensive unit tests for the new class.

Now there is one thing that really bother me with this patch (the preceding patches have the same problem). It's the fact that the Descriptor.fromFilename method needs to query the global useSeparateCFDirectories option to work correctly. This has a number of practical consequences:
# it is the reason for a number of super annoying gymnastic in the unit tests, like the need to duplicate the sstables used by LegacySSTableTest for instance. I've put all of this gymnastic in a separate patch (0002-fix-unit-tests.patch).
# this means we cannot stream between two nodes, one using separate cf directory, one that don't (that limitation could be solve independently for streaming only, but it's worth considering a more general fix imo, in particular because a specific fix would be a tad ugly).
# tools like the sstableloader will require that you put the sstable to load into a ksname/cfname/ directory when you use the separate directory option.  It's not a big deal, but it feels arbitrary and annoying.

At first, I had hoped that we could make Descriptor.fromFilename 'detect' whether or not we were using the separate directory layout by using the fact that the file name contains the column family name already, but that doesn't work when the ksname == cfname (which is allowed afaik).

One option would be to stop relying on the directory structure the sstable is in to detect the keyspace name. We could do that if we included the keyspace name in the filename for instance. The pros would be to fix all the problem listed above, and in particular we would be able to allow stuffs like {{sstableloader <filename>}}, instead of requiring that the files are put inside a specifically name directory, which would be nice. I don't see much cons except making the file names longer. Any opinion on that idea?

                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Chris Chiappone (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470655#comment-13470655 ] 

Chris Chiappone commented on CASSANDRA-2749:
--------------------------------------------

This actually causes a bunch of problems for customers that have used longer the 48 character keyspace and cf names. How will this affect the upgrade path for those customers?
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.1.0
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz, 2749.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 2749.tar.gz

4 patches, one is the same as the v4, the other three patches are

* change configuration option name
* make LegacySSTable test work by telling Descriptor that the file is old-style
* make incremental backups work
  - problem was when migrating from old-style to new-style, the generation-counter in ColumnFamilyStore generated clashing ids, which was not a problem until it tried to hard link the files to the same directory with the same name. This should be refactored in CASSANDRA-1983 (i'll give that a try when this is merged)
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046102#comment-13046102 ] 

Peter Schuller commented on CASSANDRA-2749:
-------------------------------------------

+1 on that. We have been discussing the same thing, for the same purpose. The only kink is that you don't want to do something like having a per-cf setting that is tied to local node details like paths. But simply placing CF:s in a named subdirectory (similar to the pg tablespace) which can, on a per-node basis, by a symlink or a mountpoint, avoids that.

This means there's no problem doing a rolling re-configuration of a cluster, and there is no need to realize before hand that you might want to move some particular CF and do something like assign it to a tablespace (to get the level of indirection). It all just works by default, and you can move CF:s at any time on any node without co-ordination other than the node being down for a bit.

I can foresee it being easier to accidentally start a node which seems to work but has some CF:s be completely empty, because Cassandra won't be able to distinguish between an actual empty CF and a directory that wasn't mounted (or a symlink pointing to a non-mounted directory). Something simple like creating a marker of some kind on CF creation might help with that; on start-up CF:s that are missing the marker could be rejected. But - I suppose this is overkill at least initially.


> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118696#comment-13118696 ] 

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

It is not backwards compatible, i figured it was not worth the extra complexity, i imagine the "upgrade"-path to be:
1. shut down node
2. edit config file
3. symlink in SSDs and move the files into the subdirectories
4. start node

Of course, this could be done on startup by cassandra itself, but providing a shell-script that does it keeps it simple, what do you think? I really dont want the complexity with sstable files in two places at the same time.

My solution simply changed the Descriptor class to return paths to sstable files in subdirs, and then fix everything that was affected. Many of the classes modified were due to assumptions in the unit tests. I'm currently refactoring the Descriptor class to make it cleaner and a central place to glob for files etc. Expect a patch tonight or tomorrow.

I will of course change DatabaseDescriptor.getPerCFDirectory() to read the config file, kept it hard coded to make testing simpler.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127329#comment-13127329 ] 

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

Biggest part left is figuring out how to estimate disk space to know where to write an sstable, ill work on that this weekend

Other than that small cleanups (codestyle), unit tests and adding the config parameter
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124330#comment-13124330 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

Sorry Marcus I had no time to look at this but I will by the end of this week.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 2749_proper.tar.gz

updated, of course mv should be used :)

rebased as well
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment: 0002-fix-unit-tests-v2.patch
                0001-2749-v2.patch


Patch attached (0001-2749-v2.patch and 0002-fix-unit-tests-v2.patch) that:
# only support the new layout with sstables in cf directories
# migrate automagically files the first time
# adds the keyspace into the file name (to not rely on the directory sstables are in)
# limit keyspace and cf names to 32 characters

During the initial upgrade, we also check if the user would be fine with the current keyspace and cf name. If not, we just refuse to start. I hope this won't happen to anyone because renaming a CF/Keyspace in a rolling fashion is not something fun (or even possible for that matter). Note that this check don't fully enforce the 32 chars limitation however but rather tries to be as permissive as possible checking that any file path resulting for the upgrade should be less than 256 chars (the windows limit).

PS: the first patch is the bulk of the change, the second one is the unit tests. The later is huge because it renames the sstables in test/data to include the keyspace name in them.

                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749-v2.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests-v2.patch, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment: 0003-Fixes.patch
                0002-fix-unit-tests.patch
                0001-2749.patch

Rebased patches attached.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118720#comment-13118720 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

This should be done like compression which compresses only newly created SSTables, we don't want to make users to stop their node for indefinite time to change config and move files.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment:     (was: 0002-fix-unit-tests-v2.patch)
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045953#comment-13045953 ] 

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

We could also have a "memory" location that would be useful for temporary data.

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085812#comment-13085812 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

After crushing my head again this for a few days I can say that this is more complicated that it sounds for a few reasons:

 - We will need to support both old/new directory structures which requires major changes in the way how Descriptor class works and how CFS and SSTable classes do file lookup and path generation.
 - Adds additional complexity to the way how we do backups, snapshots and recover which could potentially lead to some nasty bugs.
 - As Peter already mentioned "Cassandra won't be able to distinguish between an actual empty CF and a directory that wasn't mounted (or a symlink pointing to a non-mounted directory)".

There are more but those mentioned are major ones. Let's skip this for now.

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146250#comment-13146250 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

Tested v4 on trunk and I see few test failures - LegacySSTableTest and ColumnFamilyStoreTest, it is related to how you determine what placement style is used for a given SSTable, feels like !DatabaseDescriptor.useSeparateCFDirectories() condition is insufficient. Also I think it would be correct to rename "one_directory_per_column_family" option to "use_separate_column_family_directories" with comment like "Useful when you have disks with different speeds (HDD/SSD) and you want explicitly distribute Column Families between them".
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch

submitting a "working" patch for separating column families into subdirectories, <ks>/<cf>/<cf>-xyz.db

There are alot of things to clean up/refactor, but submitting patch for comments.

Unit tests work both for new-style dirs and old (except for a backup test that i will fix when backups actually end up where they should).

TODO: (probably more than this)
- snapshots need to go into the <ks>/<cf>/snapshots dir since the purpose of this patch is to make it possible to have the <cf>/ directory on a separate drive
- incremental backups - same issue as for snapshots
- refactoring of Descriptor class, quite hairy now
- Disk space checking in column family subdirs
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-2749.
-----------------------------------------

    Resolution: Fixed
      Assignee: Sylvain Lebresne

Committed with nits above fixed, thanks Pavel.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118868#comment-13118868 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

On my opinion it's more user-friendly if we support old directory structure for existing files and let compaction do replace for us.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-2749:
---------------------------------------

    Fix Version/s:     (was: 1.0)
         Assignee:     (was: Pavel Yaskevich)

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170480#comment-13170480 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
---------------------------------------------

bq. guess we need to supply a tool to rename sstables files if anyone is on longer names?

We probably don't need to do anything. I don't think anyone is really using names long enough for them to it the file system limit, the goal of limiting the names is just so to prevent this from happening but there will be no other assumption that the names are short from the code.

I also don't think anything will prevent rolling upgrades, do you had something in mind?

Note: I have a long flight ahead of me so I plan to update my last patch with both those changes, as I still like the moving of all the directories handling in a dedicated class, even if we don't support both layout.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126910#comment-13126910 ] 

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

ah that is a leftover from my first non-backwards-compatible patch

it works even without the cf in the new File(...)
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Chris Burroughs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081679#comment-13081679 ] 

Chris Burroughs commented on CASSANDRA-2749:
--------------------------------------------

It would also be cool (but this is obviously speculative) to have the ability to keep Index files on an SSD, and the larger data files on rotating disks.

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 1.0
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118017#comment-13118017 ] 

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

btw, yes, i did include the data files in the patch on purpose, they need to be in subdirs for some of the unit tests to work
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170394#comment-13170394 ] 

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

+1 limiting to 32
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment:     (was: 0001-2749.patch)
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 2749_backwards_compatible_v1.patch

First version of a backwards compatible patch.

Approach is:
* On startup, recursively glob all data directories to find existing sstables
* whenever generating new files, look at the setting in the config, and put the file there
* probably more, cant remember, my brain is messed up after fighting with this for a few days, look at patch! :)

this gives that a user can enable/disable the setting without mocking around manually with the sstables.

Comments? Yes, the setting is still hardcoded as a marker that this is not ready to be merged :)
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126899#comment-13126899 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

Looks better now but I don't like how you changed DescriptorTest especially testLegacy() "userActionUtilsKey" there is meant to be a CF name, so it seems like current version does not work correctly?
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Ryan King (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046069#comment-13046069 ] 

Ryan King commented on CASSANDRA-2749:
--------------------------------------

Since each keyspace is stored in a different sub-directory of the DataDiretories, you can already split the storage of different keyspaces with some clever mount options. Maybe we could give column families the same treatment?

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz

ok, actually touched it again, a non-backwards compatible patch with a tool to move(copy) the files into subdirs

script is in tools/sstablemover
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143863#comment-13143863 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

v3 looks good (needs rebase), here is my comments:

 - DatabaseDescriptor.getPerCFDirectory() should be renamed to something like "useSeparateCFDirectories"
 - in the ColumnFamilyStory.accept method file name should be checked depending on case because we can CF names are case-sensitive.
 - we don't have ColumnFamilyStore.rename method anymore which is a good thing for this patch, you can just rebase and remove related code.
 - in the Table.snapshotExists I think we should be more careful determining if snapshot actually exists.
 - TODO should be cleaned up.

Wish: if it's possible I think we can remove ColumnFamily name from the SSTable files if those are in the CF directory already.

I think we are almost done in here.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-2749:
-----------------------------------------

    Assignee: Pavel Yaskevich

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146556#comment-13146556 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

I don't really like the idea of keeping a map of all generates for each of ks/cf and updating it all the time (this also implies that we need to resize it after cf is created/dropped). After re-thinking this once again it feels like we really should just make a tool to transfer SSTable to separate directories and back, make a flag in config that will indicate old/new style of file structure, because I don't see a error-prone way to handle this when old/new style SSTables are mixed...
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 2749_backwards_compatible_v4.patch

v4 attached

* renamed DatabaseDescriptor method
* made file name checking case sensitive again...
* removed unused rename code (following the renameSSTables call path, it looks very strange, even in trunk, ill fix this after this patch is merged)
* cant figure out a way to make snapshotExists any better, im looking for a directory with the snapshot tag in all places, not sure if it could be wrong here (even if it is quite naive)
* removed dead todos


                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170475#comment-13170475 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

I think if we would be just supporting new style layout we can convert on startup. +1 on both ideas tho.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126980#comment-13126980 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

I forgot to mention - would be great to see a test for RecursiveGlob and please put a license banner on top of that file like we do everywhere else.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170487#comment-13170487 ] 

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

It might be worth adding a "are my filenames going to be too large" check against all KS + CF combinations before starting to migrate data files around, though.  It would suck to end up with a partially converted database if some short CF names complete early on, before erroring out on a long one.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118454#comment-13118454 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

Thanks for your work, Marcus! First of all, can you please write your algorithm down so people can participate without reading actual code?

I don't think that this is the right way to go because I can't find how does it manage to keep backward compatibility with current directory structure (and that test SSTables should be moved is yet another confirmation) that would imply major change in the Descriptor/Component classes, all the current changes relay on DatabaseDescriptor.getPerCFDirectory() which is static "true" which makes it redundant. In my vision of things major changes should be made only in Descriptor, Component, ColumnFamilyStore and SSTable classes to change the way they create/lookup file locations and do backups and snapshoting.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118859#comment-13118859 ] 

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

Ok, i'll update the patch
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170385#comment-13170385 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
---------------------------------------------

{quote}
On the other hand, we allow arbitrary-length KS + CF names (up to 64K iirc) so UUID aside we're already in trouble on ext3/ext4, xfs, and ntfs, which all support max filename length of ~256. I'm starting to think we should move these into the metadata component instead of the filename.
{quote}

The thing with the metadata component is that from a code perspective, there is lots of places where we want to create a Descriptor, which involves extracting the keyspace/cf names only based on the filename. Adding the necessity to locate and read the metadata in those places will likely don't be very fun.

So I'd be in favor of just limiting the keyspace and column family names. It's one for which there is no real point to have very long names. Limiting each one to 32 characters shouldn't be a strong limitation.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment:     (was: 0001-add-new-directory-layout.patch)
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118864#comment-13118864 ] 

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

I don't think taking compression as a model makes sense. With compression, it's fine to set the option globally and each node can start using compression when it gets the schema update. But this affects the storage engine "below" schema.

I think it's reasonable to make the change offline, especially if an upgrade tool is provided to make it simple.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175390#comment-13175390 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

bq. I'm not sure I see what this one is. Are we talking of the migration process?

I was testing it like this : 

    # run 1.1 *without* modifications
    # ./tools/stress/bin/stress -n 50000 -S 512 -x KEYS
    # ./bin/nodetool -h localhost flush Keyspace1 Standard1
    # ./bin/nodetool -h localhost snapshot Keyspace1
    # made sure that Standard1.Idx-* SSTables are in the snapshots/<timestamp> directory
    # run 1.1 *with* you patch applied
    # checked if snapshots directory was moved and what files did it include - it was lucking Standard1.Idx-* files
    # cleaned data directory
    # repeated steps 1 - 5 but this time *with* your patch applied and it didn't include Standard1.Idx-* into snapshot

bq. Maybe it is more "natural" to have secondary indexes sstables be in the same directory than the base cfs? 

+1

                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175070#comment-13175070 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

Thanks, Sylvain! We are getting really close :)

Here are problems I found:

- o.a.c.db.Directories comment should be updated because it still uses SSTable file name without keyspace.
- o.a.c.io.sstable.SSTableReaderTest won't compile
- if you start with empty data directory you get following exception and process exits
{code}
 INFO 23:51:11,987 Upgrade from pre-1.1 version detected: migrating sstables to new directory layout
ERROR 23:51:11,988 Exception encountered during startup
java.lang.NullPointerException
	at org.apache.cassandra.db.Directories.migrateSSTables(Directories.java:416)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:164)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:360)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
java.lang.NullPointerException
	at org.apache.cassandra.db.Directories.migrateSSTables(Directories.java:416)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:164)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:360)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Exception encountered during startup: null
{code}

- on snapshot doesn't create or move (from older schema) index SSTables related to CF
- shouldn't old "snapshots" directory be removed after move? 

                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749-v2.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests-v2.patch, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 2749_backwards_compatible_v3.patch

third version, this is could be considered feature-complete, changes compared to v2 of the patch:

* make sure disk space is calculated correctly
  - if there is a subdirectory in the datadirectory with the same name as the CF, check in that directory, otherwise, check in the datadirectory (the subdirectory will be created)
* add config param
* add test for RecursiveGlob
* make RecursiveGlob ignore case when comparing file names
* add test for scrubbing old style CFs to new style CFs (or, modify an existing test to check this too)
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126954#comment-13126954 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

Good, can you briefly tell what is left? I see that ColumnFamilyStoreTest has an commented assertion and code still have some TODOs. Also please make sure you follow your CodeStyle and put space after if/for/while/... statements...
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178822#comment-13178822 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

+1 with last nit - Directories.SSTableLister methods skip{Compacted,Temporary}(boolean) and includeBackups(boolean) ignore argument and always set corresponding option to "true".
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175447#comment-13175447 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
---------------------------------------------

Weird. I just tried the same scenario and everything worked correctly. I should mention that when moving the snapshots/backups, the migration process rename them to the new filename convention, so they will be called Keyspace1-Standard1.Idx-*. Or maybe I fixed it with the last version of the patch without realizing it.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 2749_not_backwards.tar.gz

updated;

* sstablemover can now reverse changes (-r) and delete old files (-d).
* LegacySSTableTest fixed, hopefully
* added comments to cassandra.yaml
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045956#comment-13045956 ] 

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

There's some tension between managing this cluster-wide and the actual data directory definitions being per-machine. Not sure what the best solution there is.

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson reassigned CASSANDRA-2749:
------------------------------------------

    Assignee:     (was: Marcus Eriksson)

ok, im not touching this anymore, code is there if anyone wants it
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145731#comment-13145731 ] 

Marcus Eriksson edited comment on CASSANDRA-2749 at 11/7/11 7:47 PM:
---------------------------------------------------------------------

rebased after CASSANDRA-3464, renameSSTable code removed
                
      was (Author: krummas):
    rebased after 3464, renameSSTable code removed
                  
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 2749_backwards_compatible_v4_rebase1.patch

rebased after 3464, renameSSTable code removed
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153620#comment-13153620 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

Thanks a lot for taking care about this, Marcus!

I have tested your latest patch using stress and sstablemover and then stress (+ flush/scrub/snapshot) again for Standard (+ secondary indexes) and Super ColumnFamilies and everything worked just fine for me. Here are my last comments and I think after this gets done we are ready to push your changes:

1). a). Tool should be made more user-friendly by adding at least --help and add an option to move sstables back to the keyspace directory (as we have an two states in the config);
    b). I think it should delete old sstables after copy automatically or by asking user using --delete-old option?
    c). Move sstablemover tool the ./bin instead of ./tools/sstablemover.

2). LegacySSTableTest fails on my machine can you, please, check that?

3). I think we should extend comment in the config describing what should be done before it is safe to change to change the option e.g. "Note: before setting this option to 'true' you should move existing SSTable files to the separate directories manually by using ./bin/sstablemover tool, the same applies for 'false' state - ./bin/sstablemover could be used to restore original directory structure (Please use ./bin/sstablemover --help for help and information)."...  
 
Also please remove/implement remaining // TODO: comments.

                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124288#comment-13124288 ] 

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

Anyone got a suggestion on how to do the available disk space checking? Reason in that now we cannot simply check which dataFileLocation that has the most available disk space, we need to check the subdirs where files are actually written.

My naive first approach would be to simply change  DatabaseDescriptor.getDataFileLocationForTable(..) to take a column family name, which we ought to always know when calling that method (and a quick check seems to say that only compactions might prove a problem for this approach).

Comments?
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046093#comment-13046093 ] 

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

Ryan's idea sounds like the simplest way to get something "good enough" to me.

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163570#comment-13163570 ] 

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

agree that the file structure is all over the place, that's why this patch was so incredibly painful to do

seems i only did the disk space checking in sub directories in the backwards compatible patch

i had on algorithm for detecting what kind of file it is in the old backwards compatible patch, it iterates over all data directories and figures out which data directory the file is in, then it knows the keyspace is the next part of the filename, and can check if there are files or directories in that directory.

but i agree, it is incredibly ugly, your modifications make it alot better, and yes, this would be a good time to do this properly, not like me just trying to do a minimal patch that "works".

regarding keyspaces in file names, sure, why not, guess having a header with this info in the file is out of the question, then the only meta data we have is the file name, right? A problem could be if we want to do CASSANDRA-1983 later, that would increase the file name length even more.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175460#comment-13175460 ] 

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

That my be the case :) I will re-test as part of the review anyway.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2749:
--------------------------------------

         Reviewer: xedin
    Fix Version/s: 1.1
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment:     (was: 0002-fix-unit-tests.patch)
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121019#comment-13121019 ] 

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

Yes, trunk is the right place for this.
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-2749:
---------------------------------------

    Attachment: 2749_backwards_compatible_v2.patch

Updated patch, working snapshots and incremental backups

also a bit of cleanups

btw, the patches are always against trunk, not against the older patches, is this the correct way to do it?
                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Marcus Eriksson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118867#comment-13118867 ] 

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

agree, gonna look at how much pain it is to actually build though

wont need alot of downtime if it is done offline, can rsync files into the subdir right before stopping the node


                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (CASSANDRA-2749) fine-grained control over data directories

Posted by "Pavel Yaskevich (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich reassigned CASSANDRA-2749:
------------------------------------------

    Assignee: Marcus Eriksson
    
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch
>
>
> Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs.
> Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira