Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/07/04 22:11:47 UTC

[jira] Created: (CASSANDRA-276) use subdirectory-per-table for data files

use subdirectory-per-table for data files
-----------------------------------------

                 Key: CASSANDRA-276
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-276
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jonathan Ellis


it's a little silly to do this in the filename when the FS will give us a hierarchical structure for free.
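To make the proposal concrete, here is a minimal Java sketch contrasting the two layouts. The file-name pattern (<table>-<columnFamily>-<generation>-Data.db), the class DataFileLayout and the example path /var/lib/cassandra/data are assumptions for illustration only, not taken from the ticket or the patch.

```java
import java.io.File;

// Illustration only: the "<table>-<columnFamily>-<generation>-Data.db" naming is an
// assumption made for this sketch, not the exact format from the ticket.
class DataFileLayout
{
    // Old style: the table name is baked into every file name in one flat directory.
    static String flatPath(String dataDir, String table, String columnFamily, int generation)
    {
        return dataDir + File.separator + table + "-" + columnFamily + "-" + generation + "-Data.db";
    }

    // Subdirectory-per-table style: the filesystem carries the table name, so the
    // file name only needs the column family and generation.
    static String perTablePath(String dataDir, String table, String columnFamily, int generation)
    {
        return dataDir + File.separator + table + File.separator
               + columnFamily + "-" + generation + "-Data.db";
    }

    public static void main(String[] args)
    {
        System.out.println(flatPath("/var/lib/cassandra/data", "Table1", "Standard1", 1));
        System.out.println(perTablePath("/var/lib/cassandra/data", "Table1", "Standard1", 1));
    }
}
```

On a Unix-like system, main prints /var/lib/cassandra/data/Table1-Standard1-1-Data.db followed by /var/lib/cassandra/data/Table1/Standard1-1-Data.db.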

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731766#action_12731766 ] 

Jonathan Ellis commented on CASSANDRA-276:
------------------------------------------

looks great!

a couple things:

1)
    public static String getDataFileLocationForTable(String table)
    {
        String dataFileDirectory = dataFileDirectories_[currentIndex_] + File.separator + table;
        return dataFileDirectory;
    }

This needs to advance currentIndex_ the way getCompaction... does; otherwise it won't rotate.

Avoid creating redundant local variables like this; just return the String directly.

Why do we need a separate method? getCompaction... will do fine, or am I missing something? (Feel free to rename it getDataFile... if you think that is clearer.)

2)
    public static String getCompactionFileLocationForTable(String table)
    {
        String[] dataDirectoryForTable = getAllDataFileLocationsForTable(table);
        String dataFileDirectory = dataDirectoryForTable[currentIndex_];
        currentIndex_ = (currentIndex_ + 1 )%dataDirectoryForTable.length ;
        return dataFileDirectory;
    }

This is a little confusing because there is a hidden assumption that all tables have the same number of directories. If that ever changes, you risk an IndexOutOfBoundsException here. Better to just append the table name to the next entry from the global data directory list, as you did in getDataFile...

... that's it.
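Taken together, the two review points suggest something like the following. This is a hedged sketch, not the patch that was committed: the class name DataDirectories and the placeholder directories are hypothetical, while dataFileDirectories_, currentIndex_ and getDataFileLocationForTable mirror names used in the review. The synchronized modifier is an addition of this sketch.

```java
import java.io.File;

// Sketch of the reviewer's suggestion, not the committed code: take the next entry
// from the global data directory list, advance the index, and append the table name.
// The directory values are hypothetical placeholders.
class DataDirectories
{
    private static final String[] dataFileDirectories_ = { "/data1", "/data2", "/data3" };
    private static int currentIndex_ = 0;

    // synchronized is an addition in this sketch so concurrent callers rotate safely.
    static synchronized String getDataFileLocationForTable(String table)
    {
        // The local is needed only because the index is advanced before returning.
        int index = currentIndex_;
        // Rotate over the global list, so no per-table directory count is assumed.
        currentIndex_ = (currentIndex_ + 1) % dataFileDirectories_.length;
        return dataFileDirectories_[index] + File.separator + table;
    }

    public static void main(String[] args)
    {
        for (int i = 0; i < 4; i++)
            System.out.println(getDataFileLocationForTable("Table1"));
    }
}
```

On a Unix-like system, main prints /data1/Table1, /data2/Table1, /data3/Table1 and then wraps back to /data1/Table1. Because the index is always taken modulo the length of the global list it indexes into, it cannot run out of bounds.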

I do think the right approach is lazy directory creation, for efficiency when there are thousands of tables. That's not an entirely hypothetical situation :) But I will open a separate ticket for that.
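Lazy creation could look roughly like this: a hypothetical helper (LazyTableDirectories.getOrCreateTableDirectory is not a Cassandra method) that creates a table's directory only on first use, via java.io.File.mkdirs().

```java
import java.io.File;

// Hypothetical sketch of lazy per-table directory creation: no directories are made at
// startup; a table's directory is created the first time a location is requested for it.
class LazyTableDirectories
{
    static File getOrCreateTableDirectory(String dataDir, String table)
    {
        File dir = new File(dataDir, table);
        // mkdirs() returns false when the directory already exists (e.g. another caller
        // created it first), so only fail if it still is not a directory afterwards.
        if (!dir.mkdirs() && !dir.isDirectory())
            throw new RuntimeException("could not create directory " + dir);
        return dir;
    }

    public static void main(String[] args)
    {
        String base = new File(System.getProperty("java.io.tmpdir"), "lazy-dirs-demo").getPath();
        File first = getOrCreateTableDirectory(base, "Table1");
        File second = getOrCreateTableDirectory(base, "Table1"); // already exists: no error
        System.out.println(first + " is directory: " + first.isDirectory()
                           + ", same path: " + first.equals(second));
    }
}
```

A real implementation would probably also remember which directories it has already created rather than calling mkdirs() on every lookup; that trade-off is left open here.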

> use subdirectory-per-table for data files
> -----------------------------------------
>
>                 Key: CASSANDRA-276
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-276
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Arin Sarkissian
>         Attachments: 0001-changes.patch, 0001-cleanup-the-patch-for-a-second-round.patch, 0001-Single-pacth-for-Cassandra-276.patch, 0002-Cassandra-276-no-WS-Diff.patch
>
>
> it's a little silly to do this in the filename when the FS will give us a hierarchical structure for free.



[jira] Updated: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Arin Sarkissian (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arin Sarkissian updated CASSANDRA-276:
--------------------------------------

    Attachment: jbellis-comments.patch

Here's a patch addressing Jonathan Ellis's comments.

There were no usages of DD.getDataFileLocationForTable(), so I renamed DD.getCompactionFileLocationForTable() to DD.getDataFileLocationForTable().

I also removed DD.getDataFileLocation(), because it was never used.

Re: the IndexOutOfBoundsException possibility in getDataFileLocationForTable():
the call to getAllDataFileLocationsForTable(table) essentially just loops over dataFileDirectories_ and appends the table name, as you suggested.
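The point of this reply can be checked with a small sketch of getAllDataFileLocationsForTable. Unlike the version described, which reads the static dataFileDirectories_, this one takes the global directory list as a parameter so it can run on its own; the class name TableDirectories is hypothetical.

```java
import java.io.File;
import java.util.Arrays;

// Sketch of the behaviour described in the reply: append the table name to every
// entry of the global data directory list.
class TableDirectories
{
    static String[] getAllDataFileLocationsForTable(String[] dataFileDirectories, String table)
    {
        String[] locations = new String[dataFileDirectories.length];
        for (int i = 0; i < dataFileDirectories.length; i++)
            locations[i] = dataFileDirectories[i] + File.separator + table;
        return locations;
    }

    public static void main(String[] args)
    {
        String[] global = { "/data1", "/data2" };
        String[] a = getAllDataFileLocationsForTable(global, "Table1");
        String[] b = getAllDataFileLocationsForTable(global, "Table2");
        // Both arrays have global.length entries, so an index valid for one is valid for the other.
        System.out.println(Arrays.toString(a) + " " + Arrays.toString(b));
    }
}
```

Every table's array has the same length as the global list, which is why the index stays in bounds today; the review's concern applies only if that derivation ever changes.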





[jira] Commented: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731748#action_12731748 ] 

Michael Greene commented on CASSANDRA-276:
------------------------------------------

+1



[jira] Commented: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Arin Sarkissian (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731206#action_12731206 ] 

Arin Sarkissian commented on CASSANDRA-276:
-------------------------------------------

I got this working yesterday but ran into JUnit test issues, which Sammy has since fixed.
Now the nosetests fail randomly: the same test may or may not pass from one run to the next.

Still looking into it.



[jira] Updated: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Arin Sarkissian (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arin Sarkissian updated CASSANDRA-276:
--------------------------------------

    Attachment: 0001-Single-pacth-for-Cassandra-276.patch

a patch for CASSANDRA-276



[jira] Commented: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731673#action_12731673 ] 

Michael Greene commented on CASSANDRA-276:
------------------------------------------

These appear to be fixed in 0002-Cassandra-276-no-WS-Diff.patch:
  There are a bunch of trailing-whitespace fixes, which should be left out of the final patches that get committed, as discussed on IRC.
  SSTable diffed incorrectly; did you actually change something in here?


SequenceFile has an odd shoutout: +// HELLO

getColumnFamilyFromFileName can now be implemented easily using String.split, the approach Sun's javadoc recommends.
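A split-based version might look like the sketch below. The file-name format it parses, <columnFamily>-<generation>-Data.db inside the table's subdirectory, is an assumption, and the class FileNames is hypothetical. If column family names could contain '-', this approach would need a different delimiter or parsing rule.

```java
import java.io.File;

// Sketch only: assumes a hypothetical file-name format "<columnFamily>-<generation>-Data.db"
// inside the table's subdirectory. This is not necessarily the format the patch uses.
class FileNames
{
    static String getColumnFamilyFromFileName(String filename)
    {
        // Look only at the last path component; the table now lives in the directory.
        String name = new File(filename).getName();
        // String.split takes a regex; "-" has no special meaning, so it splits on literal dashes.
        return name.split("-")[0];
    }

    public static void main(String[] args)
    {
        String path = "data" + File.separator + "Table1" + File.separator + "Standard1-12-Data.db";
        System.out.println(getColumnFamilyFromFileName(path));
    }
}
```

Here main prints Standard1.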

Can you wrap the debug logging in if (logger_.isDebugEnabled())?
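The reason for the guard is that the log message, and any string concatenation feeding it, is otherwise built even when debug output is off. The sketch below shows the pattern. Since the ticket's logger_.isDebugEnabled() comes from a log4j-style logger, this self-contained version substitutes the JDK's java.util.logging, where Logger.isLoggable(Level.FINE) is the equivalent check; GuardedLogging, logFlush and describe are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the guard requested in review, using java.util.logging as a stand-in for
// the log4j-style logger_.isDebugEnabled() check.
class GuardedLogging
{
    private static final Logger logger_ = Logger.getLogger(GuardedLogging.class.getName());

    // Counts how many times the debug message was actually built (for demonstration).
    static int messagesBuilt = 0;

    // Stands in for any expensive message construction.
    static String describe(String table, String columnFamily)
    {
        messagesBuilt++;
        return "hypothetical debug message for " + table + "/" + columnFamily;
    }

    static void logFlush(String table, String columnFamily)
    {
        // The guard skips building the string entirely when FINE (debug) is disabled.
        if (logger_.isLoggable(Level.FINE))
            logger_.fine(describe(table, columnFamily));
    }

    public static void main(String[] args)
    {
        logger_.setLevel(Level.INFO);
        logFlush("Table1", "Standard1");
        System.out.println("messages built with FINE disabled: " + messagesBuilt);
        logger_.setLevel(Level.FINE);
        logFlush("Table1", "Standard1");
        System.out.println("messages built with FINE enabled: " + messagesBuilt);
    }
}
```

Run on its own, main reports 0 messages built while FINE is disabled and 1 after enabling it.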

Passes all unit tests, system tests, and my meager local test suite.



[jira] Commented: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731229#action_12731229 ] 

Michael Greene commented on CASSANDRA-276:
------------------------------------------

I'd be glad to help debug if you have any code available to post.



[jira] Updated: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Arin Sarkissian (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arin Sarkissian updated CASSANDRA-276:
--------------------------------------

    Attachment: 0002-Cassandra-276-no-WS-Diff.patch

New patch without the whitespace changes.



[jira] Updated: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Arin Sarkissian (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arin Sarkissian updated CASSANDRA-276:
--------------------------------------

    Attachment: 0001-cleanup-the-patch-for-a-second-round.patch

Here's a new patch incorporating the changes Michael Greene requested.



[jira] Updated: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Arin Sarkissian (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arin Sarkissian updated CASSANDRA-276:
--------------------------------------

    Attachment: 0001-changes.patch

hopefully the final patch



[jira] Assigned: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Arin Sarkissian (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arin Sarkissian reassigned CASSANDRA-276:
-----------------------------------------

    Assignee: Arin Sarkissian



[jira] Updated: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Arin Sarkissian (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arin Sarkissian updated CASSANDRA-276:
--------------------------------------

    Comment: was deleted

(was: Here's a patch... enjoy)



[jira] Commented: (CASSANDRA-276) use subdirectory-per-table for data files

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731713#action_12731713 ] 

Michael Greene commented on CASSANDRA-276:
------------------------------------------

SSTable is still hollerin'

DatabaseDescriptor mixes tabs and spaces --> convert to spaces.

Otherwise, looks solid.
