Posted to commits@cassandra.apache.org by "Jeremy Hanna (JIRA)" <ji...@apache.org> on 2010/07/26 19:57:18 UTC

[jira] Commented: (CASSANDRA-1066) DatacenterShardStrategy needs enforceable and keyspace based RF

    [ https://issues.apache.org/jira/browse/CASSANDRA-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892392#action_12892392 ] 

Jeremy Hanna commented on CASSANDRA-1066:
-----------------------------------------

There was some discussion on whether the replication strategy should be part of the keyspace metadata.  I'll try to distill here my reasons for thinking it should.

Problem:
Currently the DatacenterShardStrategy uses configuration options found in a properties file.  Work has been done to reload this automatically.  However, what happens when 0.7 adds dynamic keyspaces?  A client can't just add a keyspace with DSS.  They would first have to update their DSS configuration file to include settings for the new keyspace.  Then they would refresh that configuration and create their keyspace.  That's pretty onerous for a client to have to do.

Also, the replication strategies are currently kept separate from the keyspace metadata, even though there is a 1:1 relationship between a KSM and its replication strategy.  That results in various utility methods and special cases in StorageService and DatabaseDescriptor to handle these separate quasi-singleton replication strategies.  For example, DatabaseDescriptor has special cases to initialize and clean up replication strategies when setting and clearing table definitions.  In the past we've done it this way because it worked and because a strategy had no state.  Now it does: the DSS configuration.

Solution:
I proposed making the RS an instance variable of the KSM.  That would maintain their 1:1 relationship directly and more cleanly, instead of the roundabout way we have been doing it.  It's been said that the KSM should contain only storage data.  We already store the replication strategy class; the configuration options are the only thing that would be added in this scenario.  When serializing and deserializing, we just store the keyspace name, class name, and configuration options (Map<String,String>).  This is immutable data.  The TokenMetadata and Snitch are just references to the current TM and Snitch.  Every time a KSM is deserialized, it gets the current TM and Snitch and uses them, along with the stored info, to create a new RS instance.
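A minimal Java sketch of the proposal, assuming hypothetical stand-in classes (`TokenMetadata`, `Snitch`, `ReplicationStrategy`, `StoredKeyspace`, and a factory registry) rather than Cassandra's actual 0.7 API. It shows the shape described in the paragraph: only the keyspace name, strategy class name, and options map are persisted, and each deserialization builds a fresh strategy instance from whatever TM and Snitch are current.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the live ring state; NOT Cassandra's real classes.
final class TokenMetadata {}
final class Snitch {}

// Hypothetical base class: a strategy is built from immutable options plus
// references to the TokenMetadata and Snitch that are current at build time.
abstract class ReplicationStrategy {
    final Map<String, String> options;
    final TokenMetadata tokenMetadata;
    final Snitch snitch;

    ReplicationStrategy(Map<String, String> options, TokenMetadata tm, Snitch snitch) {
        this.options = options;
        this.tokenMetadata = tm;
        this.snitch = snitch;
    }
}

// Stand-in for DSS; in this sketch it only holds its per-DC options.
final class DatacenterShardStrategy extends ReplicationStrategy {
    DatacenterShardStrategy(Map<String, String> options, TokenMetadata tm, Snitch snitch) {
        super(options, tm, snitch);
    }
}

interface StrategyFactory {
    ReplicationStrategy create(Map<String, String> options, TokenMetadata tm, Snitch snitch);
}

// The only data that gets persisted: keyspace name, class name, options.
final class StoredKeyspace {
    final String name;
    final String strategyClass;
    final Map<String, String> options;

    StoredKeyspace(String name, String strategyClass, Map<String, String> options) {
        this.name = name;
        this.strategyClass = strategyClass;
        this.options = options;
    }
}

final class KeyspaceMetadata {
    final String name;
    final String strategyClass;
    final Map<String, String> strategyOptions;
    // Derived state: never serialized, rebuilt whenever a KSM is constructed.
    final ReplicationStrategy strategy;

    // Hypothetical registry from strategy class name to constructor.
    private static final Map<String, StrategyFactory> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("DatacenterShardStrategy", DatacenterShardStrategy::new);
    }

    KeyspaceMetadata(String name, String strategyClass, Map<String, String> options,
                     TokenMetadata tm, Snitch snitch) {
        StrategyFactory factory = FACTORIES.get(strategyClass);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown strategy: " + strategyClass);
        }
        this.name = name;
        this.strategyClass = strategyClass;
        this.strategyOptions = Collections.unmodifiableMap(new HashMap<>(options));
        this.strategy = factory.create(this.strategyOptions, tm, snitch);
    }

    StoredKeyspace serialize() {
        return new StoredKeyspace(name, strategyClass, strategyOptions);
    }

    // Each deserialization picks up the current TM and Snitch.
    static KeyspaceMetadata deserialize(StoredKeyspace s, TokenMetadata currentTm,
                                        Snitch currentSnitch) {
        return new KeyspaceMetadata(s.name, s.strategyClass, s.options, currentTm, currentSnitch);
    }
}
```

Because the strategy is derived rather than stored, a client creating a DSS keyspace in this model would pass the per-DC options with the keyspace definition itself, with no separate properties file to edit and refresh first.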

Alternatives?
Are there any alternatives to doing it this way?  We could possibly extend the external model we use for replication strategies so that each strategy carries state.  That would keep them external to the KSM but make them specific to each KSM (removing the quasi-singleton behavior).  That would be less of a change, but it seems more hackish to me.
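For comparison, a rough sketch of that alternative, again with a hypothetical class name (`StrategyRegistry`, not an existing Cassandra class): strategy state lives outside the KSM but is keyed per keyspace, so there is no longer one shared instance. The comments mark the extra bookkeeping that keyspace creation and drop would need, which is the kind of special casing the proposal tries to avoid.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry kept outside the KSM; NOT an existing Cassandra class.
final class StrategyRegistry {
    // One options map per keyspace, replacing a single global configuration.
    private final Map<String, Map<String, String>> optionsByKeyspace = new ConcurrentHashMap<>();

    // Must be called on every keyspace creation, in addition to writing the KSM.
    void register(String keyspace, Map<String, String> options) {
        optionsByKeyspace.put(keyspace, Map.copyOf(options));
    }

    // Must be called on every keyspace drop, or stale state lingers.
    void unregister(String keyspace) {
        optionsByKeyspace.remove(keyspace);
    }

    Map<String, String> optionsFor(String keyspace) {
        Map<String, String> options = optionsByKeyspace.get(keyspace);
        if (options == null) {
            throw new IllegalStateException("No strategy state registered for " + keyspace);
        }
        return options;
    }
}
```

The registry and the keyspace metadata can drift apart if any code path forgets one of those calls; keeping the RS inside the KSM removes that possibility by construction.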

I would welcome other alternatives.  I just think a dynamic, automatable way to create keyspaces shouldn't need special handling for those using DSS in 0.7.

> DatacenterShardStrategy needs enforceable and keyspace based RF
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-1066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 1066-changes-patch.txt, 1066.txt
>
>
> Currently, the DatacenterShardStrategy reads in a properties file - datacenters.properties - to get a per-datacenter replication factor.  So any keyspace that is using the DSS in the cluster is using that same properties file to configure its replication factor.  The implementation doesn't take into account the per-keyspace replication factor, but it is assumed that the sum of all the datacenter RF values equals the per-keyspace replication value that is part of the keyspace metadata.
> It seems that an improvement could be two-fold:
> 1. Enforce the replication factor for the keyspace to always equal the sum of all the datacenter RF values.  Otherwise, if they aren't equal, bad things (tm) can happen.
> 2. Make the datacenter RF values part of the keyspace metadata rather than a global value.  Again, currently, if any keyspace in the cluster is configured to use DSS, it uses the global DC RF values found in the properties file.  Instead of the properties file, an improvement could be to configure those values on a per-keyspace basis.  That would make the cluster more multi-tenant friendly and flexible with multiple keyspaces.
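Point 1 of the description (enforcing that the keyspace RF equals the sum of the per-datacenter RF values) could be checked whenever a keyspace is defined. A minimal sketch, assuming the per-DC factors arrive as the `Map<String,String>` of strategy options (datacenter name to RF) proposed in the comment; `ReplicationFactorCheck` and its methods are hypothetical names.

```java
import java.util.Map;

// Hypothetical validator; not part of Cassandra's API.
final class ReplicationFactorCheck {
    // Parses each per-DC RF option and returns their sum.
    // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on bad input.
    static int sumDatacenterFactors(Map<String, String> dcFactors) {
        int sum = 0;
        for (Map.Entry<String, String> e : dcFactors.entrySet()) {
            int rf = Integer.parseInt(e.getValue().trim());
            if (rf < 0) {
                throw new IllegalArgumentException("Negative RF for datacenter " + e.getKey());
            }
            sum += rf;
        }
        return sum;
    }

    // Rejects a keyspace definition whose RF disagrees with its per-DC values.
    static void validate(String keyspace, int keyspaceRf, Map<String, String> dcFactors) {
        int sum = sumDatacenterFactors(dcFactors);
        if (sum != keyspaceRf) {
            throw new IllegalArgumentException(String.format(
                "Keyspace %s: replication factor %d does not equal sum of datacenter factors %d",
                keyspace, keyspaceRf, sum));
        }
    }
}
```

With DC1=2 and DC2=1, a keyspace RF of 3 passes and an RF of 4 is rejected up front, rather than surfacing later as inconsistent replica placement.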

-- 
This message is automatically generated by JIRA.