Posted to commits@cassandra.apache.org by "Arya Goudarzi (JIRA)" <ji...@apache.org> on 2010/07/29 22:31:20 UTC

[jira] Created: (CASSANDRA-1335) Add Consistency Level for Schema Creation Operations

Add Consistency Level for Schema Creation Operations
----------------------------------------------------

                 Key: CASSANDRA-1335
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1335
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.7 beta 1
         Environment: CentOS 5.2
Trunc
            Reporter: Arya Goudarzi


Currently, when applications create Keyspaces and CFs dynamically, the user at the application level has to make 2-3 calls to Cassandra to verify the consistency of the schema, for example:

1. system_add_column_family
2. check_schema_agreement (returns the checksums of the schema definitions and the nodes agreeing on each; the user has to count the results to see whether there is one checksum covering all nodes, meaning agreement, or multiple checksums, meaning disagreement)
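
A minimal sketch of the counting described in step 2, assuming the result of check_schema_agreement is available to the client as a mapping from schema checksum to the list of nodes reporting it. The function name and the sample checksums and addresses are hypothetical.

```python
def schema_agreed(versions):
    """Return True when exactly one checksum is reported by the nodes.

    `versions` is assumed to map checksum -> list of node addresses.
    """
    # Ignore checksums that no node currently reports.
    populated = {v: hosts for v, hosts in versions.items() if hosts}
    return len(populated) == 1


# One checksum shared by all nodes: agreed.
agreed = schema_agreed({"a1b2": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]})
# Two checksums: the nodes disagree.
disagreed = schema_agreed({"a1b2": ["10.0.0.1"], "c3d4": ["10.0.0.2"]})
```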

With clusters that have many application servers talking to them with high concurrency, the result of check_schema_agreement can be inconsistent across nodes, causing the application to misunderstand the schema state, since the application is not aware of how the schema checksum is calculated.

One solution that I've thought of adding at the application level is to create locks using memcache on CF and KS creation operations so that many clients don't collide. However, I have to loop on check_schema_agreement and store the state, and also call describe_keyspace (hence a 3rd call), since I am not sure how the checksum is calculated, in order to verify I am not asking another client to create the same CF or KS. This could potentially fall into an infinite loop if client calls fail, so I have to tie it to application-level timeout detection so that I don't loop forever.
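
A rough sketch of such a bounded loop, assuming a hypothetical callable fetch_versions that wraps check_schema_agreement and returns a mapping of checksum to nodes. The timeout and interval values are illustrative, not recommendations.

```python
import time


def wait_for_agreement(fetch_versions, timeout_s=10.0, interval_s=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until a single checksum remains or the deadline passes.

    fetch_versions is a hypothetical wrapper around check_schema_agreement.
    Returns True on agreement, False if the deadline is reached first.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            versions = fetch_versions()
        except Exception:
            # A failed client call is treated as "not agreed yet".
            versions = None
        if versions is not None and len(versions) == 1:
            return True
        if clock() >= deadline:
            # Give up instead of looping forever.
            return False
        sleep(interval_s)
```

The deadline is checked after every attempt, so failing calls cannot keep the loop alive past timeout_s plus one polling interval.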

I think it would make a lot of sense to have something like ConsistencyLevel added to schema creation operations, so that users do not have to implement their own locking and validation at the application level.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-1335) Add Consistency Level for Schema Creation Operations

Posted by "Arya Goudarzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893843#action_12893843 ] 

Arya Goudarzi commented on CASSANDRA-1335:
------------------------------------------

I am waiting 1 second for the same client before sending write requests.

In my application, I need to create CFs inside a Keyspace on the fly, so there is a chance that two servers call describe_keyspace at the same time and see that the CF_name1 they want to work with does not exist, hence both try to create it, which produces a race condition. Each of these servers may be talking to a different Cassandra node in the cluster. In this case, when one starts the creation, I put a lock in memcache, with the CF name as the key for example. The second, colliding server checks whether the lock exists in memcache. If it exists, it calls describe_keyspace to see whether the definition is there. If it is, it checks agreement, and if there is only one entry, it removes the lock.
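
The locking steps above can be sketched roughly as follows. FakeCache is an in-memory stand-in for memcache (relying only on the add-if-absent behaviour), and list_cfs, create_cf and agreed are hypothetical callables standing in for describe_keyspace, system_add_column_family and the agreement check. The lock key format is also an assumption.

```python
class FakeCache:
    """In-memory stand-in for memcache; add() succeeds only if the key is absent."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def ensure_column_family(cache, cf_name, list_cfs, create_cf, agreed):
    """Return "created", "ready", or "pending" for cf_name."""
    lock_key = "schema-lock:" + cf_name
    if cf_name not in list_cfs() and cache.add(lock_key, "1"):
        # First client: took the lock, so it starts the creation.
        create_cf(cf_name)
        return "created"
    if cf_name in list_cfs() and agreed():
        # Definition is visible and only one checksum is left: drop the lock.
        cache.delete(lock_key)
        return "ready"
    # Another creation is in flight, or the nodes still disagree.
    return "pending"
```

A caller would retry while the result is "pending". This only serializes creation of one CF name; it does not identify which checksum belongs to which change.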

The problem is that check_schema_agreement returns checksums of different versions of the schema. Let's say another server is trying to create CF_name2. CF_name1's schema disagreement collides with CF_name2's schema agreement. The two changes produce different hashes in the result of check_schema_agreement, but since the client does not know how a hash is calculated, it cannot decide which hash belongs to which change. So I am confused about how to use check_schema_agreement to make sure the creation of CF_name1 is recognized independently of CF_name2. I thought about caching the agreement result as the lock value and comparing the values against each other, but what if another client tries to create CF_name3? I may end up with several different versions without knowing which one is which.

In summary, my concern here is: do I have to worry about multiple clients trying to create the same CF on different nodes? I thought I did, which is why I went with the solution above and thought it would be nice if these operations had a consistency level, but if not, please correct me.



[jira] Commented: (CASSANDRA-1335) Add Consistency Level for Schema Creation Operations

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893820#action_12893820 ] 

Gary Dusbabek commented on CASSANDRA-1335:
------------------------------------------

It wouldn't be that difficult to make the system_* methods optionally blocking (throw a timeout exception after rpc_timeout_in_ms).

Is your concern that a server under load will not be able to respond to the schema_agreement request and give a false indication that the update hasn't promulgated?  How long are you waiting before checking agreement?  Maybe you should just wait longer.  At the very least, you should wait for rpc_timeout_in_ms.

I kind of like the idea of applications handling this, or just letting the writes/updates fail (as they should when a keyspace/cf doesn't exist, etc.).
