Posted to user@cassandra.apache.org by Alejandro Perez <sp...@indextank.com> on 2011/04/15 22:39:28 UTC

Schemas diverging while dynamically creating CF.

Hello,

We're testing Cassandra for integration with IndexTank. In this first try,
we're creating one column family for each user. In practice, on the first
run and for the first few hundred documents, a new CF is created and a
document is immediately added to it. A few (up to 50) requests of this type
are issued in parallel (for different column families).

The end result, which is quite repeatable, is that the cluster splits into
different schema versions that never agree.

Any thoughts?


Thanks,

Spike.

-- 
Alejandro Perez
IndexTank

follow us @indextank <http://twitter.com/indextank> | read our
blog<http://blog.indextank.com/> | subscribe
our user mailing list <http://groups.google.com/group/indextank>

<http://blog.indextank.com/>

Re: Schemas diverging while dynamically creating CF.

Posted by aaron morton <aa...@thelastpickle.com>.
There is a known issue with concurrent schema migrations: https://issues.apache.org/jira/browse/CASSANDRA-1391

Once they diverge, I think you can delete the schema by removing the necessary system files and leaving the data files in place, then re-creating the files.

And yes, you should not be creating lots of column families; they are not the same as tables.
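
As a rough illustration of avoiding concurrent migrations on the client side, here is a minimal Python sketch. It is not taken from any Cassandra client library: `SchemaChangeSerializer`, `get_schema_versions` and `change` are hypothetical names. It assumes you can supply a callable that reports which schema versions the nodes are on (for example, a wrapper around whatever schema-version call your client exposes) and a callable that issues a single schema change. The lock only serializes changes made from one process, so several application servers would need some external coordination on top of this.

```python
import threading
import time


class SchemaChangeSerializer:
    """Runs schema changes one at a time and waits for agreement around each."""

    def __init__(self, get_schema_versions, poll_interval=0.5, timeout=30.0):
        # get_schema_versions: hypothetical callable returning a dict that maps
        # schema version -> list of node addresses currently on that version.
        self._get_schema_versions = get_schema_versions
        self._poll_interval = poll_interval
        self._timeout = timeout
        self._lock = threading.Lock()

    def _wait_for_agreement(self):
        # Poll until every node reports the same schema version, or give up.
        deadline = time.monotonic() + self._timeout
        while True:
            versions = self._get_schema_versions()
            if len(versions) <= 1:
                return
            if time.monotonic() >= deadline:
                raise TimeoutError("schema versions still differ: %r" % (versions,))
            time.sleep(self._poll_interval)

    def apply(self, change):
        # change: hypothetical zero-argument callable that issues exactly one
        # schema migration (for example, adding a column family).
        with self._lock:
            self._wait_for_agreement()   # do not start on a split cluster
            result = change()
            self._wait_for_agreement()   # let the change propagate first
            return result
```

With something like this, the 50 parallel requests would each call `apply(...)` and the schema changes would go out one after another instead of at the same time.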

Aaron



Re: Schemas diverging while dynamically creating CF.

Posted by Alejandro Perez <sp...@indextank.com>.
Thanks for the quick response! I will reconsider the schema.

However, the problem troubles me somewhat. How are schema changes supposed to
be done? Should I serialize them? Should I halt other cluster operations
while I do the schema change? Is this a known problem with Cassandra?

The other question, and I think the more important one for me now: how do I
repair the cluster without losing data once the schemas diverge? Right now
the only way I have is to erase all data and have the cluster start empty.
Should this problem ever happen in production, it's important that there's a
way to recover the data.


RE: Schemas diverging while dynamically creating CF.

Posted by Dan Hendry <da...@gmail.com>.
Uh... don't create a column family per user. Column families are meant to be
fairly static; conceptually equivalent to a table in a relational database.
Why do you need (or even want) a CF per user? Reconsider your data model: a
single column family with an inverted index for a 'user' column is probably
more what you are looking for. Operationally, the fewer CFs the better.
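
To make the suggested layout concrete, here is a small in-memory Python sketch. It does not use any Cassandra client API; the plain dicts stand in for two static column families, and the names `DocumentStore`, `documents` and `user_index` are made up for illustration. The idea is that every document lives in one shared column family, and a second one maps each user to the ids of that user's documents.

```python
from collections import defaultdict


class DocumentStore:
    """In-memory stand-in for two static column families."""

    def __init__(self):
        # "documents" CF: row key = document id, columns = document fields.
        self.documents = {}
        # "user_index" CF: row key = user id, column names = document ids.
        self.user_index = defaultdict(set)

    def add_document(self, user_id, doc_id, fields):
        row = dict(fields)
        row["user"] = user_id
        # If the document previously belonged to another user, drop the
        # stale index entry so the index stays consistent.
        previous = self.documents.get(doc_id)
        if previous is not None and previous["user"] != user_id:
            self.user_index[previous["user"]].discard(doc_id)
        self.documents[doc_id] = row
        self.user_index[user_id].add(doc_id)

    def documents_for_user(self, user_id):
        # Read the index row first, then fetch each document by key.
        doc_ids = sorted(self.user_index.get(user_id, ()))
        return [self.documents[doc_id] for doc_id in doc_ids]
```

Adding a new user then means writing a new row, not changing the schema, so no migration is triggered at all.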

 

Dan

 
