Posted to user@cassandra.apache.org by Jeremy Stribling <st...@nicira.com> on 2011/01/18 20:14:07 UTC
changing the replication level on the fly
Hi,
I've noticed in the new Cassandra 0.7.0 release that if I have a
keyspace with a replication level of 2, but only one Cassandra node, I
cannot insert anything into the system. The old release I was using
(0.6.8) allowed this, which was likely a bug there (is there a JIRA
describing it?). However, the new behavior is a problem for our
application, as we don't want to have to predefine the number of nodes,
but rather start with one node and add nodes as needed.
Ideally, we could start our system with one node, and be able to insert
data just on that one node. Then, when a second node is added, we can
start using that node to store replicas for the keyspace. I know that
0.7.0 has a new operation for updating keyspace properties like
replication level, but in the documentation there is some mention about
having to run manual repair operations after using it. My question is:
what happens if we do not run these repair operations?
Here's what I'd like to do:
1) Start with a single node with autobootstrap=false and replication
level=1.
2) Later, start a second node with autobootstrap=true and join it to the
first.
3) The application detects that there are now two nodes, and issues the
command to pump up the replication level to 2.
4) If it ever drops back down to one node, it will turn the replication
level down again.
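Steps 3 and 4 can be sketched roughly as follows. This is only an
illustration of the decision logic, not a Cassandra client: the function
names and the max_rf cap are hypothetical, and the "needs repair" flag
simply reflects the documentation's note that raising the replication
level calls for a manual repair afterwards.

```python
def target_replication_factor(live_nodes: int, max_rf: int = 2) -> int:
    """Replication level the application would request for a cluster size.

    max_rf is a hypothetical application-level cap, not a Cassandra setting.
    """
    if live_nodes < 1:
        raise ValueError("need at least one live node")
    # Never ask for more replicas than there are nodes to hold them.
    return min(live_nodes, max_rf)


def plan_change(current_rf: int, live_nodes: int, max_rf: int = 2):
    """Return (new_rf, needs_repair) for the observed cluster size.

    needs_repair is True only when the level goes up: rows written at the
    lower level have no copy on the added replica until they are repaired.
    """
    new_rf = target_replication_factor(live_nodes, max_rf)
    return new_rf, new_rf > current_rf


if __name__ == "__main__":
    print(plan_change(current_rf=1, live_nodes=2))  # second node joined
    print(plan_change(current_rf=2, live_nodes=1))  # back down to one node
```

The actual keyspace update and the repair would still have to be issued
through whatever client or tooling the application already uses.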
If we do not do a repair, will all hell break loose, or will it just be
the case that data inserted when there was only one node will continue
to be unreplicated, but data inserted when there were two nodes will
have two replicas? Thanks,
Jeremy
Re: changing the replication level on the fly
Posted by Jeremy Stribling <st...@nicira.com>.
On 01/18/2011 11:36 AM, Edward Capriolo wrote:
> If you raise your replication factor and do not repair, this is what happens:
>
> READ.QUORUM -> This is safe. Over time all entries that are read will
> be fixed through read repair. Reads will return correct data.
> BUT data that is never read will never be copied to the new node.
> READ.ONE -> 50% of your reads will return correct data. 50% of your
> reads will return NO data the first time (depending on which server your
> read hits). Then they will be read repaired. The second read will return
> the correct data.
>
> You can extrapolate the complications this causes if you add 10
> or 15 nodes over time. You are never really sure whether the data from the
> first node got replicated to the second, or whether the second got replicated
> to the third. Brain hurting... CAP is complicated enough...
>
Thanks. Are you referring only to data that was written at replication
factor 1, or any data?
Re: changing the replication level on the fly
Posted by Edward Capriolo <ed...@gmail.com>.
If you raise your replication factor and do not repair, this is what happens:
READ.QUORUM -> This is safe. Over time all entries that are read will
be fixed through read repair. Reads will return correct data.
BUT data that is never read will never be copied to the new node.
READ.ONE -> 50% of your reads will return correct data. 50% of your
reads will return NO data the first time (depending on which server your
read hits). Then they will be read repaired. The second read will return
the correct data.
You can extrapolate the complications this causes if you add 10
or 15 nodes over time. You are never really sure whether the data from the
first node got replicated to the second, or whether the second got replicated
to the third. Brain hurting... CAP is complicated enough...
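To make the READ.ONE and READ.QUORUM cases above concrete, here is a toy
model of two replicas after the replication factor goes from 1 to 2
without a repair. It is a deliberately simplified sketch, not how
Cassandra is implemented: the Cluster class is hypothetical, there are no
timestamps or conflicts, and read repair is assumed to run on every read.
The only real formula used is the quorum size, floor(RF / 2) + 1, which
for RF=2 means both replicas must answer.

```python
def quorum(rf: int) -> int:
    """Number of replicas a QUORUM operation must reach."""
    return rf // 2 + 1


class Cluster:
    """Toy cluster: each node is a dict and every node is a replica."""

    def __init__(self, n: int):
        self.nodes = [dict() for _ in range(n)]

    def read(self, key, contacted):
        """Answer from the contacted replicas only, then read-repair."""
        answer = next(
            (self.nodes[i][key] for i in contacted if key in self.nodes[i]),
            None,
        )
        # Read repair (simplified): copy the value held by any replica to
        # the replicas that are missing it, after the answer is chosen.
        found = next((n[key] for n in self.nodes if key in n), None)
        if found is not None:
            for node in self.nodes:
                node.setdefault(key, found)
        return answer


if __name__ == "__main__":
    c = Cluster(2)
    c.nodes[0]["k"] = "v"      # written while RF was 1, only on node 0
    c.nodes[0]["cold"] = "x"   # also old, but never read afterwards

    print(c.read("k", [1]))    # READ.ONE hitting the new node: no data
    print(c.read("k", [1]))    # same read again, after read repair
    print("cold" in c.nodes[1])  # data never read is never copied

    q = Cluster(2)
    q.nodes[0]["k"] = "v"
    # READ.QUORUM at RF=2 contacts quorum(2) == 2 replicas, so it sees node 0.
    print(q.read("k", list(range(quorum(2)))))
```

In this model the first READ.ONE against the new node comes back empty
and the second returns the value, while the quorum read is correct the
first time because it always includes the node that holds the data.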