Posted to user@cassandra.apache.org by Bosung Seo <bo...@brightcloud.com> on 2014/10/17 01:17:37 UTC

Cassandra Restore data from snapshots and Different Counts

I upgraded my Cassandra ring and restored the data (by copying snapshots)
from the old ring. I am currently running nodetool repair.
I counted the rows in the tables to check that every row is present, but the
counts return different values.
A table contains 571 rows, but the counts are 500, 530, 501, and so on.
Should I wait until nodetool repair is done?

Re: Cassandra Restore data from snapshots and Different Counts

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Oct 22, 2014 at 7:58 AM, Li, George <gu...@pearson.com>
wrote:

> I assume that you are restoring snapshot data onto a new ring with the
> same topology (i.e. if the old ring has n nodes, the new ring also has n
> nodes). I discussed this with a consultant from DataStax, and he told me
> that each new node in the new ring needs to have the same token list as
> the corresponding old node in the old ring. For example, if you are
> restoring a snapshot from old node 1 onto new node 1, you need to make sure
> that new node 1's token list is the same as old node 1's token list.
> This can be done with the following main steps:
> 1. Run 'nodetool ring' on the old ring to find the token list for each old
> node.
> 2. Stop Cassandra on each new node.
> 3. Modify new node 1's yaml file so that 'initial_token' is the same as
> the token list of old node 1. Also, set auto_bootstrap to false.
>
>
For vnodes, you can use this handy one-liner to get a comma-delimited list
of tokens for the current node:

nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e 's/,$/\n/'
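
A variant of the one-liner above, wrapped in functions so the result can be
pasted straight into cassandra.yaml. This is only a sketch: the function
names tokens_csv and initial_token_line are made up for illustration, and it
assumes the same "Token : <value>" line layout that the one-liner relies on.

```shell
# Hypothetical helper: reads `nodetool info -T`-style text on stdin and
# prints every token on one comma-separated line.
# Assumes token lines look like "Token : <value>", so the value is field 3.
tokens_csv() {
  awk '/^Token/ { print $3 }' | paste -sd, -
}

# Hypothetical helper: wraps that list in an initial_token line
# suitable for cassandra.yaml.
initial_token_line() {
  printf 'initial_token: %s\n' "$(tokens_csv)"
}
```

On an old node this would be run as: nodetool info -T | initial_token_line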

=Rob
http://twitter.com/rcolidba

Re: Cassandra Restore data from snapshots and Different Counts

Posted by "Li, George" <gu...@pearson.com>.
I assume that you are restoring snapshot data onto a new ring with the same
topology (i.e. if the old ring has n nodes, the new ring also has n nodes).
I discussed this with a consultant from DataStax, and he told me that each
new node in the new ring needs to have the same token list as the
corresponding old node in the old ring. For example, if you are restoring a
snapshot from old node 1 onto new node 1, you need to make sure that new
node 1's token list is the same as old node 1's token list.
This can be done with the following main steps:
1. Run 'nodetool ring' on the old ring to find the token list for each old
node.
2. Stop Cassandra on each new node.
3. Modify new node 1's yaml file so that 'initial_token' is the same as the
token list of old node 1. Also, set auto_bootstrap to false. Do the same for
every other new node, using the token list of its corresponding old node.
4. After this is done, start the new nodes one by one, 2 minutes apart (I am
not sure this is necessary, but I was told that Cassandra may have issues if
you start all nodes at once), and install your database schema.
5. Copy over the snapshots. Afterwards I also restart all new nodes one by
one, 2 minutes apart. I am not sure this restart is necessary, but I was
being cautious.
6. Run nodetool repair on the new ring.
I have used these steps many times and the counts always come back identical.
Hope this helps.
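
Step 3 can be scripted roughly as follows. This is a minimal sketch, not a
tested procedure: set_yaml_key and apply_restore_settings are hypothetical
helper names, the path to cassandra.yaml depends on your install, and the
script assumes both settings are simple top-level "key: value" lines.

```shell
# Hypothetical helper: set a top-level "key: value" line in a YAML file.
# Replaces the first existing line for that key (including a commented-out
# "# key:" line), or appends the setting if no such line exists.
set_yaml_key() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    $0 ~ ("^#? ?" k ":") && !done { print k ": " v; done = 1; next }
    { print }
    END { if (!done) print k ": " v }
  ' "$file" > "$tmp" &&
    cat "$tmp" > "$file" &&   # overwrite in place to keep owner and mode
    rm -f "$tmp"
}

# Hypothetical helper: apply the two settings from step 3.
# $1 = path to cassandra.yaml, $2 = comma-separated token list of the old node
apply_restore_settings() {
  set_yaml_key "$1" initial_token "$2" &&
    set_yaml_key "$1" auto_bootstrap false
}
```

Usage, with Cassandra stopped on the new node (step 2) and with the path and
token list as placeholders:
apply_restore_settings /path/to/cassandra.yaml "<old node's token list>"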

George.

On Thu, Oct 16, 2014 at 6:10 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, Oct 16, 2014 at 4:17 PM, Bosung Seo <bo...@brightcloud.com>
> wrote:
>
>> I upgraded my Cassandra ring and restored the data (by copying snapshots)
>> from the old ring. I am currently running nodetool repair.
>> I counted the rows in the tables to check that every row is present, but
>> the counts return different values.
>> A table contains 571 rows, but the counts are 500, 530, 501, and so on.
>> Should I wait until nodetool repair is done?
>>
>
> Are you able to repro the miscount before the repair? What exact type of
> "count" are you doing?
>
> My conjecture is that the miscounts are probably being caused by the
> nodetool repair. I understand how perverse this statement is.
>
> =Rob
> http://twitter.com/rcolidba
>

Re: Cassandra Restore data from snapshots and Different Counts

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Oct 16, 2014 at 4:17 PM, Bosung Seo <bo...@brightcloud.com> wrote:

> I upgraded my Cassandra ring and restored the data (by copying snapshots)
> from the old ring. I am currently running nodetool repair.
> I counted the rows in the tables to check that every row is present, but
> the counts return different values.
> A table contains 571 rows, but the counts are 500, 530, 501, and so on.
> Should I wait until nodetool repair is done?
>

Are you able to repro the miscount before the repair? What exact type of
"count" are you doing?

My conjecture is that the miscounts are probably being caused by the
nodetool repair. I understand how perverse this statement is.

=Rob
http://twitter.com/rcolidba