Posted to user@cassandra.apache.org by aaron morton <aa...@thelastpickle.com> on 2013/04/01 01:58:52 UTC

Re: Lost data after expanding cluster c* 1.2.3-1

Please do not rely on colour in your emails; the best way to get your emails accepted by the Apache mail servers is to use plain text.

> At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status return (in red color the 3 new nodes)
What errors?

> I put for each of them seeds = A ip, and start each with two minutes intervals. 
When I'm making changes I tend to change a single node first, confirm everything is OK and then do a bulk change.

> Now the cluster seem to work normally, but i can use the secondary for the moment, the query answer are random
Run nodetool repair -pr on each node, and let it finish before starting the next one.
If you are using secondary indexes, use nodetool rebuild_index to rebuild them.
Then add one new node to the cluster and confirm everything is OK before adding the remaining ones.

I'm not sure what went wrong or why, but that should get you to a stable place. If you have any problems, keep an eye on the logs for errors or warnings.
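
For what it's worth, the one-node-at-a-time repair could be scripted roughly as below. This is only a sketch: the helper names (repair_commands, run_sequentially) and any host addresses you pass in are made up for illustration, and it assumes nodetool is on the PATH and that `nodetool -h <host> repair -pr` blocks until the repair on that node has finished.

```python
import subprocess


def repair_commands(hosts):
    """Build one `nodetool repair -pr` invocation per host, in the given order."""
    return [["nodetool", "-h", host, "repair", "-pr"] for host in hosts]


def run_sequentially(commands, runner=subprocess.check_call):
    """Run each command to completion before starting the next one.

    The default runner blocks until the process exits and raises
    CalledProcessError on a non-zero exit code, so a failed repair
    stops the loop instead of moving on to the next node.
    """
    for cmd in commands:
        runner(cmd)
```

Calling run_sequentially(repair_commands([...])) with the addresses shown by nodetool status would repair the nodes in turn; passing runner=print gives a dry run that only prints the commands.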

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/03/2013, at 10:01 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:

> Hi aaron,
> 
> Thanks for the reply, I will try to explain exactly what happened.
> 
> I had a cluster of 4 C* nodes called [A,B,C,D] (version 1.2.3-1), started from the EC2 AMI (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
> this config: --clustername myDSCcluster --totalnodes 4 --version community
> 
> Two days after this cluster went into production, I saw that it was overloaded, so I wanted to extend it by adding 3 more nodes.
> 
> I created a new cluster with 3 C* nodes [D,E,F] (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
> 
> and followed the documentation (http://www.datastax.com/docs/1.2/install/expand_ami) for adding them to the ring.
> For each of them I set seeds = A's IP, and I started them at two-minute intervals.
> 
> At this point the errors started: we saw that members and other data were gone. At this point nodetool status returned (the 3 new nodes were shown in red):
> 
> Datacenter: eu-west
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns   Host ID                               Rack
> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
> 
> I saw that the 3 nodes had joined the ring but had no data, so I put the website in maintenance and launched a nodetool repair on
> the 3 new nodes. For 5 hours I could see in OpsCenter the data being streamed to the new nodes (very nice :))
> 
> During this time, I wrote a script to check whether all members were present (relative to a copy of the members in MySQL).
> 
> Afterwards the streaming seemed to be finished, but I'm not sure, because nodetool compactionstats showed pending tasks while nodetool netstats seemed to be OK.
> 
> I ran my script to check the data, but members were still missing.
> 
> I decided to roll back by running nodetool decommission on nodes D, E and F.
> 
> I re-ran my script and all seemed to be OK, but the secondary index behaves strangely:
> sometimes the row is returned, sometimes there is no result.
> 
> The user kais can be retrieved using his key with cassandra-cli, but if I use cqlsh:
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> ----------------
>  kais
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> ----------------
>  kais
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> ----------------
>  kais
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> ----------------
>  kais
> 
> cqlsh:mydatabase> Tracing on;
> When tracing is activated I get this error, but not every time:
> cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
> unsupported operand type(s) for /: 'NoneType' and 'float'
> 
> 
> NOTE: When the cluster contained 7 nodes, I saw that my table userdata (RF 3) on node D was replicated on E and F, which seems strange because these 3 nodes were not correctly filled.
> 
> Now the cluster seem to work normally, but i can use the secondary for the moment, the query answer are random
> 
> Thanks a lot for any help,
> Kais
> 
> 
> 
> 
> 
> 2013/3/31 aaron morton <aa...@thelastpickle.com>
> First thought is the new nodes were marked as seeds. 
> Next thought is check the logs for errors. 
> 
> You can always run a nodetool repair if you are concerned data is not where you think it should be. 
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/03/2013, at 8:01 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
> 
>> Hi all,
>> 
>> I followed this tutorial to expand a 4-node C* cluster (production) by adding 3 new nodes.
>> 
>> Datacenter: eu-west
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address        Load      Tokens  Owns   Host ID                               Rack
>> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>> 
>> The data are not streamed.
>> 
>> Can anyone help me? Our website is down.
>> 
>> Thanks a lot,
>> 
>> 
> 
>

Re: Lost data after expanding cluster c* 1.2.3-1

Posted by aaron morton <aa...@thelastpickle.com>.

Sorry, can you repost the details of that issue, including the CL (consistency level) you are using?

Aaron

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 12:57 AM, Kais Ahmed <ka...@neteck-fr.com> wrote:

> Thanks aaron,
> 
> I feel that rebuilding indexes went well, but the result of my query (SELECT * FROM userdata WHERE login='kais';) is still empty.
> 
> INFO [Creating index: userdata.userdata_login_idx] 2013-03-30 01:16:33,110 SecondaryIndex.java (line 175) Submitting index build of userdata.userdata_login_idx
> INFO [Creating index: userdata.userdata_login_idx] 2013-03-30 01:34:11,667 SecondaryIndex.java (line 202) Index build of userdata.userdata_login_idx complete
> 
> Thanks,
> 
> 
> 2013/4/9 aaron morton <aa...@thelastpickle.com>
> Look in the logs for messages from the SecondaryIndexManager 
> 
> starts with "Submitting index build of"
> end with "Index build of"
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 7/04/2013, at 12:55 AM, Kais Ahmed <ka...@neteck-fr.com> wrote:
> 
>> hi aaron,
>> 
>> nodetool compactionstats on all nodes return 1 pending task :
>> 
>> ubuntu@app:~$ nodetool compactionstats host
>> pending tasks: 1
>> Active compaction remaining time :        n/a
>> 
>> The command nodetool rebuild_index was launched several days ago.
>> 
>> 2013/4/5 aaron morton <aa...@thelastpickle.com>
>>> but nothing's happening, how can i monitor the progress? and how can i know when it's finished?
>> 
>> check nodetool compactionstats
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 4/04/2013, at 2:51 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>> 
>>> Hi aaron,
>>> 
>>> I ran the command "nodetool rebuild_index host keyspace cf" on all the nodes; in the log I see:
>>> 
>>> INFO [RMI TCP Connection(5422)-10.34.139.xxx] 2013-04-04 08:31:53,641 ColumnFamilyStore.java (line 558) User Requested secondary index re-build for ...
>>> 
>>> but nothing's happening. How can I monitor the progress, and how can I know when it's finished?
>>> 
>>> Thanks,
>>>  
>>> 
>>> 2013/4/2 aaron morton <aa...@thelastpickle.com>
>>>> The problem come from that i don't put  auto_bootstrap to true for the new nodes, not in this documentation (http://www.datastax.com/docs/1.2/install/expand_ami)
>>> auto_bootstrap defaults to True if not specified in the yaml. 
>>> 
>>>> can i do that at any time, or when the cluster are not loaded
>>> Not sure what the question is. 
>>> Both those operations are online operations you can do while the node is processing requests. 
>>>  
>>> Cheers
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>> 
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 1/04/2013, at 9:26 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>>> 
>>>> > At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status return (in red color the 3 new nodes)
>>>> > What errors?
>>>> The errors were on my side, in the application; they were not Cassandra errors.
>>>> 
>>>> > I put for each of them seeds = A ip, and start each with two minutes intervals.
>>>> > When I'm making changes I tend to change a single node first, confirm everything is OK and then do a bulk change.
>>>> Thank you for that advice.
>>>> 
>>>> >I'm not sure what or why it went wrong, but that should get you to a stable place. If you have any problems keep an eye on the logs for errors or warnings.
>>>> The problem comes from the fact that I didn't set auto_bootstrap to true for the new nodes; it is not in this documentation (http://www.datastax.com/docs/1.2/install/expand_ami)
>>>> 
>>>> >if you are using secondary indexes use nodetool rebuild_index to rebuild those.
>>>> Can I do that at any time, or only when the cluster is not loaded?
>>>> 
>>>> Thanks aaron,
>>>> 

Re: Lost data after expanding cluster c* 1.2.3-1

Posted by Kais Ahmed <ka...@neteck-fr.com>.

Thanks aaron,

I feel that rebuilding indexes went well, but the result of my query
(SELECT * FROM userdata WHERE login='kais';) is still empty.

INFO [Creating index: userdata.userdata_login_idx] 2013-03-30 01:16:33,110
SecondaryIndex.java (line 175) Submitting index build of
userdata.userdata_login_idx
INFO [Creating index: userdata.userdata_login_idx] 2013-03-30 01:34:11,667
SecondaryIndex.java (line 202) Index build of userdata.userdata_login_idx
complete

Thanks,



Re: Lost data after expanding cluster c* 1.2.3-1

Posted by aaron morton <aa...@thelastpickle.com>.

Look in the logs for messages from the SecondaryIndexManager:

the build starts with "Submitting index build of"
and ends with "Index build of"
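
A rough way to check this from a script is sketched below. The function name index_build_status is hypothetical, and the exact wording of the completion message ("Index build of <name> complete") is taken from the log lines posted earlier in this thread, so treat it as an assumption for other versions.

```python
import re

# Message fragments as they appear in the log lines posted in this thread.
START = "Submitting index build of"
DONE = re.compile(r"Index build of (\S+) complete")


def index_build_status(lines):
    """Return {index_name: "running" | "complete"} from an iterable of log lines."""
    status = {}
    for line in lines:
        if START in line:
            # The index name is the first token after the start message.
            rest = line.split(START, 1)[1].split()
            if rest:
                status[rest[0]] = "running"
        else:
            match = DONE.search(line)
            if match:
                status[match.group(1)] = "complete"
    return status
```

Feeding it the open system.log file (the location depends on the install, so check your own setup) would show any index that was submitted but never reported as complete.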

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/04/2013, at 12:55 AM, Kais Ahmed <ka...@neteck-fr.com> wrote:

> hi aaron,
> 
> nodetool compactionstats on all nodes return 1 pending task :
> 
> ubuntu@app:~$ nodetool compactionstats host
> pending tasks: 1
> Active compaction remaining time :        n/a
> 
> The command nodetool rebuild_index was launched several days ago.
> 
> 2013/4/5 aaron morton <aa...@thelastpickle.com>
>> but nothing's happening, how can i monitor the progress? and how can i know when it's finished?
> 
> check nodetool compactionstats
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 4/04/2013, at 2:51 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
> 
>> Hi aaron,
>> 
>> I ran the command "nodetool rebuild_index host keyspace cf" on all the nodes, in the log i see :
>> 
>> INFO [RMI TCP Connection(5422)-10.34.139.xxx] 2013-04-04 08:31:53,641 ColumnFamilyStore.java (line 558) User Requested secondary index re-build for ...
>> 
>> but nothing's happening, how can i monitor the progress? and how can i know when it's finished?
>> 
>> Thanks,
>>  
>> 
>> 2013/4/2 aaron morton <aa...@thelastpickle.com>
>>> The problem come from that i don't put  auto_boostrap to true for the new nodes, not in this documentation (http://www.datastax.com/docs/1.2/install/expand_ami)
>> auto_bootstrap defaults to True if not specified in the yaml. 
>> 
>>> can i do that at any time, or when the cluster are not loaded
>> Not sure what the question is. 
>> Both those operations are online operations you can do while the node is processing requests. 
>>  
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 1/04/2013, at 9:26 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>> 
>>> > At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status  return (in red color the 3 new nodes)
>>> > What errors?
>>> The errors was in my side in the application, not cassandra errors
>>> 
>>> > I put for each of them seeds = A ip, and start each with two minutes intervals.
>>> > When I'm making changes I tend to change a single node first, confirm everything is OK and then do a bulk change.
>>> Thank you for that advice.
>>> 
>>> >I'm not sure what or why it went wrong, but that should get you to a stable place. If you have any problems keep an eye on the logs for errors or warnings.
>>> The problem come from that i don't put  auto_boostrap to true for the new nodes, not in this documentation (http://www.datastax.com/docs/1.2/install/expand_ami)
>>> 
>>> >if you are using secondary indexes use nodetool rebuild_index to rebuild those.
>>> can i do that at any time, or when the cluster are not loaded
>>> 
>>> Thanks aaron,
>>> 
>>> 2013/4/1 aaron morton <aa...@thelastpickle.com>
>>> Please do not rely on colour in your emails, the best way to get your emails accepted by the Apache mail servers is to use plain text.
>>> 
>>> > At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status return (in red color the 3 new nodes)
>>> What errors?
>>> 
>>> > I put for each of them seeds = A ip, and start each with two minutes intervals.
>>> When I'm making changes I tend to change a single node first, confirm everything is OK and then do a bulk change.
>>> 
>>> > Now the cluster seem to work normally, but i can use the secondary for the moment, the queryanswer are random
>>> run nodetool repair -pr on each node, let it finish before starting the next one.
>>> if you are using secondary indexes use nodetool rebuild_index to rebuild those.
>>> Add one node new node to the cluster and confirm everything is ok, then add the remaining ones.
>>> 
>>> >I'm not sure what or why it went wrong, but that should get you to a stable place. If you have any problems keep an eye on the logs for errors or warnings.
>>> 
>>> Cheers
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>> 
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 31/03/2013, at 10:01 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>>> 
>>> > Hi aaron,
>>> >
>>> > Thanks for reply, i will try to explain what append exactly
>>> >
>>> > I had 4 C* called [A,B,C,D] cluster (1.2.3-1 version) start with ec2 ami (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
>>> > this config --clustername myDSCcluster --totalnodes 4--version community
>>> >
>>> > Two days after this cluster in production, i saw that the cluster was overload, I wanted to extend it by adding 3 another nodes.
>>> >
>>> > I create a new cluster with 3 C* [D,E,F]  (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
>>> >
>>> > And follow the documentation (http://www.datastax.com/docs/1.2/install/expand_ami) for adding them in the ring.
>>> > I put for each of them seeds = A ip, and start each with two minutes intervals.
>>> >
>>> > At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status return (in red color the 3 new nodes)
>>> >
>>> > Datacenter: eu-west
>>> > ===================
>>> > Status=Up/Down
>>> > |/ State=Normal/Leaving/Joining/Moving
>>> > --  Address           Load       Tokens  Owns   Host ID                               Rack
>>> > UN  10.34.142.xxx     10.79 GB   256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>>> > UN  10.32.49.xxx      1.48 MB    256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>>> > UN  10.33.206.xxx     2.19 MB    256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>>> > UN  10.32.27.xxx      1.95 MB    256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>>> > UN  10.34.139.xxx     11.67 GB   256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>>> > UN  10.34.147.xxx     11.18 GB   256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>>> > UN  10.33.193.xxx     10.83 GB   256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>>> >
>>> > I saw that the 3 nodes have join the ring but they had no data, i put the website in maintenance and lauch a nodetool repair on
>>> > the 3 new nodes, during 5 hours i see in opcenter the data streamed to the new nodes (very nice :))
>>> >
>>> > During this time, i write a script to check if all members are present (relative to a copy of members in mysql).
>>> >
>>> > After data streamed seems to be finish, but i'm not sure because nodetool compactionstats show pending task but nodetool netstats seems to be ok.
>>> >
>>> > I ran my script to check if the data, but members are still missing.
>>> >
>>> > I decide to roolback by running nodetool decommission node D, E, F
>>> >
>>> > I re run my script, all seems to be ok but secondary index have strange behavior,
>>> > some time the row was returned some times no result.
>>> >
>>> > the user kais can be retrieve using his key with cassandra-cli but if i use cqlsh :
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:mydatabase>Tracing on;
>>> > When tracing is activate i have this error but not all time
>>> > cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
>>> > unsupported operand type(s) for /: 'NoneType' and 'float'
>>> >
>>> >
>>> > NOTE : When the cluster contained 7 nodes, i see that my table userdata (RF 3) on node D was replicated on E and F, that would seem strange because its 3 node was not correctly filled
>>> >
>>> > Now the cluster seem to work normally, but i can use the secondary for the moment, the query answer are random
>>> >
>>> > Thanks a lot for any help,
>>> > Kais
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > 2013/3/31 aaron morton <aa...@thelastpickle.com>
>>> > First thought is the new nodes were marked as seeds.
>>> > Next thought is check the logs for errors.
>>> >
>>> > You can always run a nodetool repair if you are concerned data is not where you think it should be.
>>> >
>>> > Cheers
>>> >
>>> >
>>> > -----------------
>>> > Aaron Morton
>>> > Freelance Cassandra Consultant
>>> > New Zealand
>>> >
>>> > @aaronmorton
>>> > http://www.thelastpickle.com
>>> >
>>> > On 29/03/2013, at 8:01 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>>> >
>>> >> Hi all,
>>> >>
>>> >> I follow this tutorial for expanding a 4 c* cluster (production) and add 3 new nodes.
>>> >>
>>> >> Datacenter: eu-west
>>> >> ===================
>>> >> Status=Up/Down
>>> >> |/ State=Normal/Leaving/Joining/Moving
>>> >> --  Address           Load       Tokens  Owns   Host ID                               Rack
>>> >> UN  10.34.142.xxx     10.79 GB   256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>>> >> UN  10.32.49.xxx      1.48 MB    256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>>> >> UN  10.33.206.xxx     2.19 MB    256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>>> >> UN  10.32.27.xxx      1.95 MB    256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>>> >> UN  10.34.139.xxx     11.67 GB   256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>>> >> UN  10.34.147.xxx     11.18 GB   256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>>> >> UN  10.33.193.xxx     10.83 GB   256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>>> >>
>>> >> The data are not streamed.
>>> >>
>>> >> Can any one help me, our web site is down.
>>> >>
>>> >> Thanks a lot,
>>> >>
>>> >>
>>> >
>>> >
>>> 
>>> 
>> 
>> 
> 
>

Re: Lost data after expanding cluster c* 1.2.3-1

Posted by Kais Ahmed <ka...@neteck-fr.com>.

Hi Aaron,

nodetool compactionstats returns 1 pending task on all nodes:

ubuntu@app:~$ nodetool compactionstats host
pending tasks: 1
Active compaction remaining time :        n/a

The command nodetool rebuild_index was launched several days ago.

2013/4/5 aaron morton <aa...@thelastpickle.com>

> but nothing's happening, how can i monitor the progress? and how can i
> know when it's finished?
>
>
> check nodetool compactionstats
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/04/2013, at 2:51 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>
> Hi aaron,
>
> I ran the command "nodetool rebuild_index host keyspace cf" on all the
> nodes, in the log i see :
>
> INFO [RMI TCP Connection(5422)-10.34.139.xxx] 2013-04-04 08:31:53,641
> ColumnFamilyStore.java (line 558) User Requested secondary index re-build
> for ...
>
> but nothing's happening, how can i monitor the progress? and how can i
> know when it's finished?
>
> Thanks,
>
>
> 2013/4/2 aaron morton <aa...@thelastpickle.com>
>
>> The problem come from that i don't put  auto_boostrap to true for the new
>> nodes, not in this documentation (
>> http://www.datastax.com/docs/1.2/install/expand_ami)
>>
>> auto_bootstrap defaults to True if not specified in the yaml.
>>
>> can i do that at any time, or when the cluster are not loaded
>>
>> Not sure what the question is.
>> Both those operations are online operations you can do while the node is
>> processing requests.
>>
>> Cheers
>>
>>    -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 1/04/2013, at 9:26 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>>
>> > At this moment the errors started, we see that members and other data
>> are gone, at this moment the nodetool status return (in red color the 3 new
>> nodes)
>> > What errors?
>> The errors was in my side in the application, not cassandra errors
>>
>> > I put for each of them seeds = A ip, and start each with two minutes
>> intervals.
>> > When I'm making changes I tend to change a single node first, confirm
>> everything is OK and then do a bulk change.
>> Thank you for that advice.
>>
>> >I'm not sure what or why it went wrong, but that should get you to a
>> stable place. If you have any problems keep an eye on the logs for errors
>> or warnings.
>> The problem come from that i don't put  auto_boostrap to true for the new
>> nodes, not in this documentation (
>> http://www.datastax.com/docs/1.2/install/expand_ami)
>>
>> >if you are using secondary indexes use nodetool rebuild_index to rebuild
>> those.
>> can i do that at any time, or when the cluster are not loaded
>>
>> Thanks aaron,
>>
>> 2013/4/1 aaron morton <aa...@thelastpickle.com>
>>
>>> Please do not rely on colour in your emails, the best way to get your
>>> emails accepted by the Apache mail servers is to use plain text.
>>>
>>> > At this moment the errors started, we see that members and other data
>>> are gone, at this moment the nodetool status return (in red color the 3 new
>>> nodes)
>>> What errors?
>>>
>>> > I put for each of them seeds = A ip, and start each with two minutes
>>> intervals.
>>> When I'm making changes I tend to change a single node first, confirm
>>> everything is OK and then do a bulk change.
>>>
>>> > Now the cluster seem to work normally, but i can use the secondary for
>>> the moment, the queryanswer are random
>>> run nodetool repair -pr on each node, let it finish before starting the
>>> next one.
>>> if you are using secondary indexes use nodetool rebuild_index to rebuild
>>> those.
>>> Add one node new node to the cluster and confirm everything is ok, then
>>> add the remaining ones.
>>>
>>> >I'm not sure what or why it went wrong, but that should get you to a
>>> stable place. If you have any problems keep an eye on the logs for errors
>>> or warnings.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 31/03/2013, at 10:01 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>>>
>>> > Hi aaron,
>>> >
>>> > Thanks for reply, i will try to explain what append exactly
>>> >
>>> > I had 4 C* called [A,B,C,D] cluster (1.2.3-1 version) start with ec2
>>> ami (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
>>> > this config --clustername myDSCcluster --totalnodes 4--version
>>> community
>>> >
>>> > Two days after this cluster in production, i saw that the cluster was
>>> overload, I wanted to extend it by adding 3 another nodes.
>>> >
>>> > I create a new cluster with 3 C* [D,E,F]  (
>>> https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
>>> >
>>> > And follow the documentation (
>>> http://www.datastax.com/docs/1.2/install/expand_ami) for adding them in
>>> the ring.
>>> > I put for each of them seeds = A ip, and start each with two minutes
>>> intervals.
>>> >
>>> > At this moment the errors started, we see that members and other data
>>> are gone, at this moment the nodetool status return (in red color the 3 new
>>> nodes)
>>> >
>>> > Datacenter: eu-west
>>> > ===================
>>> > Status=Up/Down
>>> > |/ State=Normal/Leaving/Joining/Moving
>>> > --  Address           Load       Tokens  Owns   Host ID                               Rack
>>> > UN  10.34.142.xxx     10.79 GB   256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>>> > UN  10.32.49.xxx      1.48 MB    256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>>> > UN  10.33.206.xxx     2.19 MB    256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>>> > UN  10.32.27.xxx      1.95 MB    256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>>> > UN  10.34.139.xxx     11.67 GB   256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>>> > UN  10.34.147.xxx     11.18 GB   256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>>> > UN  10.33.193.xxx     10.83 GB   256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>>> >
>>> > I saw that the 3 nodes have join the ring but they had no data, i put
>>> the website in maintenance and lauch a nodetool repair on
>>> > the 3 new nodes, during 5 hours i see in opcenter the data streamed to
>>> the new nodes (very nice :))
>>> >
>>> > During this time, i write a script to check if all members are present
>>> (relative to a copy of members in mysql).
>>> >
>>> > After data streamed seems to be finish, but i'm not sure because
>>> nodetool compactionstats show pending task but nodetool netstats seems to
>>> be ok.
>>> >
>>> > I ran my script to check if the data, but members are still missing.
>>> >
>>> > I decide to roolback by running nodetool decommission node D, E, F
>>> >
>>> > I re run my script, all seems to be ok but secondary index have
>>> strange behavior,
>>> > some time the row was returned some times no result.
>>> >
>>> > the user kais can be retrieve using his key with cassandra-cli but if
>>> i use cqlsh :
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:mydatabase>Tracing on;
>>> > When tracing is activate i have this error but not all time
>>> > cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
>>> > unsupported operand type(s) for /: 'NoneType' and 'float'
>>> >
>>> >
>>> > NOTE : When the cluster contained 7 nodes, i see that my table
>>> userdata (RF 3) on node D was replicated on E and F, that would seem
>>> strange because its 3 node was not correctly filled
>>> >
>>> > Now the cluster seem to work normally, but i can use the secondary for
>>> the moment, the query answer are random
>>> >
>>> > Thanks a lot for any help,
>>> > Kais
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > 2013/3/31 aaron morton <aa...@thelastpickle.com>
>>> > First thought is the new nodes were marked as seeds.
>>> > Next thought is check the logs for errors.
>>> >
>>> > You can always run a nodetool repair if you are concerned data is not
>>> where you think it should be.
>>> >
>>> > Cheers
>>> >
>>> >
>>> > -----------------
>>> > Aaron Morton
>>> > Freelance Cassandra Consultant
>>> > New Zealand
>>> >
>>> > @aaronmorton
>>> > http://www.thelastpickle.com
>>> >
>>> > On 29/03/2013, at 8:01 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>>> >
>>> >> Hi all,
>>> >>
>>> >> I follow this tutorial for expanding a 4 c* cluster (production) and
>>> add 3 new nodes.
>>> >>
>>> >> Datacenter: eu-west
>>> >> ===================
>>> >> Status=Up/Down
>>> >> |/ State=Normal/Leaving/Joining/Moving
>>> >> --  Address           Load       Tokens  Owns   Host ID                               Rack
>>> >> UN  10.34.142.xxx     10.79 GB   256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>>> >> UN  10.32.49.xxx      1.48 MB    256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>>> >> UN  10.33.206.xxx     2.19 MB    256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>>> >> UN  10.32.27.xxx      1.95 MB    256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>>> >> UN  10.34.139.xxx     11.67 GB   256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>>> >> UN  10.34.147.xxx     11.18 GB   256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>>> >> UN  10.33.193.xxx     10.83 GB   256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>>> >>
>>> >> The data are not streamed.
>>> >>
>>> >> Can any one help me, our web site is down.
>>> >>
>>> >> Thanks a lot,
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>
>>
>
>

Re: Lost data after expanding cluster c* 1.2.3-1

Posted by aaron morton <aa...@thelastpickle.com>.

> but nothing's happening, how can i monitor the progress? and how can i know when it's finished?

Check nodetool compactionstats.
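
A minimal sketch of how that output could be checked from a script, assuming the "pending tasks: N" line format that appears in the output pasted elsewhere in this thread. The function name is hypothetical, and actually running nodetool is left out.

```python
import re


def pending_tasks(compactionstats_output):
    """Return the 'pending tasks' count from compactionstats text, or None."""
    match = re.search(r"^pending tasks:\s*(\d+)", compactionstats_output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Output as pasted in this thread.
sample_output = (
    "pending tasks: 1\n"
    "Active compaction remaining time :        n/a\n"
)
print(pending_tasks(sample_output))
```

Returning None when the line is missing keeps "no pending-tasks line found" distinct from "0 pending tasks".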

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/04/2013, at 2:51 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:

> Hi aaron,
> 
> I ran the command "nodetool rebuild_index host keyspace cf" on all the nodes, in the log i see :
> 
> INFO [RMI TCP Connection(5422)-10.34.139.xxx] 2013-04-04 08:31:53,641 ColumnFamilyStore.java (line 558) User Requested secondary index re-build for ...
> 
> but nothing's happening, how can i monitor the progress? and how can i know when it's finished?
> 
> Thanks,
>  
> 
> 2013/4/2 aaron morton <aa...@thelastpickle.com>
>> The problem come from that i don't put  auto_boostrap to true for the new nodes, not in this documentation (http://www.datastax.com/docs/1.2/install/expand_ami)
> auto_bootstrap defaults to True if not specified in the yaml. 
> 
>> can i do that at any time, or when the cluster are not loaded
> Not sure what the question is. 
> Both those operations are online operations you can do while the node is processing requests. 
>  
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 1/04/2013, at 9:26 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
> 
>> > At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status return (in red color the 3 new nodes)
>> > What errors?
>> The errors was in my side in the application, not cassandra errors
>> 
>> > I put for each of them seeds = A ip, and start each with two minutes intervals.
>> > When I'm making changes I tend to change a single node first, confirm everything is OK and then do a bulk change.
>> Thank you for that advice.
>> 
>> >I'm not sure what or why it went wrong, but that should get you to a stable place. If you have any problems keep an eye on the logs for errors or warnings.
>> The problem come from that i don't put  auto_boostrap to true for the new nodes, not in this documentation (http://www.datastax.com/docs/1.2/install/expand_ami)
>> 
>> >if you are using secondary indexes use nodetool rebuild_index to rebuild those.
>> can i do that at any time, or when the cluster are not loaded
>> 
>> Thanks aaron,
>> 
>> 2013/4/1 aaron morton <aa...@thelastpickle.com>
>> Please do not rely on colour in your emails, the best way to get your emails accepted by the Apache mail servers is to use plain text.
>> 
>> > At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status return (in red color the 3 new nodes)
>> What errors?
>> 
>> > I put for each of them seeds = A ip, and start each with two minutes intervals.
>> When I'm making changes I tend to change a single node first, confirm everything is OK and then do a bulk change.
>> 
>> > Now the cluster seem to work normally, but i can use the secondary for the moment, the queryanswer are random
>> run nodetool repair -pr on each node, let it finish before starting the next one.
>> if you are using secondary indexes use nodetool rebuild_index to rebuild those.
>> Add one node new node to the cluster and confirm everything is ok, then add the remaining ones.
>> 
>> >I'm not sure what or why it went wrong, but that should get you to a stable place. If you have any problems keep an eye on the logs for errors or warnings.
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 31/03/2013, at 10:01 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>> 
>> > Hi aaron,
>> >
>> > Thanks for reply, i will try to explain what append exactly
>> >
>> > I had 4 C* called [A,B,C,D] cluster (1.2.3-1 version) start with ec2 ami (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
>> > this config --clustername myDSCcluster --totalnodes 4--version community
>> >
>> > Two days after this cluster in production, i saw that the cluster was overload, I wanted to extend it by adding 3 another nodes.
>> >
>> > I create a new cluster with 3 C* [D,E,F]  (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
>> >
>> > And follow the documentation (http://www.datastax.com/docs/1.2/install/expand_ami) for adding them in the ring.
>> > I put for each of them seeds = A ip, and start each with two minutes intervals.
>> >
>> > At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status return (in red color the 3 new nodes)
>> >
>> > Datacenter: eu-west
>> > ===================
>> > Status=Up/Down
>> > |/ State=Normal/Leaving/Joining/Moving
>> > --  Address           Load       Tokens  Owns   Host ID                               Rack
>> > UN  10.34.142.xxx     10.79 GB   256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>> > UN  10.32.49.xxx      1.48 MB    256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>> > UN  10.33.206.xxx     2.19 MB    256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>> > UN  10.32.27.xxx      1.95 MB    256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>> > UN  10.34.139.xxx     11.67 GB   256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>> > UN  10.34.147.xxx     11.18 GB   256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>> > UN  10.33.193.xxx     10.83 GB   256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>> >
>> > I saw that the 3 nodes have join the ring but they had no data, i put the website in maintenance and lauch a nodetool repair on
>> > the 3 new nodes, during 5 hours i see in opcenter the data streamed to the new nodes (very nice :))
>> >
>> > During this time, i write a script to check if all members are present (relative to a copy of members in mysql).
>> >
>> > After data streamed seems to be finish, but i'm not sure because nodetool compactionstats show pending task but nodetool netstats seems to be ok.
>> >
>> > I ran my script to check if the data, but members are still missing.
>> >
>> > I decide to roolback by running nodetool decommission node D, E, F
>> >
>> > I re run my script, all seems to be ok but secondary index have strange behavior,
>> > some time the row was returned some times no result.
>> >
>> > the user kais can be retrieve using his key with cassandra-cli but if i use cqlsh :
>> >
>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>> >
>> >  login
>> > ----------------
>> >  kais
>> >
>> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>> >
>> >  login
>> > ----------------
>> >  kais
>> >
>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>> >
>> >  login
>> > ----------------
>> >  kais
>> >
>> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>> >
>> >  login
>> > ----------------
>> >  kais
>> >
>> > cqlsh:mydatabase>Tracing on;
>> > When tracing is activate i have this error but not all time
>> > cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
>> > unsupported operand type(s) for /: 'NoneType' and 'float'
>> >
>> >
>> > NOTE : When the cluster contained 7 nodes, i see that my table userdata (RF 3) on node D was replicated on E and F, that would seem strange because its 3 node was not correctly filled
>> >
>> > Now the cluster seem to work normally, but i can use the secondary for the moment, the query answer are random
>> >
>> > Thanks a lot for any help,
>> > Kais
>> >
>> >
>> >
>> >
>> >
>> > 2013/3/31 aaron morton <aa...@thelastpickle.com>
>> > First thought is the new nodes were marked as seeds.
>> > Next thought is check the logs for errors.
>> >
>> > You can always run a nodetool repair if you are concerned data is not where you think it should be.
>> >
>> > Cheers
>> >
>> >
>> > -----------------
>> > Aaron Morton
>> > Freelance Cassandra Consultant
>> > New Zealand
>> >
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> >
>> > On 29/03/2013, at 8:01 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I follow this tutorial for expanding a 4 c* cluster (production) and add 3 new nodes.
>> >>
>> >> Datacenter: eu-west
>> >> ===================
>> >> Status=Up/Down
>> >> |/ State=Normal/Leaving/Joining/Moving
>> >> --  Address           Load       Tokens  Owns   Host ID                               Rack
>> >> UN  10.34.142.xxx     10.79 GB   256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>> >> UN  10.32.49.xxx      1.48 MB    256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>> >> UN  10.33.206.xxx     2.19 MB    256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>> >> UN  10.32.27.xxx      1.95 MB    256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>> >> UN  10.34.139.xxx     11.67 GB   256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>> >> UN  10.34.147.xxx     11.18 GB   256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>> >> UN  10.33.193.xxx     10.83 GB   256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>> >>
>> >> The data are not streamed.
>> >>
>> >> Can any one help me, our web site is down.
>> >>
>> >> Thanks a lot,
>> >>
>> >>
>> >
>> >
>> 
>> 
> 
>

Re: Lost data after expanding cluster c* 1.2.3-1

Posted by Kais Ahmed <ka...@neteck-fr.com>.

Hi aaron,

I ran the command "nodetool rebuild_index host keyspace cf" on all the
nodes. In the log I see:

INFO [RMI TCP Connection(5422)-10.34.139.xxx] 2013-04-04 08:31:53,641 ColumnFamilyStore.java (line 558) User Requested secondary index re-build for ...

But nothing seems to be happening. How can I monitor the progress, and how
can I know when it's finished?

Thanks,


2013/4/2 aaron morton <aa...@thelastpickle.com>

> The problem come from that i don't put  auto_boostrap to true for the new
> nodes, not in this documentation (
> http://www.datastax.com/docs/1.2/install/expand_ami)
>
> auto_bootstrap defaults to True if not specified in the yaml.
>
> can i do that at any time, or when the cluster are not loaded
>
> Not sure what the question is.
> Both those operations are online operations you can do while the node is
> processing requests.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/04/2013, at 9:26 PM, Kais Ahmed <ka...@neteck-fr.com> wrote:
>
> > At this moment the errors started, we see that members and other data
> are gone, at this moment the nodetool status return (in red color the 3 new
> nodes)
> > What errors?
> The errors was in my side in the application, not cassandra errors
>
> > I put for each of them seeds = A ip, and start each with two minutes
> intervals.
> > When I'm making changes I tend to change a single node first, confirm
> everything is OK and then do a bulk change.
> Thank you for that advice.
>
> >I'm not sure what or why it went wrong, but that should get you to a
> stable place. If you have any problems keep an eye on the logs for errors
> or warnings.
> The problem come from that i don't put  auto_boostrap to true for the new
> nodes, not in this documentation (
> http://www.datastax.com/docs/1.2/install/expand_ami)
>
> >if you are using secondary indexes use nodetool rebuild_index to rebuild
> those.
> can i do that at any time, or when the cluster are not loaded
>
> Thanks aaron,
>

Re: Lost data after expanding cluster c* 1.2.3-1

Posted by aaron morton <aa...@thelastpickle.com>.

> The problem came from my not setting auto_bootstrap to true for the new nodes; that step is not in this documentation (http://www.datastax.com/docs/1.2/install/expand_ami)
auto_bootstrap defaults to true if it is not specified in the yaml.
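
To illustrate that default, here is a minimal sketch that reports the effective auto_bootstrap value for the text of a yaml file. The function name is hypothetical and the line-based parsing is a simplification (it is not a full YAML parser); a missing or commented-out key is treated as true.

```python
import re

# Matches an uncommented "auto_bootstrap: <word>" line, with an optional
# trailing comment. Assumption: the key sits at the top level of the file.
_KEY = re.compile(r"^auto_bootstrap\s*:\s*([A-Za-z]+)\s*(?:#.*)?$")


def effective_auto_bootstrap(yaml_text):
    """Return the effective auto_bootstrap setting for the given yaml text."""
    for line in yaml_text.splitlines():
        match = _KEY.match(line.strip())
        if match:
            # Only an explicit "true" (any case) counts as enabled.
            return match.group(1).lower() == "true"
    # Key not present (or only present in a comment): the default applies.
    return True
```

Passing it the contents of a node's yaml file shows whether bootstrapping has been switched off explicitly on that node.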

> Can I do that at any time, or only when the cluster is not under load?
I'm not sure what the question is.
Both of those operations are online operations; you can run them while the node is processing requests.
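
As a rough sketch of the "one node at a time" advice from earlier in the thread, the helpers below build the nodetool command lines and run them sequentially. The function names, host addresses, keyspace, table and index names are all placeholders, and the exact form nodetool expects for index names should be checked against the help output of your version.

```python
import subprocess


def repair_command(host):
    # Primary-range repair of a single node.
    return ["nodetool", "-h", host, "repair", "-pr"]


def rebuild_index_command(host, keyspace, table, index_names):
    # nodetool takes the index names as one comma-separated argument.
    return ["nodetool", "-h", host, "rebuild_index",
            keyspace, table, ",".join(index_names)]


def run_one_at_a_time(commands, run=subprocess.check_call):
    # check_call blocks until the command exits and raises on a non-zero
    # exit status, so the next node is not started until this one finishes.
    for command in commands:
        run(command)
```

For example, run_one_at_a_time([repair_command(h) for h in hosts]) with your own list of node addresses repairs each node in turn and stops at the first failure.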
 
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com


Re: Lost data after expanding cluster c* 1.2.3-1

Posted by Kais Ahmed <ka...@neteck-fr.com>.

> At this moment the errors started, we see that members and other data are
> gone, at this moment the nodetool status return (in red color the 3 new
> nodes)
> What errors?
The errors were on my side, in the application; they were not Cassandra errors.

> I put for each of them seeds = A ip, and start each with two minutes
> intervals.
> When I'm making changes I tend to change a single node first, confirm
> everything is OK and then do a bulk change.
Thank you for that advice.

> I'm not sure what or why it went wrong, but that should get you to a
> stable place. If you have any problems keep an eye on the logs for errors
> or warnings.
The problem came from my not setting auto_bootstrap to true for the new
nodes; that step is not in this documentation (
http://www.datastax.com/docs/1.2/install/expand_ami)

> if you are using secondary indexes use nodetool rebuild_index to rebuild
> those.
Can I do that at any time, or only when the cluster is not under load?

Thanks Aaron,
