Posted to user@cassandra.apache.org by Dikang Gu <di...@gmail.com> on 2011/08/05 14:35:27 UTC

How to solve this kind of schema disagreement...

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]


There are three different schema versions in the cluster...

-- 
Dikang Gu

0086 - 18611140205

Re: How to solve this kind of schema disagreement...

Posted by aaron morton <aa...@thelastpickle.com>.
I don't have time to look into the reasons for that error, but it does not sound good. It sounds like there are multiple migration chains out there in the cluster. This could come from applying changes to different nodes at the same time. 

Is this a prod system? If not, I would shut it down, wipe all the Schema and Migration SSTables, and then apply the schema again one CF at a time (it will take time to read the data). 
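The non-production recovery described above can be sketched as a dry-run script. The paths mirror the ones used elsewhere in this thread (data/system/Schema-g-* and data/system/Migrations-g-*); DATA_DIR is an assumption about where the data directory lives, so adjust it to your installation before running anything for real:

```shell
# Dry-run sketch of the recovery steps: it only prints what would be done.
# File patterns follow the 0.8-era system keyspace layout seen in this thread.
DATA_DIR="${DATA_DIR:-data}"
schema_files="$DATA_DIR/system/Schema-g-*"
migration_files="$DATA_DIR/system/Migrations-g-*"
echo "1. stop the node"
echo "2. rm -f $schema_files $migration_files"
echo "3. restart the node and re-apply the schema one CF at a time"
```

Only run the rm step with the node fully stopped, and only on a cluster whose schema you can re-create.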

If it's a prod system it may need some delicate surgery on the Migrations and Schema CFs. 

Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Aug 2011, at 15:41, Dikang Gu wrote:

> And a lot of "not apply" logs.
> 
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,376 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,376 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,379 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,379 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,382 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,382 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
> 
> -- 
> Dikang Gu
> 0086 - 18611140205
> On Wednesday, August 10, 2011 at 11:35 AM, Dikang Gu wrote:
> 
>> Hi Aaron,
>> 
>> I set the log level to be DEBUG, and find a lot of forceFlush debug info in the log:
>> 
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> 
>> What does this mean?
>> 
>> Thanks.
>>  
>> 
>> -- 
>> Dikang Gu
>> 0086 - 18611140205
>> On Wednesday, August 10, 2011 at 6:42 AM, aaron morton wrote:
>> 
>>> um. There has got to be something stopping the migration from completing. 
>>> 
>>> Turn the logging up to DEBUG before starting and look for messages from MigrationManager.java
>>> 
>>> Provide all the log messages from Migration.java on the 1.27 node
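(The DEBUG logging Aaron asks for can be scoped to just the migration classes in conf/log4j-server.properties rather than turning the whole node to DEBUG. The package names below are assumptions based on the 0.8 source layout, so verify them against your tree before relying on the filter:

```properties
# Raise only the migration-related loggers to DEBUG
# (package names assumed from the 0.8 source layout; adjust if they differ).
log4j.logger.org.apache.cassandra.db.migration=DEBUG
log4j.logger.org.apache.cassandra.service.MigrationManager=DEBUG
```
)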
>>> 
>>> Cheers
>>> 
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 8 Aug 2011, at 15:52, Dikang Gu wrote:
>>> 
>>>> Hi Aaron, 
>>>> 
>>>> I repeat the whole procedure:
>>>> 
>>>> 1. kill the cassandra instance on 1.27.
>>>> 2. rm the data/system/Migrations-g-*
>>>> 3. rm the data/system/Schema-g-*
>>>> 4. bin/cassandra to start the cassandra.
>>>> 
>>>> Now, the migration seems to have stopped, and I do not find any errors in the system.log yet.
>>>> 
>>>> The ring looks good:
>>>> [root@yun-phy2 apache-cassandra-0.8.1]# bin/nodetool -h192.168.1.27 -p8090 ring
>>>> Address         DC          Rack        Status State   Load            Owns    Token                                       
>>>>                                                                                127605887595351923798765477786913079296     
>>>> 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1                                           
>>>> 192.168.1.25    datacenter1 rack1       Up     Normal  8.54 GB         34.01%  57856537434773737201679995572503935972      
>>>> 192.168.1.27    datacenter1 rack1       Up     Normal  1.78 GB         24.28%  99165710459060760249270263771474737125      
>>>> 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296  
>>>> 
>>>> But the schema is still not correct:
>>>> Cluster Information:
>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>    Schema versions: 
>>>> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
>>>> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>> 
>>>> The 5a54ebd0-bd90-11e0-0000-9510c23fceff is the same as last time…
>>>> 
>>>> And in the log, the last Migration.java log is:
>>>>  INFO [MigrationStage:1] 2011-08-08 11:41:30,293 Migration.java (line 116) Applying migration 5a54ebd0-bd90-11e0-0000-9510c23fceff Add keyspace: SimpleDB_4E38DAA64894A9146100000500000000rep strategy:SimpleStrategy{}durable_writes: true
>>>> 
>>>> Could you explain this? 
>>>> 
>>>> If I change the token given to 1.27 to another one, will it help?
>>>> 
>>>> Thanks.
>>>> 
>>>> -- 
>>>> Dikang Gu
>>>> 0086 - 18611140205
>>>> On Sunday, August 7, 2011 at 4:14 PM, aaron morton wrote:
>>>> 
>>>>> did you check the logs in 1.27 for errors ? 
>>>>> 
>>>>> Could you be seeing this ? https://issues.apache.org/jira/browse/CASSANDRA-2867
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Cassandra Developer
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>> 
>>>>> On 7 Aug 2011, at 16:24, Dikang Gu wrote:
>>>>> 
>>>>>> I took both nodes down, deleted the schema* and migration* sstables, and restarted them.
>>>>>> 
>>>>>> The current cluster looks like this:
>>>>>> [default@unknown] describe cluster;         
>>>>>> Cluster Information:
>>>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>>>    Schema versions: 
>>>>>> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
>>>>>> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>>> 
>>>>>> 1.28 looks good, but 1.27 still cannot reach schema agreement...
>>>>>> 
>>>>>> I have tried several times, even deleted all the data on 1.27 and rejoined it as a new node, but it is still unhappy.
>>>>>> 
>>>>>> And the ring looks like this: 
>>>>>> 
>>>>>> Address         DC          Rack        Status State   Load            Owns    Token                                       
>>>>>>                                                                                127605887595351923798765477786913079296     
>>>>>> 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1                                           
>>>>>> 192.168.1.25    datacenter1 rack1       Up     Normal  8.55 GB         34.01%  57856537434773737201679995572503935972     
>>>>>> 192.168.1.27    datacenter1 rack1       Up     Joining 1.81 GB         24.28%  99165710459060760249270263771474737125      
>>>>>> 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296 
>>>>>> 
>>>>>> 1.27 seems unable to join the cluster; it just hangs there...
>>>>>> 
>>>>>> Any suggestions?
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> 
>>>>>> On Sun, Aug 7, 2011 at 10:01 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>>>>> After the restart, what was in the logs for the 1.27 machine from the Migration.java logger? Some of the messages will start with "Applying migration"
>>>>>> 
>>>>>> You should have shut down both of the nodes, then deleted the schema* and migration* system sstables, then restarted one of them and watched to see if it got to schema agreement. 
>>>>>> 
>>>>>> Cheers
>>>>>>   
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>> 
>>>>>> On 6 Aug 2011, at 22:56, Dikang Gu wrote:
>>>>>> 
>>>>>>> I have tried this, but the schema still does not agree in the cluster:
>>>>>>> 
>>>>>>> [default@unknown] describe cluster;
>>>>>>> Cluster Information:
>>>>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>>>>    Schema versions: 
>>>>>>> 	UNREACHABLE: [192.168.1.28]
>>>>>>> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>>>>>>> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>>>> 
>>>>>>> Any other suggestions to solve this?
>>>>>>> 
>>>>>>> I have some production data saved in the Cassandra cluster, so I cannot afford data loss...
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <be...@noisette.ch> wrote:
>>>>>>>> Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
>>>>>>>> 75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down and
>>>>>>>> remove the schema* and migration* sstables from both 192.168.1.28 and
>>>>>>>> 192.168.1.27
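(The majority rule Benoit applies can be checked mechanically. The sketch below counts nodes per schema version from the describe cluster output pasted in this thread; the parsing is a rough assumption about the cli's output format, not an official tool:

```shell
# Count nodes per schema version; the version reported by the most nodes is
# the one to keep, per the FAQ procedure. Input copied from the thread.
versions='743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]'
majority=$(printf '%s\n' "$versions" |
  awk -F': ' '{print split($2, a, ","), $1}' |   # emits "<node count> <version>"
  sort -rn | head -n 1 | cut -d' ' -f2)
echo "majority schema version: $majority"
```

Nodes not in the majority list are the ones whose schema* and migration* sstables should be wiped.)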
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2011/8/5 Dikang Gu <di...@gmail.com>:
>>>>>>>> > [default@unknown] describe cluster;
>>>>>>>> > Cluster Information:
>>>>>>>> >    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>>>>> >    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>>>>> >    Schema versions:
>>>>>>>> > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
>>>>>>>> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>>>>>>>> > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>>>>> >
>>>>>>>> >  three different schema versions in the cluster...
>>>>>>>> > --
>>>>>>>> > Dikang Gu
>>>>>>>> > 0086 - 18611140205
>>>>>>>> >
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Dikang Gu
>>>>>>> 
>>>>>>> 0086 - 18611140205
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Dikang Gu
>>>>>> 
>>>>>> 0086 - 18611140205
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Re: How to solve this kind of schema disagreement...

Posted by Dikang Gu <di...@gmail.com>.
And a lot of "not apply" logs.

DEBUG [MigrationStage:1] 2011-08-10 11:36:29,376 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
DEBUG [MigrationStage:1] 2011-08-10 11:36:29,376 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
DEBUG [MigrationStage:1] 2011-08-10 11:36:29,379 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
DEBUG [MigrationStage:1] 2011-08-10 11:36:29,379 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
DEBUG [MigrationStage:1] 2011-08-10 11:36:29,382 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
DEBUG [MigrationStage:1] 2011-08-10 11:36:29,382 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.


-- 
Dikang Gu
0086 - 18611140205
On Wednesday, August 10, 2011 at 11:35 AM, Dikang Gu wrote: 
> Hi Aaron,
> 
> I set the log level to be DEBUG, and find a lot of forceFlush debug info in the log:
> 
> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
> 
> What does this mean?
> 
> Thanks.
> 
> 
> -- 
> Dikang Gu
> 0086 - 18611140205
> On Wednesday, August 10, 2011 at 6:42 AM, aaron morton wrote:
> > um. There has got to be something stopping the migration from completing. 
> > 
> > Turn the logging up to DEBUG before starting and look for messages from MigrationManager.java
> > 
> > Provide all the log messages from Migration.java on the 1.27 node
> > 
> > Cheers
> > 
> > 
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> > 
> > 
> > 
> > 
> > 
> > On 8 Aug 2011, at 15:52, Dikang Gu wrote:
> > > Hi Aaron, 
> > > 
> > > I repeat the whole procedure:
> > > 
> > > 1. kill the cassandra instance on 1.27.
> > > 2. rm the data/system/Migrations-g-*
> > > 3. rm the data/system/Schema-g-*
> > > 4. bin/cassandra to start the cassandra.
> > > 
> > > Now, the migration seems stop and I do not find any error in the system.log yet.
> > > 
> > > The ring looks good:
> > > [root@yun-phy2 apache-cassandra-0.8.1]# bin/nodetool -h192.168.1.27 -p8090 ring
> > > Address  DC Rack Status State  Load Owns Token 
> > > 127605887595351923798765477786913079296 
> > > 192.168.1.28 datacenter1 rack1  Up  Normal 8.38 GB  25.00% 1 
> > > 192.168.1.25 datacenter1 rack1  Up  Normal 8.54 GB  34.01% 57856537434773737201679995572503935972 
> > > 192.168.1.27 datacenter1 rack1  Up Normal 1.78 GB  24.28% 99165710459060760249270263771474737125 
> > > 192.168.1.9  datacenter1 rack1  Up  Normal 8.75 GB  16.72% 127605887595351923798765477786913079296 
> > > 
> > > 
> > > But the schema still does not correct:
> > > Cluster Information:
> > > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > > Schema versions: 
> > > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
> > > 5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> > > 
> > > 
> > > The 5a54ebd0-bd90-11e0-0000-9510c23fceff is same as last time…
> > > 
> > > And in the log, the last Migration.java log is:
> > > INFO [MigrationStage:1] 2011-08-08 11:41:30,293 Migration.java (line 116) Applying migration 5a54ebd0-bd90-11e0-0000-9510c23fceff Add keyspace: SimpleDB_4E38DAA64894A9146100000500000000rep strategy:SimpleStrategy{}durable_writes: true
> > > 
> > > Could you explain this? 
> > > 
> > > If I change the token given to 1.27 to another one, will it help?
> > > 
> > > Thanks.
> > > -- 
> > > Dikang Gu
> > > 0086 - 18611140205
> > > On Sunday, August 7, 2011 at 4:14 PM, aaron morton wrote:
> > > > did you check the logs in 1.27 for errors ? 
> > > > 
> > > > Could you be seeing this ? https://issues.apache.org/jira/browse/CASSANDRA-2867
> > > > 
> > > > Cheers
> > > > 
> > > > -----------------
> > > > Aaron Morton
> > > > Freelance Cassandra Developer
> > > > @aaronmorton
> > > > http://www.thelastpickle.com
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On 7 Aug 2011, at 16:24, Dikang Gu wrote:
> > > > > I restart both nodes, and deleted the shcema* and migration* and restarted them.
> > > > > 
> > > > > The current cluster looks like this:
> > > > > [default@unknown] describe cluster; 
> > > > > Cluster Information:
> > > > > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > > > > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > > > > Schema versions: 
> > > > > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
> > > > > 5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> > > > > 
> > > > > 
> > > > > the 1.28 looks good, and the 1.27 still can not get the schema agreement...
> > > > > 
> > > > > I have tried several times, even delete all the data on 1.27, and rejoin it as a new node, but it is still unhappy. 
> > > > > 
> > > > > And the ring looks like this: 
> > > > > 
> > > > > Address  DC Rack Status State  Load Owns Token 
> > > > > 127605887595351923798765477786913079296 
> > > > > 192.168.1.28 datacenter1 rack1  Up  Normal 8.38 GB  25.00% 1 
> > > > > 192.168.1.25 datacenter1 rack1  Up  Normal 8.55 GB  34.01% 57856537434773737201679995572503935972 
> > > > > 192.168.1.27 datacenter1 rack1  Up Joining 1.81 GB  24.28% 99165710459060760249270263771474737125 
> > > > >  192.168.1.9  datacenter1 rack1  Up  Normal 8.75 GB  16.72% 127605887595351923798765477786913079296 
> > > > > 
> > > > > 
> > > > > The 1.27 seems can not join the cluster, and it just hangs there... 
> > > > > 
> > > > > Any suggestions?
> > > > > 
> > > > > Thanks.
> > > > > 
> > > > > 
> > > > > On Sun, Aug 7, 2011 at 10:01 AM, aaron morton <aa...@thelastpickle.com> wrote:
> > > > > > After there restart you what was in the logs for the 1.27 machine from the Migration.java logger ? Some of the messages will start with "Applying migration" 
> > > > > > 
> > > > > > You should have shut down both of the nodes, then deleted the schema* and migration* system sstables, then restarted one of them and watched to see if it got to schema agreement. 
> > > > > > 
> > > > > >  Cheers
> > > > > > 
> > > > > > -----------------
> > > > > > Aaron Morton
> > > > > > Freelance Cassandra Developer
> > > > > > @aaronmorton
> > > > > > http://www.thelastpickle.com
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On 6 Aug 2011, at 22:56, Dikang Gu wrote:
> > > > > > > I have tried this, but the schema still does not agree in the cluster:
> > > > > > > 
> > > > > > > [default@unknown] describe cluster; 
> > > > > > > Cluster Information:
> > > > > > > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > > > > > > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > > > > > > Schema versions: 
> > > > > > > UNREACHABLE: [192.168.1.28]
> > > > > > > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> > > > > > > 5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> > > > > > > 
> > > > > > > Any other suggestions to solve this?
> > > > > > > 
> > > > > > > Because I have some production data saved in the cassandra cluster, so I can not afford data lost... 
> > > > > > > 
> > > > > > > Thanks.
> > > > > > > On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <be...@noisette.ch> wrote:
> > > > > > > >  Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
> > > > > > > >  75eece10-bf48-11e0-0000-4d205df954a7 own the majority, so shutdown and
> > > > > > > >  remove the schema* and migration* sstables from both 192.168.1.28 and
> > > > > > > >  192.168.1.27
> > > > > > > > 
> > > > > > > > 
> > > > > > > >  2011/8/5 Dikang Gu <di...@gmail.com>:
> > > > > > > > > [default@unknown] describe cluster;
> > > > > > > > > Cluster Information:
> > > > > > > > > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > > > > > > > > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > > > > > > > > Schema versions:
> > > > > > > > > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
> > > > > > > > > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> > > > > > > > > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
> > > > > > > > >
> > > > > > > > > three different schema versions in the cluster...
> > > > > > > > > --
> > > > > > > > > Dikang Gu
> > > > > > > > > 0086 - 18611140205
> > > > > > > > >
> > > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > -- 
> > > > > > > Dikang Gu
> > > > > > > 
> > > > > > > 0086 - 18611140205
> > > > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > -- 
> > > > > Dikang Gu
> > > > > 
> > > > > 0086 - 18611140205
> > > > > 
> > > > 
> > > 
> > 
> 

Re: How to solve this kind of schema disagreement...

Posted by Dikang Gu <di...@gmail.com>.
Hi Aaron,

I set the log level to be DEBUG, and find a lot of forceFlush debug info in the log:

DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean

What does this mean?

Thanks.


-- 
Dikang Gu
0086 - 18611140205
On Wednesday, August 10, 2011 at 6:42 AM, aaron morton wrote: 
> um. There has got to be something stopping the migration from completing. 
> 
> Turn the logging up to DEBUG before starting and look for messages from MigrationManager.java
> 
> Provide all the log messages from Migration.java on the 1.27 node
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> 
> 
> 
> 
> On 8 Aug 2011, at 15:52, Dikang Gu wrote:
> > Hi Aaron, 
> > 
> > I repeat the whole procedure:
> > 
> > 1. kill the cassandra instance on 1.27.
> > 2. rm the data/system/Migrations-g-*
> > 3. rm the data/system/Schema-g-*
> > 4. bin/cassandra to start the cassandra.
> > 
> > Now, the migration seems stop and I do not find any error in the system.log yet.
> > 
> > The ring looks good:
> > [root@yun-phy2 apache-cassandra-0.8.1]# bin/nodetool -h192.168.1.27 -p8090 ring
> > Address  DC Rack Status State  Load Owns Token 
> > 127605887595351923798765477786913079296 
> > 192.168.1.28 datacenter1 rack1  Up  Normal 8.38 GB  25.00% 1 
> > 192.168.1.25 datacenter1 rack1  Up  Normal 8.54 GB  34.01% 57856537434773737201679995572503935972 
> > 192.168.1.27 datacenter1 rack1  Up Normal 1.78 GB  24.28% 99165710459060760249270263771474737125 
> > 192.168.1.9  datacenter1 rack1  Up  Normal 8.75 GB  16.72% 127605887595351923798765477786913079296 
> > 
> > 
> > But the schema still does not correct:
> > Cluster Information:
> > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > Schema versions: 
> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
> > 5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> > 
> > 
> > The 5a54ebd0-bd90-11e0-0000-9510c23fceff is same as last time…
> > 
> > And in the log, the last Migration.java log is:
> > INFO [MigrationStage:1] 2011-08-08 11:41:30,293 Migration.java (line 116) Applying migration 5a54ebd0-bd90-11e0-0000-9510c23fceff Add keyspace: SimpleDB_4E38DAA64894A9146100000500000000rep strategy:SimpleStrategy{}durable_writes: true
> > 
> > Could you explain this? 
> > 
> > If I change the token given to 1.27 to another one, will it help?
> > 
> > Thanks.
> > -- 
> > Dikang Gu
> > 0086 - 18611140205
> > On Sunday, August 7, 2011 at 4:14 PM, aaron morton wrote:
> > > did you check the logs in 1.27 for errors ? 
> > > 
> > > Could you be seeing this ? https://issues.apache.org/jira/browse/CASSANDRA-2867
> > > 
> > > Cheers
> > > 
> > > -----------------
> > > Aaron Morton
> > > Freelance Cassandra Developer
> > > @aaronmorton
> > > http://www.thelastpickle.com
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On 7 Aug 2011, at 16:24, Dikang Gu wrote:
> > > > I restart both nodes, and deleted the shcema* and migration* and restarted them.
> > > > 
> > > > The current cluster looks like this:
> > > > [default@unknown] describe cluster; 
> > > > Cluster Information:
> > > > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > > > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > > > Schema versions: 
> > > > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
> > > > 5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> > > > 
> > > > 
> > > > the 1.28 looks good, and the 1.27 still can not get the schema agreement...
> > > > 
> > > > I have tried several times, even delete all the data on 1.27, and rejoin it as a new node, but it is still unhappy. 
> > > > 
> > > > And the ring looks like this: 
> > > > 
> > > > Address  DC Rack Status State  Load Owns Token 
> > > > 127605887595351923798765477786913079296 
> > > > 192.168.1.28 datacenter1 rack1  Up  Normal 8.38 GB  25.00% 1 
> > > > 192.168.1.25 datacenter1 rack1  Up  Normal 8.55 GB  34.01% 57856537434773737201679995572503935972 
> > > > 192.168.1.27 datacenter1 rack1  Up Joining 1.81 GB  24.28% 99165710459060760249270263771474737125 
> > > >  192.168.1.9  datacenter1 rack1  Up  Normal 8.75 GB  16.72% 127605887595351923798765477786913079296 
> > > > 
> > > > 
> > > > The 1.27 seems can not join the cluster, and it just hangs there... 
> > > > 
> > > > Any suggestions?
> > > > 
> > > > Thanks.
> > > > 
> > > > 
> > > > On Sun, Aug 7, 2011 at 10:01 AM, aaron morton <aa...@thelastpickle.com> wrote:
> > > > > After there restart you what was in the logs for the 1.27 machine from the Migration.java logger ? Some of the messages will start with "Applying migration" 
> > > > > 
> > > > > You should have shut down both of the nodes, then deleted the schema* and migration* system sstables, then restarted one of them and watched to see if it got to schema agreement. 
> > > > > 
> > > > >  Cheers
> > > > > 
> > > > > -----------------
> > > > > Aaron Morton
> > > > > Freelance Cassandra Developer
> > > > > @aaronmorton
> > > > > http://www.thelastpickle.com
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > On 6 Aug 2011, at 22:56, Dikang Gu wrote:
> > > > > > I have tried this, but the schema still does not agree in the cluster:
> > > > > > 
> > > > > > [default@unknown] describe cluster; 
> > > > > > Cluster Information:
> > > > > > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > > > > > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > > > > > Schema versions: 
> > > > > > UNREACHABLE: [192.168.1.28]
> > > > > > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> > > > > > 5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> > > > > > 
> > > > > > Any other suggestions to solve this?
> > > > > > 
> > > > > > Because I have some production data saved in the cassandra cluster, so I can not afford data lost... 
> > > > > > 
> > > > > > Thanks.
> > > > > > On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <be...@noisette.ch> wrote:
> > > > > > >  Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
> > > > > > >  75eece10-bf48-11e0-0000-4d205df954a7 own the majority, so shutdown and
> > > > > > >  remove the schema* and migration* sstables from both 192.168.1.28 and
> > > > > > >  192.168.1.27
> > > > > > > 
> > > > > > > 
> > > > > > >  2011/8/5 Dikang Gu <di...@gmail.com>:
> > > > > > > > [default@unknown] describe cluster;
> > > > > > > > Cluster Information:
> > > > > > > > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > > > > > > > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > > > > > > > Schema versions:
> > > > > > > > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
> > > > > > > > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> > > > > > > > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
> > > > > > > >
> > > > > > > > three different schema versions in the cluster...
> > > > > > > > --
> > > > > > > > Dikang Gu
> > > > > > > > 0086 - 18611140205
> > > > > > > >
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > Dikang Gu
> > > > > > 
> > > > > > 0086 - 18611140205
> > > > > > 
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > > Dikang Gu
> > > > 
> > > > 0086 - 18611140205
> > > > 
> > > 
> > 
> 

Re: How to solve this kind of schema disagreement...

Posted by aaron morton <aa...@thelastpickle.com>.
um. There has got to be something stopping the migration from completing. 

Turn the logging up to DEBUG before starting and look for messages from MigrationManager.java

Provide all the log messages from Migration.java on the 1.27 node

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Aug 2011, at 15:52, Dikang Gu wrote:

> Hi Aaron, 
> 
> I repeat the whole procedure:
> 
> 1. kill the cassandra instance on 1.27.
> 2. rm the data/system/Migrations-g-*
> 3. rm the data/system/Schema-g-*
> 4. bin/cassandra to start the cassandra.
> 
> Now, the migration seems stop and I do not find any error in the system.log yet.
> 
> The ring looks good:
> [root@yun-phy2 apache-cassandra-0.8.1]# bin/nodetool -h192.168.1.27 -p8090 ring
> Address         DC          Rack        Status State   Load            Owns    Token                                       
>                                                                                127605887595351923798765477786913079296     
> 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1                                           
> 192.168.1.25    datacenter1 rack1       Up     Normal  8.54 GB         34.01%  57856537434773737201679995572503935972      
> 192.168.1.27    datacenter1 rack1       Up     Normal  1.78 GB         24.28%  99165710459060760249270263771474737125      
> 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296  
> 
> But the schema still does not correct:
> Cluster Information:
>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>    Schema versions: 
> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> 
> The 5a54ebd0-bd90-11e0-0000-9510c23fceff is same as last time…
> 
> And in the log, the last Migration.java log is:
>  INFO [MigrationStage:1] 2011-08-08 11:41:30,293 Migration.java (line 116) Applying migration 5a54ebd0-bd90-11e0-0000-9510c23fceff Add keyspace: SimpleDB_4E38DAA64894A9146100000500000000rep strategy:SimpleStrategy{}durable_writes: true
> 
> Could you explain this? 
> 
> If I change the token given to 1.27 to another one, will it help?
> 
> Thanks.
> 
> -- 
> Dikang Gu
> 0086 - 18611140205
> On Sunday, August 7, 2011 at 4:14 PM, aaron morton wrote:
> 
>> did you check the logs in 1.27 for errors ? 
>> 
>> Could you be seeing this ? https://issues.apache.org/jira/browse/CASSANDRA-2867
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 7 Aug 2011, at 16:24, Dikang Gu wrote:
>> 
>>> I restart both nodes, and deleted the shcema* and migration* and restarted them.
>>> 
>>> The current cluster looks like this:
>>> [default@unknown] describe cluster;         
>>> Cluster Information:
>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>    Schema versions: 
>>> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
>>> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>> 
>>> the 1.28 looks good, and the 1.27 still can not get the schema agreement...
>>> 
>>> I have tried several times, even delete all the data on 1.27, and rejoin it as a new node, but it is still unhappy.
>>> 
>>> And the ring looks like this: 
>>> 
>>> Address         DC          Rack        Status State   Load            Owns    Token                                       
>>>                                                                                127605887595351923798765477786913079296     
>>> 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1                                           
>>> 192.168.1.25    datacenter1 rack1       Up     Normal  8.55 GB         34.01%  57856537434773737201679995572503935972      
>>> 192.168.1.27    datacenter1 rack1       Up     Joining 1.81 GB         24.28%  99165710459060760249270263771474737125      
>>> 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296 
>>> 
>>> The 1.27 seems unable to join the cluster; it just hangs there...
>>> 
>>> Any suggestions?
>>> 
>>> Thanks.
>>> 
>>> 
>>> On Sun, Aug 7, 2011 at 10:01 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>> After the restart, what was in the logs for the 1.27 machine from the Migration.java logger? Some of the messages will start with "Applying migration"
>>> 
>>> You should have shut down both of the nodes, then deleted the schema* and migration* system sstables, then restarted one of them and watched to see if it got to schema agreement. 
>>> 
>>> Cheers
>>>   
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 6 Aug 2011, at 22:56, Dikang Gu wrote:
>>> 
>>>> I have tried this, but the schema still does not agree in the cluster:
>>>> 
>>>> [default@unknown] describe cluster;
>>>> Cluster Information:
>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>    Schema versions: 
>>>> 	UNREACHABLE: [192.168.1.28]
>>>> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>>>> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>> 
>>>> Any other suggestions to solve this?
>>>> 
>>>> I have some production data saved in the cassandra cluster, so I cannot afford data loss...
>>>> 
>>>> Thanks.
>>>> 
>>>> On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <be...@noisette.ch> wrote:
>>>>> Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
>>>>> 75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down and
>>>>> remove the schema* and migration* sstables from both 192.168.1.28 and
>>>>> 192.168.1.27
>>>>> 
>>>>> 
>>>>> 2011/8/5 Dikang Gu <di...@gmail.com>:
>>>>> > [default@unknown] describe cluster;
>>>>> > Cluster Information:
>>>>> >    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>> >    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>> >    Schema versions:
>>>>> > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
>>>>> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>>>>> > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>> >
>>>>> >  three different schema versions in the cluster...
>>>>> > --
>>>>> > Dikang Gu
>>>>> > 0086 - 18611140205
>>>>> >
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Dikang Gu
>>>> 
>>>> 0086 - 18611140205
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Dikang Gu
>>> 
>>> 0086 - 18611140205
>>> 
>> 
> 


Re: How to solve this kind of schema disagreement...

Posted by Dikang Gu <di...@gmail.com>.
Hi Aaron, 

I repeat the whole procedure:

1. kill the cassandra instance on 1.27.
2. rm the data/system/Migrations-g-*
3. rm the data/system/Schema-g-*
4. bin/cassandra to start the cassandra.
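Steps 2-3 can be sketched as a dry run in a scratch directory (the real data directory path and the `-g-` sstable naming here are assumptions based on the 0.8 default layout; on a live node the instance must be stopped first, as in step 1):

```shell
# Dry run of the sstable cleanup, using a scratch directory instead of the
# node's real data directory (an assumption for illustration only).
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/system"
# Fake sstables: the schema/migration ones plus one that must survive.
touch "$DATA_DIR/system/Migrations-g-1-Data.db" \
      "$DATA_DIR/system/Schema-g-1-Data.db" \
      "$DATA_DIR/system/LocationInfo-g-1-Data.db"
# Steps 2-3: remove only the locally stored schema state.
rm -f "$DATA_DIR"/system/Migrations-g-* "$DATA_DIR"/system/Schema-g-*
ls "$DATA_DIR/system"   # only LocationInfo-g-1-Data.db remains
```

On the real node the same globs would run against the system keyspace directory, after which step 4 restarts the node and it re-fetches migrations from the rest of the cluster.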

Now, the migration seems to have stopped, and I do not find any errors in the system.log yet.

The ring looks good:
[root@yun-phy2 apache-cassandra-0.8.1]# bin/nodetool -h192.168.1.27 -p8090 ring
Address         DC          Rack        Status State   Load            Owns    Token
                                                                               127605887595351923798765477786913079296
192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1
192.168.1.25    datacenter1 rack1       Up     Normal  8.54 GB         34.01%  57856537434773737201679995572503935972
192.168.1.27    datacenter1 rack1       Up     Normal  1.78 GB         24.28%  99165710459060760249270263771474737125
192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296


But the schema still does not agree:
Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions: 
75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]


The 5a54ebd0-bd90-11e0-0000-9510c23fceff is the same as last time…

And in the log, the last Migration.java log is:
INFO [MigrationStage:1] 2011-08-08 11:41:30,293 Migration.java (line 116) Applying migration 5a54ebd0-bd90-11e0-0000-9510c23fceff Add keyspace: SimpleDB_4E38DAA64894A9146100000500000000rep strategy:SimpleStrategy{}durable_writes: true

Could you explain this? 

If I change the token given to 1.27 to another one, will it help?

Thanks.
-- 
Dikang Gu
0086 - 18611140205
On Sunday, August 7, 2011 at 4:14 PM, aaron morton wrote: 
> Did you check the logs on 1.27 for errors?
> 
> Could you be seeing this ? https://issues.apache.org/jira/browse/CASSANDRA-2867
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> 
> 
> 
> 
> On 7 Aug 2011, at 16:24, Dikang Gu wrote:
> > I restarted both nodes, deleted the schema* and migration* sstables, and restarted them.
> > 
> > The current cluster looks like this:
> > [default@unknown] describe cluster; 
> > Cluster Information:
> > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > Schema versions: 
> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
> > 5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> > 
> > 
> > the 1.28 looks good, but the 1.27 still cannot reach schema agreement...
> > 
> > I have tried several times, even deleted all the data on 1.27 and rejoined it as a new node, but it is still unhappy.
> > 
> > And the ring looks like this: 
> > 
> > Address         DC          Rack        Status State   Load            Owns    Token
> >                                                                                127605887595351923798765477786913079296
> > 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1
> > 192.168.1.25    datacenter1 rack1       Up     Normal  8.55 GB         34.01%  57856537434773737201679995572503935972
> > 192.168.1.27    datacenter1 rack1       Up     Joining 1.81 GB         24.28%  99165710459060760249270263771474737125
> > 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296
> > 
> > 
> > The 1.27 seems unable to join the cluster; it just hangs there...
> > 
> > Any suggestions?
> > 
> > Thanks.
> > 
> > 
> > On Sun, Aug 7, 2011 at 10:01 AM, aaron morton <aa...@thelastpickle.com> wrote:
> > > After the restart, what was in the logs for the 1.27 machine from the Migration.java logger? Some of the messages will start with "Applying migration"
> > > 
> > > You should have shut down both of the nodes, then deleted the schema* and migration* system sstables, then restarted one of them and watched to see if it got to schema agreement. 
> > > 
> > >  Cheers
> > > 
> > > -----------------
> > > Aaron Morton
> > > Freelance Cassandra Developer
> > > @aaronmorton
> > > http://www.thelastpickle.com
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On 6 Aug 2011, at 22:56, Dikang Gu wrote:
> > > > I have tried this, but the schema still does not agree in the cluster:
> > > > 
> > > > [default@unknown] describe cluster; 
> > > > Cluster Information:
> > > > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > > > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > > > Schema versions: 
> > > > UNREACHABLE: [192.168.1.28]
> > > > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> > > > 5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> > > > 
> > > > Any other suggestions to solve this?
> > > > 
> > > > I have some production data saved in the cassandra cluster, so I cannot afford data loss...
> > > > 
> > > > Thanks.
> > > > On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <be...@noisette.ch> wrote:
> > > > >  Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
> > > > >  75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down and
> > > > >  remove the schema* and migration* sstables from both 192.168.1.28 and
> > > > >  192.168.1.27
> > > > > 
> > > > > 
> > > > >  2011/8/5 Dikang Gu <di...@gmail.com>:
> > > > > > [default@unknown] describe cluster;
> > > > > > Cluster Information:
> > > > > > Snitch: org.apache.cassandra.locator.SimpleSnitch
> > > > > > Partitioner: org.apache.cassandra.dht.RandomPartitioner
> > > > > > Schema versions:
> > > > > > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
> > > > > > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> > > > > > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
> > > > > >
> > > > > > three different schema versions in the cluster...
> > > > > > --
> > > > > > Dikang Gu
> > > > > > 0086 - 18611140205
> > > > > >
> > > > > 
> > > > 
> > > > 
> > > > -- 
> > > > Dikang Gu
> > > > 
> > > > 0086 - 18611140205
> > > > 
> > 
> > 
> > 
> > -- 
> > Dikang Gu
> > 
> > 0086 - 18611140205
> > 
> 

Re: How to solve this kind of schema disagreement...

Posted by aaron morton <aa...@thelastpickle.com>.
Did you check the logs on 1.27 for errors?

Could you be seeing this ? https://issues.apache.org/jira/browse/CASSANDRA-2867

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Aug 2011, at 16:24, Dikang Gu wrote:

> I restarted both nodes, deleted the schema* and migration* sstables, and restarted them.
> 
> The current cluster looks like this:
> [default@unknown] describe cluster;         
> Cluster Information:
>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>    Schema versions: 
> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> 
> the 1.28 looks good, but the 1.27 still cannot reach schema agreement...
> 
> I have tried several times, even deleted all the data on 1.27 and rejoined it as a new node, but it is still unhappy.
> 
> And the ring looks like this: 
> 
> Address         DC          Rack        Status State   Load            Owns    Token                                       
>                                                                                127605887595351923798765477786913079296     
> 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1                                           
> 192.168.1.25    datacenter1 rack1       Up     Normal  8.55 GB         34.01%  57856537434773737201679995572503935972      
> 192.168.1.27    datacenter1 rack1       Up     Joining 1.81 GB         24.28%  99165710459060760249270263771474737125      
> 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296 
> 
> The 1.27 seems unable to join the cluster; it just hangs there...
> 
> Any suggestions?
> 
> Thanks.
> 
> 
> On Sun, Aug 7, 2011 at 10:01 AM, aaron morton <aa...@thelastpickle.com> wrote:
> After the restart, what was in the logs for the 1.27 machine from the Migration.java logger? Some of the messages will start with "Applying migration"
> 
> You should have shut down both of the nodes, then deleted the schema* and migration* system sstables, then restarted one of them and watched to see if it got to schema agreement. 
> 
> Cheers
>   
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 6 Aug 2011, at 22:56, Dikang Gu wrote:
> 
>> I have tried this, but the schema still does not agree in the cluster:
>> 
>> [default@unknown] describe cluster;
>> Cluster Information:
>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>    Schema versions: 
>> 	UNREACHABLE: [192.168.1.28]
>> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>> 
>> Any other suggestions to solve this?
>> 
>> I have some production data saved in the cassandra cluster, so I cannot afford data loss...
>> 
>> Thanks.
>> 
>> On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <be...@noisette.ch> wrote:
>> Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
>> 75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down and
>> remove the schema* and migration* sstables from both 192.168.1.28 and
>> 192.168.1.27
>> 
>> 
>> 2011/8/5 Dikang Gu <di...@gmail.com>:
>> > [default@unknown] describe cluster;
>> > Cluster Information:
>> >    Snitch: org.apache.cassandra.locator.SimpleSnitch
>> >    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>> >    Schema versions:
>> > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
>> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>> > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
>> >
>> >  three different schema versions in the cluster...
>> > --
>> > Dikang Gu
>> > 0086 - 18611140205
>> >
>> 
>> 
>> 
>> -- 
>> Dikang Gu
>> 
>> 0086 - 18611140205
>> 
> 
> 
> 
> 
> -- 
> Dikang Gu
> 
> 0086 - 18611140205
> 


Re: How to solve this kind of schema disagreement...

Posted by Dikang Gu <di...@gmail.com>.
I restarted both nodes, deleted the schema* and migration* sstables, and
restarted them.

The current cluster looks like this:
[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9,
192.168.1.25]
5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]

the 1.28 looks good, but the 1.27 still cannot reach schema agreement...

I have tried several times, even deleted all the data on 1.27 and rejoined it
as a new node, but it is still unhappy.

And the ring looks like this:

Address         DC          Rack        Status State   Load            Owns    Token
                                                                               127605887595351923798765477786913079296
192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1
192.168.1.25    datacenter1 rack1       Up     Normal  8.55 GB         34.01%  57856537434773737201679995572503935972
192.168.1.27    datacenter1 rack1       Up     Joining 1.81 GB         24.28%  99165710459060760249270263771474737125
192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296

The 1.27 seems unable to join the cluster; it just hangs there...

Any suggestions?

Thanks.


On Sun, Aug 7, 2011 at 10:01 AM, aaron morton <aa...@thelastpickle.com>wrote:

> After the restart, what was in the logs for the 1.27 machine from the
> Migration.java logger? Some of the messages will start with "Applying
> migration"
>
> You should have shut down both of the nodes, then deleted the schema* and
> migration* system sstables, then restarted one of them and watched to see if
> it got to schema agreement.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6 Aug 2011, at 22:56, Dikang Gu wrote:
>
> I have tried this, but the schema still does not agree in the cluster:
>
> [default@unknown] describe cluster;
> Cluster Information:
>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>    Schema versions:
> UNREACHABLE: [192.168.1.28]
> 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>  5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>
> Any other suggestions to solve this?
>
> I have some production data saved in the cassandra cluster, so I cannot
> afford data loss...
>
> Thanks.
>
> On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <be...@noisette.ch> wrote:
>
>> Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
>> 75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down and
>> remove the schema* and migration* sstables from both 192.168.1.28 and
>> 192.168.1.27
>>
>>
>> 2011/8/5 Dikang Gu <di...@gmail.com>:
>> > [default@unknown] describe cluster;
>> > Cluster Information:
>> >    Snitch: org.apache.cassandra.locator.SimpleSnitch
>> >    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>> >    Schema versions:
>> > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
>> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>> > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
>> >
>> >  three different schema versions in the cluster...
>> > --
>> > Dikang Gu
>> > 0086 - 18611140205
>> >
>>
>
>
>
> --
> Dikang Gu
>
> 0086 - 18611140205
>
>
>


-- 
Dikang Gu

0086 - 18611140205

Re: How to solve this kind of schema disagreement...

Posted by aaron morton <aa...@thelastpickle.com>.
After the restart, what was in the logs for the 1.27 machine from the Migration.java logger? Some of the messages will start with "Applying migration"

You should have shut down both of the nodes, then deleted the schema* and migration* system sstables, then restarted one of them and watched to see if it got to schema agreement. 

Cheers
  
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6 Aug 2011, at 22:56, Dikang Gu wrote:

> I have tried this, but the schema still does not agree in the cluster:
> 
> [default@unknown] describe cluster;
> Cluster Information:
>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>    Schema versions: 
> 	UNREACHABLE: [192.168.1.28]
> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
> 
> Any other suggestions to solve this?
> 
> I have some production data saved in the cassandra cluster, so I cannot afford data loss...
> 
> Thanks.
> 
> On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <be...@noisette.ch> wrote:
> Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
> 75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down and
> remove the schema* and migration* sstables from both 192.168.1.28 and
> 192.168.1.27
> 
> 
> 2011/8/5 Dikang Gu <di...@gmail.com>:
> > [default@unknown] describe cluster;
> > Cluster Information:
> >    Snitch: org.apache.cassandra.locator.SimpleSnitch
> >    Partitioner: org.apache.cassandra.dht.RandomPartitioner
> >    Schema versions:
> > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
> >
> >  three different schema versions in the cluster...
> > --
> > Dikang Gu
> > 0086 - 18611140205
> >
> 
> 
> 
> -- 
> Dikang Gu
> 
> 0086 - 18611140205
> 


Re: How to solve this kind of schema disagreement...

Posted by Dikang Gu <di...@gmail.com>.
I have tried this, but the schema still does not agree in the cluster:

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
UNREACHABLE: [192.168.1.28]
75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]

Any other suggestions to solve this?

I have some production data saved in the cassandra cluster, so I cannot
afford data loss...

Thanks.

On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <be...@noisette.ch> wrote:

> Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
> 75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down and
> remove the schema* and migration* sstables from both 192.168.1.28 and
> 192.168.1.27
>
>
> 2011/8/5 Dikang Gu <di...@gmail.com>:
> > [default@unknown] describe cluster;
> > Cluster Information:
> >    Snitch: org.apache.cassandra.locator.SimpleSnitch
> >    Partitioner: org.apache.cassandra.dht.RandomPartitioner
> >    Schema versions:
> > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
> >
> >  three different schema versions in the cluster...
> > --
> > Dikang Gu
> > 0086 - 18611140205
> >
>



-- 
Dikang Gu

0086 - 18611140205

Re: How to solve this kind of schema disagreement...

Posted by Benoit Perroud <be...@noisette.ch>.
Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down and
remove the schema* and migration* sstables from both 192.168.1.28 and
192.168.1.27


2011/8/5 Dikang Gu <di...@gmail.com>:
> [default@unknown] describe cluster;
> Cluster Information:
>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>    Schema versions:
> 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
> 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
> 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
>
>  three different schema versions in the cluster...
> --
> Dikang Gu
> 0086 - 18611140205
>