Posted to user@cassandra.apache.org by Edward Sargisson <ed...@globalrelay.net> on 2012/08/23 20:47:38 UTC

Node forgets about most of its column families

Hi all,
I was wondering if anybody had seen the following behaviour before and 
how we might detect it and keep the application running.

We have a 6 node cluster. It seems that one of these nodes forgot about 
all but one of the application column families - possibly after a 
restart. Then, when our application connects using Hector, it can't find 
any data so gives back an exception.

I'm currently running nodetool repair on one of the *other* nodes, which 
is taking a very long time to complete (35 minutes and counting, for a load of <9 MB).

The  logs from the failing node say:
  INFO [MemoryMeter:1] 2012-08-23 14:59:14,807 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.1219167666485013 (just-counted was 1.0).  calculation took 28ms for 1252 columns
  INFO [main] 2012-08-23 14:59:14,949 CommitLogReplayer.java (line 272) Finished reading /var/lib/cassandra/commitlog/CommitLog-2265496912225824.log
  INFO [main] 2012-08-23 14:59:14,950 CommitLogReplayer.java (line 103) Skipped 8216 mutations from unknown (probably removed) CF with id 1016
  INFO [main] 2012-08-23 14:59:14,950 CommitLogReplayer.java (line 103) Skipped 3013 mutations from unknown (probably removed) CF with id 1017

... and so on.
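
One possible way to detect this condition is to scan the system log for the "Skipped ... mutations from unknown (probably removed) CF" messages shown above. The following is a minimal sketch, assuming the message wording matches the excerpt in this post; the function name skipped_mutations is hypothetical and not part of Cassandra.

```python
import re
from collections import Counter

# Matches the CommitLogReplayer message quoted in the log excerpt.
# Assumption: the wording is the same in the Cassandra version being run.
SKIPPED = re.compile(
    r"Skipped (\d+) mutations from unknown \(probably removed\) CF with id (\d+)"
)


def skipped_mutations(log_text):
    """Return {cf_id: total skipped mutations} for every match in log_text."""
    totals = Counter()
    for count, cf_id in SKIPPED.findall(log_text):
        totals[int(cf_id)] += int(count)
    return dict(totals)
```

A monitoring job could call this on the text of the system log after each restart and raise an alert whenever the result is non-empty, since a non-empty result suggests the node replayed writes for column families it no longer knows about.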

Hector is saying:
InvalidRequestException(why:unconfigured columnfamily user_conversations)


Thanks for any comments or advice,
Edward

-- 

Edward Sargisson

senior java developer
Global Relay

edward.sargisson@globalrelay.net <ma...@globalrelay.net>


*866.484.6630*
New York | Chicago | Vancouver | London (+44.0800.032.9829) | Singapore 
(+65.3158.1301)

Global Relay Archive supports email, instant messaging, BlackBerry, 
Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter, 
Facebook and more.


Ask about *Global Relay Message* 
<http://www.globalrelay.com/services/message>*--- *The Future of 
Collaboration in the Financial Services World

*
*All email sent to or from this address will be retained by Global 
Relay's email archiving system. This message is intended only for the 
use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law.  Global Relay will not be liable for 
any compliance or technical information provided herein. All trademarks 
are the property of their respective owners.

Re: Node forgets about most of its column families

Posted by Peter Schuller <pe...@infidyne.com>.

> This is 1.1.X ?

Yeah. Post-1.1 release 1.1, a couple of months old w.r.t. upstream changes.

> Any thoughts on how recent the last schema change was ?

I believe a long time ago. Like weeks or more. Definitely not around
the time of the incident.

The first time I saw this happen was after running scrub (triggered by the
upgrade from 0.8 -> 1.1, which requires scrub because of the 0.8 bloom filter bug),
but later it happened on a regular restart of an already-upgraded system.

Haven't seen it on a cluster that hasn't been upgraded from 0.8, but
then I only saw it on one cluster so that is not a statistically
significant sample.

> Had the schema started in a pre-1.1.x cluster? If so, had there been a
> migration change after the 1.1 upgrade?

Schema started on 0.8, no change after the upgrade to 1.1. This triggered
on the very first node I ever ran scrub on, in fact immediately after
the scrub (on the validation restart I did out of paranoia), so there was
no opportunity to submit any schema change (the cluster was a mixed
0.8/1.1 cluster at the time).

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Node forgets about most of its column families

Posted by aaron morton <aa...@thelastpickle.com>.

Thanks Peter. 

This is 1.1.X ?

Any thoughts on how recent the last schema change was?
Had the schema started in a pre-1.1.x cluster? If so, had there been a migration change after the 1.1 upgrade?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/08/2012, at 1:55 PM, Peter Schuller <pe...@infidyne.com> wrote:

> I can confirm having seen this (no time to debug). One method of
> recovery is to jump the node back into the ring with auto_bootstrap
> set to false and an appropriate token set, after deleting system
> tables. That assumes you're willing to have the node take a few bad
> reads until you're able to disablegossip and make other nodes not send
> requests to it. disabling thrift would also be advised, or even
> firewalling it prior to restart.
> 
> -- 
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Node forgets about most of its column families

Posted by Peter Schuller <pe...@infidyne.com>.

I can confirm having seen this (no time to debug). One method of
recovery is to bring the node back into the ring with auto_bootstrap
set to false and an appropriate token set, after deleting the system
tables. That assumes you're willing to have the node take a few bad
reads until you're able to run disablegossip and make other nodes stop
sending requests to it. Disabling Thrift would also be advised, or even
firewalling the node prior to restart.
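
The isolation step described above could be scripted roughly as follows. This is only a sketch under assumptions: it uses the nodetool subcommands disablethrift and disablegossip with the -h host option, the helper names are hypothetical, and running disablethrift first is a choice made here, not something the message prescribes. It does not cover deleting system tables or editing auto_bootstrap and the token.

```python
import subprocess


def isolation_commands(host):
    """Build the nodetool invocations that stop client and gossip traffic."""
    base = ["nodetool", "-h", host]
    # Stop serving Thrift clients first, then leave gossip so that other
    # nodes stop routing requests to this node.
    return [base + ["disablethrift"], base + ["disablegossip"]]


def isolate(host, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in isolation_commands(host):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling isolate with the node's address prints the two commands without touching the cluster; passing dry_run=False executes them and raises if either one fails.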

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Node forgets about most of its column families

Posted by aaron morton <aa...@thelastpickle.com>.

For those playing along at home, Edward's ticket was marked as a duplicate of

Problem with creating keyspace after drop
https://issues.apache.org/jira/browse/CASSANDRA-4219

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/08/2012, at 4:43 AM, Edward Sargisson <ed...@globalrelay.net> wrote:

> Hi Aaron,
> Thanks for the reply. I've recorded what we know at https://issues.apache.org/jira/browse/CASSANDRA-4583.
> This includes log snippets from two of the nodes from around the time. I don't know what is relevant so they've got everything that was in the system log at the time of the failure and recovery.
> 
> Nodetool "crashed" by not returning, with nothing appearing in the logs and with nodetool compactionstats and nodetool netstats indicating that nothing was happening.
> 
> Thanks for your time looking at this.
> 
> Cheers,
> Edward
> 
> 
> On 12-08-29 02:44 AM, aaron morton wrote:
>>> But the following nodetool repair crashes. It has to be stopped and then re-started.
>> How did it crash ?
>> 
>>> Are there any suggestions for logging or similar so that we can get a clue next time this happens.
>> Can you make the logs from #5 available?
>> 
>> If you feel you can describe the situation please create a ticket on https://issues.apache.org/jira/browse/CASSANDRA
>> 
>> Cheers
>> 
>>  
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 29/08/2012, at 8:38 AM, Edward Sargisson <ed...@globalrelay.net> wrote:
>> 
>>> For the record, we just had a recurrence of this. 
>>> This time, when the node (#5) came back it didn't properly rejoin the ring. 
>>> We stopped every node and brought them back one by one to get the ring to link up correctly.
>>> Then, all the even nodes (#2, #4, #6) had out-of-date schemas.
>>> 
>>> nodetool resetlocalschema works.
>>> But the following nodetool repair crashes. It has to be stopped and then re-started.
>>> 
>>> Are there any suggestions for logging or similar so that we can get a clue next time this happens.
>>> 
>>> Cheers,
>>> Edward
>>> 
>>> 
>>> On 12-08-24 11:18 AM, Edward Sargisson wrote:
>>>> Sadly, I don't think we can get much.
>>>> 
>>>> All I know about the repro is that it was around a node restart. I've just tried that and everything's fine. I see no ERROR level messages in the logs.
>>>> 
>>>> Clearly, some other conditions are required but we don't know them as yet.
>>>> 
>>>> Many thanks,
>>>> Edward
>>>> 
>>>> 
>>>> On 12-08-24 03:29 AM, aaron morton wrote:
>>>>> If this is still a test environment can you try to reproduce the fault ? Or provide some more details on the sequence of events?
>>>>> 
>>>>> If you still have the logs around can you see if any ERROR level messages were logged?
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Developer
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>> 
>>>>> On 24/08/2012, at 8:33 AM, Edward Sargisson <ed...@globalrelay.net> wrote:
>>>>> 
>>>>>> Ah, yes, I forgot that bit thanks!
>>>>>> 
>>>>>> 1.1.2 running on Centos.
>>>>>> 
>>>>>> Running nodetool resetlocalschema then nodetool repair fixed the problem but not understanding what happened is a concern.
>>>>>> 
>>>>>> Cheers,
>>>>>> Edward
>>>>>> 
>>>>>> 
>>>>>> On 12-08-23 12:40 PM, Rob Coli wrote:
>>>>>>> On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
>>>>>>> <ed...@globalrelay.net> wrote:
>>>>>>>> I was wondering if anybody had seen the following behaviour before and how
>>>>>>>> we might detect it and keep the application running.
>>>>>>> I don't know the answer to your problem, but anyone who does will want
>>>>>>> to know in what version of Cassandra you are encountering this issue.
>>>>>>> :)
>>>>>>> 
>>>>>>> =Rob
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 

Re: Node forgets about most of its column families

Posted by Edward Sargisson <ed...@globalrelay.net>.

Hi Aaron,
Thanks for the reply. I've recorded what we know at 
https://issues.apache.org/jira/browse/CASSANDRA-4583.
This includes log snippets from two of the nodes from around the time. I 
don't know what is relevant so they've got everything that was in the 
system log at the time of the failure and recovery.

Nodetool "crashed" by not returning, with nothing appearing in the logs 
and with nodetool compactionstats and nodetool netstats indicating that 
nothing was happening.

Thanks for your time looking at this.

Cheers,
Edward


On 12-08-29 02:44 AM, aaron morton wrote:
>> But the following nodetool repair crashes. It has to be stopped and 
>> then re-started.
> How did it crash ?
>
>> Are there any suggestions for logging or similar so that we can get a 
>> clue next time this happens.
> Can you make the logs from #5 available?
>
> If you feel you can describe the situation please create a ticket on 
> https://issues.apache.org/jira/browse/CASSANDRA
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/08/2012, at 8:38 AM, Edward Sargisson 
> <edward.sargisson@globalrelay.net 
> <ma...@globalrelay.net>> wrote:
>
>> For the record, we just had a recurrence of this.
>> This time, when the node (#5) came back it didn't properly rejoin the 
>> ring.
>> We stopped every node and brought them back one by one to get the 
>> ring to link up correctly.
>> Then, all the even nodes (#2, #4, #6) had out-of-date schemas.
>>
>> nodetool resetlocalschema works.
>> But the following nodetool repair crashes. It has to be stopped and 
>> then re-started.
>>
>> Are there any suggestions for logging or similar so that we can get a 
>> clue next time this happens.
>>
>> Cheers,
>> Edward
>>
>>
>> On 12-08-24 11:18 AM, Edward Sargisson wrote:
>>> Sadly, I don't think we can get much.
>>>
>>> All I know about the repro is that it was around a node restart. 
>>> I've just tried that and everything's fine. I see no ERROR level 
>>> messages in the logs.
>>>
>>> Clearly, some other conditions are required but we don't know them 
>>> as yet.
>>>
>>> Many thanks,
>>> Edward
>>>
>>>
>>> On 12-08-24 03:29 AM, aaron morton wrote:
>>>> If this is still a test environment can you try to reproduce the 
>>>> fault ? Or provide some more details on the sequence of events?
>>>>
>>>> If you still have the logs around can you see if any ERROR level 
>>>> messages were logged?
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com <http://www.thelastpickle.com/>
>>>>
>>>> On 24/08/2012, at 8:33 AM, Edward Sargisson 
>>>> <edward.sargisson@globalrelay.net 
>>>> <ma...@globalrelay.net>> wrote:
>>>>
>>>>> Ah, yes, I forgot that bit thanks!
>>>>>
>>>>> 1.1.2 running on Centos.
>>>>>
>>>>> Running nodetool resetlocalschema then nodetool repair fixed the 
>>>>> problem but not understanding what happened is a concern.
>>>>>
>>>>> Cheers,
>>>>> Edward
>>>>>
>>>>>
>>>>> On 12-08-23 12:40 PM, Rob Coli wrote:
>>>>>> On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
>>>>>> <ed...@globalrelay.net>  wrote:
>>>>>>> I was wondering if anybody had seen the following behaviour before and how
>>>>>>> we might detect it and keep the application running.
>>>>>> I don't know the answer to your problem, but anyone who does will want
>>>>>> to know in what version of Cassandra you are encountering this issue.
>>>>>> :)
>>>>>>
>>>>>> =Rob
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>


Re: Node forgets about most of its column families

Posted by aaron morton <aa...@thelastpickle.com>.

> But the following nodetool repair crashes. It has to be stopped and then re-started.
How did it crash ?

> Are there any suggestions for logging or similar so that we can get a clue next time this happens.
Can you make the logs from #5 available?

If you feel you can describe the situation please create a ticket on https://issues.apache.org/jira/browse/CASSANDRA

Cheers

 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/08/2012, at 8:38 AM, Edward Sargisson <ed...@globalrelay.net> wrote:

> For the record, we just had a recurrence of this. 
> This time, when the node (#5) came back it didn't properly rejoin the ring. 
> We stopped every node and brought them back one by one to get the ring to link up correctly.
> Then, all the even nodes (#2, #4, #6) had out-of-date schemas.
> 
> nodetool resetlocalschema works.
> But the following nodetool repair crashes. It has to be stopped and then re-started.
> 
> Are there any suggestions for logging or similar so that we can get a clue next time this happens.
> 
> Cheers,
> Edward
> 
> 
> On 12-08-24 11:18 AM, Edward Sargisson wrote:
>> Sadly, I don't think we can get much.
>> 
>> All I know about the repro is that it was around a node restart. I've just tried that and everything's fine. I see no ERROR level messages in the logs.
>> 
>> Clearly, some other conditions are required but we don't know them as yet.
>> 
>> Many thanks,
>> Edward
>> 
>> 
>> On 12-08-24 03:29 AM, aaron morton wrote:
>>> If this is still a test environment can you try to reproduce the fault ? Or provide some more details on the sequence of events?
>>> 
>>> If you still have the logs around can you see if any ERROR level messages were logged?
>>> 
>>> Cheers
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 24/08/2012, at 8:33 AM, Edward Sargisson <ed...@globalrelay.net> wrote:
>>> 
>>>> Ah, yes, I forgot that bit thanks!
>>>> 
>>>> 1.1.2 running on Centos.
>>>> 
>>>> Running nodetool resetlocalschema then nodetool repair fixed the problem but not understanding what happened is a concern.
>>>> 
>>>> Cheers,
>>>> Edward
>>>> 
>>>> 
>>>> On 12-08-23 12:40 PM, Rob Coli wrote:
>>>>> On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
>>>>> <ed...@globalrelay.net> wrote:
>>>>>> I was wondering if anybody had seen the following behaviour before and how
>>>>>> we might detect it and keep the application running.
>>>>> I don't know the answer to your problem, but anyone who does will want
>>>>> to know in what version of Cassandra you are encountering this issue.
>>>>> :)
>>>>> 
>>>>> =Rob
>>>>> 
>>>> 
>>> 
>> 
> 

Re: Node forgets about most of its column families

Posted by Edward Sargisson <ed...@globalrelay.net>.

For the record, we just had a recurrence of this.
This time, when the node (#5) came back it didn't properly rejoin the ring.
We stopped every node and brought them back one by one to get the ring 
to link up correctly.
Then, all the even nodes (#2, #4, #6) had out-of-date schemas.

nodetool resetlocalschema works.
But the following nodetool repair crashes. It has to be stopped and then 
re-started.
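
Because the repair can hang without returning, one option is to run the two nodetool steps under a timeout so that a stuck repair is at least reported. The sketch below assumes nodetool is on the PATH and accepts -h for the host; the function names, the timeout values, and the returned status strings are all hypothetical.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it did not finish in time."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


def reset_and_repair(host, repair_timeout_s=3600, runner=run_with_timeout):
    """Run resetlocalschema, then repair, and report which step failed."""
    base = ["nodetool", "-h", host]
    # A timeout (None) on the schema reset is also treated as a failure.
    if runner(base + ["resetlocalschema"], 300) != 0:
        return "resetlocalschema failed"
    result = runner(base + ["repair"], repair_timeout_s)
    if result is None:
        return "repair timed out"
    return "ok" if result == 0 else "repair failed"
```

Note that a timeout here only stops the local nodetool process; it is an assumption that restarting the repair afterwards is safe, and any work already started on the server side is not cancelled by this script.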

Are there any suggestions for logging or similar so that we can get a 
clue next time this happens?

Cheers,
Edward


On 12-08-24 11:18 AM, Edward Sargisson wrote:
> Sadly, I don't think we can get much.
>
> All I know about the repro is that it was around a node restart. I've 
> just tried that and everything's fine. I see no ERROR level messages 
> in the logs.
>
> Clearly, some other conditions are required but we don't know them as yet.
>
> Many thanks,
> Edward
>
>
> On 12-08-24 03:29 AM, aaron morton wrote:
>> If this is still a test environment can you try to reproduce the 
>> fault ? Or provide some more details on the sequence of events?
>>
>> If you still have the logs around can you see if any ERROR level 
>> messages were logged?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 24/08/2012, at 8:33 AM, Edward Sargisson 
>> <edward.sargisson@globalrelay.net 
>> <ma...@globalrelay.net>> wrote:
>>
>>> Ah, yes, I forgot that bit thanks!
>>>
>>> 1.1.2 running on Centos.
>>>
>>> Running nodetool resetlocalschema then nodetool repair fixed the 
>>> problem but not understanding what happened is a concern.
>>>
>>> Cheers,
>>> Edward
>>>
>>>
>>> On 12-08-23 12:40 PM, Rob Coli wrote:
>>>> On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
>>>> <ed...@globalrelay.net>  wrote:
>>>>> I was wondering if anybody had seen the following behaviour before and how
>>>>> we might detect it and keep the application running.
>>>> I don't know the answer to your problem, but anyone who does will want
>>>> to know in what version of Cassandra you are encountering this issue.
>>>> :)
>>>>
>>>> =Rob
>>>>
>>>
>>>
>>
>
>


Re: Node forgets about most of its column families

Posted by Edward Sargisson <ed...@globalrelay.net>.

Sadly, I don't think we can get much.

All I know about the repro is that it was around a node restart. I've 
just tried that and everything's fine. I see no ERROR level messages in 
the logs.

Clearly, some other conditions are required but we don't know them as yet.
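
Until the trigger is known, one way to at least notice a recurrence might be to scan the node's log after each restart for the "Skipped ... mutations from unknown (probably removed) CF" lines quoted in the original post. The following is only a sketch under that assumption: the function name is hypothetical, the pattern is taken from those two log lines, and a non-empty result is only a hint, since the same message is expected when a column family really was dropped.

```python
import re
from collections import defaultdict

# Pattern taken from the log lines quoted in the original post, e.g.
# "Skipped 8216 mutations from unknown (probably removed) CF with id 1016"
SKIPPED_RE = re.compile(
    r"Skipped (\d+) mutations from unknown \(probably removed\) CF with id (\d+)"
)


def skipped_mutations(log_lines):
    """Return {cf_id: total skipped mutations} for the given log lines.

    Hypothetical helper: it only counts what commit log replay reported
    and does not talk to Cassandra at all.
    """
    totals = defaultdict(int)
    for line in log_lines:
        match = SKIPPED_RE.search(line)
        if match:
            count, cf_id = int(match.group(1)), int(match.group(2))
            totals[cf_id] += count
    return dict(totals)


# Example with the two lines from the failing node (joined onto one line each).
sample = [
    "INFO [main] 2012-08-23 14:59:14,950 CommitLogReplayer.java (line 103) "
    "Skipped 8216 mutations from unknown (probably removed) CF with id 1016",
    "INFO [main] 2012-08-23 14:59:14,950 CommitLogReplayer.java (line 103) "
    "Skipped 3013 mutations from unknown (probably removed) CF with id 1017",
]
for cf_id, count in sorted(skipped_mutations(sample).items()):
    print(f"CF id {cf_id}: {count} mutations skipped during replay")
```

Passing an open log file instead of the sample list works the same way, because the function only iterates over lines.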

Many thanks,
Edward


On 12-08-24 03:29 AM, aaron morton wrote:
> If this is still a test environment can you try to reproduce the fault 
> ? Or provide some more details on the sequence of events?
>
> If you still have the logs around can you see if any ERROR level 
> messages were logged?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 24/08/2012, at 8:33 AM, Edward Sargisson 
> <edward.sargisson@globalrelay.net> wrote:
>
>> Ah, yes, I forgot that bit thanks!
>>
>> 1.1.2 running on Centos.
>>
>> Running nodetool resetlocalschema then nodetool repair fixed the 
>> problem but not understanding what happened is a concern.
>>
>> Cheers,
>> Edward
>>
>>
>> On 12-08-23 12:40 PM, Rob Coli wrote:
>>> On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
>>> <ed...@globalrelay.net>  wrote:
>>>> I was wondering if anybody had seen the following behaviour before and how
>>>> we might detect it and keep the application running.
>>> I don't know the answer to your problem, but anyone who does will want
>>> to know in what version of Cassandra you are encountering this issue.
>>> :)
>>>
>>> =Rob
>>>

Re: Node forgets about most of its column families

Posted by aaron morton <aa...@thelastpickle.com>.

If this is still a test environment, can you try to reproduce the fault? Or provide some more details on the sequence of events?

If you still have the logs around can you see if any ERROR level messages were logged?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 8:33 AM, Edward Sargisson <ed...@globalrelay.net> wrote:

> Ah, yes, I forgot that bit thanks!
> 
> 1.1.2 running on Centos.
> 
> Running nodetool resetlocalschema then nodetool repair fixed the problem but not understanding what happened is a concern.
> 
> Cheers,
> Edward
> 
> 
> On 12-08-23 12:40 PM, Rob Coli wrote:
>> On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
>> <ed...@globalrelay.net> wrote:
>>> I was wondering if anybody had seen the following behaviour before and how
>>> we might detect it and keep the application running.
>> I don't know the answer to your problem, but anyone who does will want
>> to know in what version of Cassandra you are encountering this issue.
>> :)
>> 
>> =Rob
>> 

Re: Node forgets about most of its column families

Posted by Edward Sargisson <ed...@globalrelay.net>.

Ah, yes, I forgot that bit, thanks!

Cassandra 1.1.2 running on CentOS.

Running nodetool resetlocalschema and then nodetool repair fixed the 
problem, but not understanding what happened is a concern.
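
For anyone wanting to script that sequence, here is a rough sketch of the two commands run in order, stopping if the schema reset fails. The function names and the host argument are hypothetical, and pointing both commands at the same node is an assumption; only the nodetool subcommands themselves come from the steps described above.

```python
import subprocess


def recovery_commands(host):
    """Build the two nodetool invocations in the order described above."""
    return [
        ["nodetool", "-h", host, "resetlocalschema"],
        ["nodetool", "-h", host, "repair"],
    ]


def reset_schema_and_repair(host, run=subprocess.run):
    """Run resetlocalschema, then repair, against the given host.

    `run` is injectable so the sequence can be exercised without a cluster.
    """
    for command in recovery_commands(host):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so repair is never attempted if the schema reset failed.
        run(command, check=True)
```

Calling reset_schema_and_repair("node4.example.com") (a made-up host name) would shell out to nodetool twice; repair can take a long time, so this is probably not something to run unattended.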

Cheers,
Edward

On 12-08-23 12:40 PM, Rob Coli wrote:
> On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
> <ed...@globalrelay.net> wrote:
>> I was wondering if anybody had seen the following behaviour before and how
>> we might detect it and keep the application running.
> I don't know the answer to your problem, but anyone who does will want
> to know in what version of Cassandra you are encountering this issue.
> :)
>
> =Rob
>

Re: Node forgets about most of its column families

Posted by Rob Coli <rc...@palominodb.com>.

On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
<ed...@globalrelay.net> wrote:
> I was wondering if anybody had seen the following behaviour before and how
> we might detect it and keep the application running.

I don't know the answer to your problem, but anyone who does will want
to know in what version of Cassandra you are encountering this issue.
:)

=Rob

-- 
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb