Posted to user@cassandra.apache.org by Alain RODRIGUEZ <ar...@gmail.com> on 2013/11/21 11:39:52 UTC

Re: gossip marking all nodes as down when decommissioning one node.

I just experienced the same thing on our 28-node m1.xlarge C* 1.2.11 cluster.

phi_convict_threshold is at its default of 8. I will try increasing it to 12,
as 12 seems to be the recommended value :)
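
For reference, here is the change I plan to make in cassandra.yaml (a sketch
only; 8 is the shipped default and the exact surrounding comments vary by
version):

# cassandra.yaml
# Raise the failure detector threshold so short gossip delays on EC2 do not
# convict every peer at once; 12 is the value suggested in this thread for AWS.
phi_convict_threshold: 12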

It is still weird to see all nodes marked down at once. I never
experienced this before using vnodes...

Alain






2013/10/28 Aaron Morton <aa...@thelastpickle.com>

> >  (2 nodes in each availability zone)
> How many AZs?
>
> > The ec2 instances are m1.large
> I strongly recommend using m1.xlarge with ephemeral disks or a higher spec
> machine.  m1.large is not up to the task.
>
> > Why on earth is the decommissioning of one node causing all the nodes to
> be marked down?
> decommissioning a node causes it to stream its data to the remaining
> nodes, which results in them performing compaction. I would guess the
> low-powered m1.large nodes could not handle the incoming traffic and
> compaction. This probably resulted in GC problems (check the logs), which
> caused them to be marked as down.
>
> > 1) If we set the phi_convict_threshold to 12 or higher, the nodes never
> get marked down.
> 12 is a good number on AWS.
>
> > 2) If we set the number of vnodes to 16 or lower, we never see them get
> marked down.
> I would leave this at 256.
> Fewer vnodes may result in slightly less overhead during repair, but the
> ultimate cause here is the choice of hardware.
>
> > Is either of these solutions dangerous or better than the other?
> Change the phi and move to m1.xlarge by doing a lift-and-shift: stop one
> node at a time and copy all of its data and config to a new node.
>
> > The ultimate cause of the problem appears to be that the
> calculatePendingRanges in StorageService.java is an extremely expensive
> process
>
> We don’t see issues like this other than on low-powered nodes.
>
> Cheers
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 26/10/2013, at 6:14 am, John Pyeatt <jo...@singlewire.com> wrote:
>
> > We are running a 6-node cluster in the Amazon cloud (2 nodes in each
> availability zone). The ec2 instances are m1.large and we have 256 vnodes
> on each node.
> >
> > We are using Ec2Snitch, NetworkTopologyStrategy and a replication factor
> of 3.
> >
> > When we decommission one node, reads and writes suddenly start to fail.
> We are seeing "Not Enough Replicas" error messages, which doesn't make
> sense for QUORUM reads/writes, because there should still be 2 copies of
> each piece of data in the cluster.
> >
> > Digging deeper into the logs, we see that the phi_convict_threshold is
> being exceeded, so all nodes in the cluster are marked down for a period of
> approximately 10 seconds.
> >
> > Why on earth is the decommissioning of one node causing all the nodes to
> be marked down?
> >
> > We have two ways to work around this, though we think we have found the
> ultimate cause of the problem.
> > 1) If we set the phi_convict_threshold to 12 or higher, the nodes never
> get marked down.
> > 2) If we set the number of vnodes to 16 or lower, we never see them get
> marked down.
> >
> > Is either of these solutions dangerous or better than the other?
> >
> >
> > The ultimate cause of the problem appears to be that the
> calculatePendingRanges in StorageService.java is an extremely expensive
> process and is running in the same thread pool (GossipTasks) as the
> Gossiper.java code. calculatePendingRanges() runs during node state changes
> (e.g. decommissioning). During this time it appears to hog the single
> thread in the GossipTasks thread pool, causing nodes to get marked down by
> FailureDetector.java.
> >
> >
> >
> > --
> > John Pyeatt
> > Singlewire Software, LLC
> > www.singlewire.com
> > ------------------
> > 608.661.1184
> > john.pyeatt@singlewire.com
>
>
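
For what it is worth, here is a minimal, hypothetical Java sketch (not
Cassandra's actual code; the class name, variable names, and timings are made
up) of the thread-starvation effect John describes above: a periodic status
check that shares a single-threaded scheduled executor with one expensive task
gets delayed, and a time-based failure detector watching that check would
convict every peer at once.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SingleThreadStarvationDemo {
    public static void main(String[] args) throws Exception {
        // One thread, standing in for the GossipTasks pool described above.
        ScheduledExecutorService gossipTasks =
                Executors.newSingleThreadScheduledExecutor();

        final long[] lastRun = { System.nanoTime() };

        // Periodic "status check": a large gap between runs is what lets a
        // phi-style failure detector convict otherwise healthy peers.
        gossipTasks.scheduleWithFixedDelay(() -> {
            long now = System.nanoTime();
            long gapMillis = TimeUnit.NANOSECONDS.toMillis(now - lastRun[0]);
            lastRun[0] = now;
            if (gapMillis > 3000) {
                System.out.println("status check delayed " + gapMillis
                        + " ms -> peers would be marked DOWN");
            }
        }, 1, 1, TimeUnit.SECONDS);

        // An expensive job (stand-in for recalculating pending ranges during
        // a decommission) submitted to the SAME single-threaded pool. While
        // it runs, the status check above cannot run at all.
        gossipTasks.schedule(() -> {
            try {
                Thread.sleep(10_000); // pretend this takes ~10 seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 2, TimeUnit.SECONDS);

        Thread.sleep(20_000);
        gossipTasks.shutdownNow();
    }
}

Running it, the periodic check reports a gap of roughly ten seconds right
after the long task finishes, which lines up with the ~10 seconds of DOWN
reported above.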

Re: gossip marking all nodes as down when decommissioning one node.

Posted by Ryan Fowler <ry...@singlewire.com>.
You might be running into CASSANDRA-6244. That ended up being our problem
anyway.

On Thu, Nov 21, 2013 at 9:37 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, Nov 21, 2013 at 6:17 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>
>> Oh! Thanks.
>>
>> Is there any workaround to avoid the problem while waiting for the update?
>>
>
> Per driftx in #cassandra, this is probably *not* 6297, because only a
> single flush is involved. If you haven't already, I would consider filing
> a Cassandra JIRA with details.
>
> =Rob
>
>

Re: gossip marking all nodes as down when decommissioning one node.

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Nov 21, 2013 at 6:17 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> Oh! Thanks.
>
> Is there any workaround to avoid the problem while waiting for the update?
>

Per driftx in #cassandra, this is probably *not* 6297, because only a single
flush is involved. If you haven't already, I would consider filing a
Cassandra JIRA with details.

=Rob

Re: gossip marking all nodes as down when decommissioning one node.

Posted by Tupshin Harper <tu...@tupshin.com>.
Increasing the phi value to 12 can be a partial workaround. It's certainly
not a fix, but it does partially alleviate the issue. Otherwise, hang in
there until 1.2.12. Aaron is probably right that this is aggravated on
underpowered nodes, but larger nodes can still see these symptoms.

-Tupshin
On Nov 21, 2013 11:18 PM, "Alain RODRIGUEZ" <ar...@gmail.com> wrote:

> Oh! Thanks.
>
> Is there any workaround to avoid the problem while waiting for the update?
>
>
>
>
> 2013/11/22 Robert Coli <rc...@eventbrite.com>
>
>> On Thu, Nov 21, 2013 at 2:39 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>>
>>> I just experienced the same thing on our 28-node m1.xlarge C* 1.2.11
>>> cluster.
>>>
>>> phi_convict_threshold is at its default of 8. I will try increasing it
>>> to 12, as 12 seems to be the recommended value :)
>>>
>>> It is still weird to see all nodes marked down at once. I never
>>> experienced this before using vnodes...
>>>
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6297 ?
>>
>> "Gossiper blocks when updating tokens and turns node down"
>>
>> =Rob
>>
>
>

Re: gossip marking all nodes as down when decommissioning one node.

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Oh! Thanks.

Is there any workaround to avoid the problem while waiting for the update?




2013/11/22 Robert Coli <rc...@eventbrite.com>

> On Thu, Nov 21, 2013 at 2:39 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>
>> I just experienced the same thing on our 28-node m1.xlarge C* 1.2.11
>> cluster.
>>
>> phi_convict_threshold is at its default of 8. I will try increasing it
>> to 12, as 12 seems to be the recommended value :)
>>
>> It is still weird to see all nodes marked down at once. I never
>> experienced this before using vnodes...
>>
>
> https://issues.apache.org/jira/browse/CASSANDRA-6297 ?
>
> "Gossiper blocks when updating tokens and turns node down"
>
> =Rob
>

Re: gossip marking all nodes as down when decommissioning one node.

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Nov 21, 2013 at 2:39 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> I just experienced the same thing on our 28-node m1.xlarge C* 1.2.11 cluster.
>
> phi_convict_threshold is at its default of 8. I will try increasing it to
> 12, as 12 seems to be the recommended value :)
>
> It is still weird to see all nodes marked down at once. I never
> experienced this before using vnodes...
>

https://issues.apache.org/jira/browse/CASSANDRA-6297 ?

"Gossiper blocks when updating tokens and turns node down"

=Rob