
Posted to user@cassandra.apache.org by Regis Le Bretonnic <r....@meetic-corp.com> on 2021/03/17 12:44:02 UTC

Delay between stop/start cassandra

Hi all,

Following a discussion with our sysadmins, I have a very practical question.
We use Cassandra proxies (-Dcassandra.join_ring=false) as coordinators for PHP clients (a loooooot of PHP clients).

Our problem is that restarting Cassandra on the proxies sometimes fails with the following error:

ERROR [main] 2021-03-16 14:18:46,236 CassandraDaemon.java:803 - Exception encountered during startup
java.lang.RuntimeException: A node with address XXXXXXXXXXXXXXXX/10.120.1.XXX already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.

The node mentioned in the ERROR is the one we are restarting, and the start fails. Of course, a manual start afterwards works fine.
This message doesn't make sense: the hostId didn't change for this proxy (I am sure of it: system.local, IP, hostname... nothing changed, it was just a restart).

My guess (we don't all agree on this) is that, as proxies hold no data, they start very quickly, too quickly for the gossip protocol to learn that the node was down.

Could this ERROR be explained by the seed servers still seeing the node as UP, because the proxy's gossip state has not been updated when the stop/start happens too quickly?
If this hypothesis is plausible, what delay (with technical justification) should we put between stop and start?
We have ~100 proxies and 12 regular Cassandra nodes (4 of them are seeds).

Thanks in advance

RE: Delay between stop/start cassandra

Posted by Regis Le Bretonnic <r....@meetic-corp.com>.

Clear… Thanks for the detailed answer…


Re: Delay between stop/start cassandra

Posted by Jeff Jirsa <jj...@gmail.com>.

On 3.11, the fat client timeout is QUARANTINE_DELAY / 2:
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/gms/Gossiper.java#L260

The quarantine delay is StorageService.RING_DELAY * 2:
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/gms/Gossiper.java#L105


Ring delay is:
    private static int getRingDelay()
    {
        String newdelay = System.getProperty("cassandra.ring_delay_ms");
        if (newdelay != null)
        {
            logger.info("Overriding RING_DELAY to {}ms", newdelay);
            return Integer.parseInt(newdelay);
        }
        else
            return 30 * 1000;
    }

So the timeout is 30s by default, or whatever you set with
-Dcassandra.ring_delay_ms= on the command line. Note that this ALSO
affects normal startup/shutdown/expand/shrink/etc type operations, and
if you have to ask how to change it, you probably shouldn't.
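
Chaining the three definitions quoted above, the fat client timeout works out to RING_DELAY itself. The following is only a sketch of that arithmetic: the class and method names are hypothetical and are not part of Cassandra, and only the property name and the 30 * 1000 default come from the snippet above.

```java
/** Illustrative arithmetic only; hypothetical names, not Cassandra code. */
final class FatClientTimeout {

    /** Mirrors getRingDelay(): an override in ms if given, else 30 * 1000. */
    static int ringDelayMs(String override) {
        return override != null ? Integer.parseInt(override) : 30 * 1000;
    }

    /** Quarantine delay is RING_DELAY * 2. */
    static int quarantineDelayMs(int ringDelayMs) {
        return ringDelayMs * 2;
    }

    /** Fat client timeout is QUARANTINE_DELAY / 2, i.e. RING_DELAY again. */
    static int fatClientTimeoutMs(int ringDelayMs) {
        return quarantineDelayMs(ringDelayMs) / 2;
    }

    public static void main(String[] args) {
        // Reads the same system property as the snippet above; null when unset.
        int ringDelay = ringDelayMs(System.getProperty("cassandra.ring_delay_ms"));
        System.out.println("ring delay:         " + ringDelay + "ms");
        System.out.println("quarantine delay:   " + quarantineDelayMs(ringDelay) + "ms");
        System.out.println("fat client timeout: " + fatClientTimeoutMs(ringDelay) + "ms");
    }
}
```

With no override this gives 30000ms, 60000ms and 30000ms respectively.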

- Jeff




RE: Delay between stop/start cassandra

Posted by Regis Le Bretonnic <r....@meetic-corp.com>.

Hi Jeff

Thanks a lot for your answer.
The reference to "fat client" is very interesting. In the debug log on the regular nodes, we sometimes see messages like:

INFO [GossipTasks:1] 2021-03-17 16:21:01,135 Gossiper.java:894 - FatClient /10.120.1.183 has been silent for 30000ms, removing from gossip

Does this mean that a fat client (one of our proxies) is removed from gossip only after 30 seconds? In that case, the delay I am asking about is 30 sec :-)
Does anyone know whether this parameter can be changed?
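
Taking the 30000ms in that log line at face value, a restart script could compute how long it still has to wait after stopping a proxy. This is a minimal sketch under stated assumptions: the class name, the method name and the 5-second safety margin are all hypothetical, and whether 30 seconds is the right figure is exactly the question being asked here.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for a restart script; not part of Cassandra. */
final class RestartDelay {

    /**
     * Time left to wait before starting the proxy again:
     * (silence timeout + margin) minus the time already elapsed since the stop,
     * never less than zero.
     */
    static Duration remainingWait(Instant stoppedAt, Instant now,
                                  Duration silenceTimeout, Duration margin) {
        Duration elapsed = Duration.between(stoppedAt, now);
        Duration remaining = silenceTimeout.plus(margin).minus(elapsed);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    public static void main(String[] args) {
        Instant stoppedAt = Instant.parse("2021-03-17T16:20:31Z"); // example stop time
        Instant now = stoppedAt.plusSeconds(10);                   // 10s after the stop
        Duration timeout = Duration.ofMillis(30_000);              // value from the log line
        Duration margin = Duration.ofSeconds(5);                   // assumed safety margin
        Duration wait = remainingWait(stoppedAt, now, timeout, margin);
        System.out.println("Wait " + wait.getSeconds() + "s before starting");
    }
}
```

The margin is there because the log line only says the client had been silent for the timeout when it was removed; it does not say how promptly every node notices.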

PS: yes, the proxies work really well. We do use PHP with FPM, which is why we have so many connections and need proxies. Counting all the FPM processes on all our PHP servers, I'd say 8000 to 10000 clients, maybe more.
There are several advantages, but the main one is this: we used to put a lot of pressure on the Cassandra nodes whenever we restarted all our PHP servers, for instance during a code rollout that requires a PHP reload (many times per day). Proxies saved us. We can continue this conversation in private if you want.


Re: Delay between stop/start cassandra

Posted by Jeff Jirsa <jj...@gmail.com>.

-Dcassandra.join_ring=false is basically a pre-bootstrap phase that says
"this machine is about to join the cluster, but hasn't yet, so don't give
it a token"

It's taking advantage of a stable but non-terminal state to let you do
things like serve queries without owning data - it's a side effect that
works, but it's rough because it wasn't exactly built for this purpose. In
this state, you're considered a "fat client" - your presence exists in the
ring as a "I'm here, about to join the ring with IP a.b.c.d", and you just
conveniently decide not to join the ring. If you go away at any time, the
cluster says "cool, no big deal, they didn't join the ring anyway".

Your hypothesis is probably mostly right here - it's not so much UP or
DOWN, it's "still here" or "gone". Because once the instance is DOWN, it
gets removed because it hadn't finished joining. Once it's removed, it can
come back and say "Hi, me again, about to join this cluster". But, until
it's removed as a fat client, when it comes back and says "Hi, me again,
about to join this cluster", cassandra says "not so fast friend, you're
already here and we haven't yet given up on you joining".
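
The "still here" versus "gone" behaviour described above can be pictured with a toy model. This is not Cassandra's Gossiper: the class, its methods and the example address are hypothetical, and the only rule it encodes is that an address is forgotten once it has been silent for longer than a timeout.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of fat client membership; hypothetical, not Cassandra code. */
final class FatClientGossipModel {
    private final long fatClientTimeoutMs;
    // address -> time (ms) at which the fat client was last heard from
    private final Map<String, Long> lastHeardMs = new HashMap<>();

    FatClientGossipModel(long fatClientTimeoutMs) {
        this.fatClientTimeoutMs = fatClientTimeoutMs;
    }

    /**
     * A fat client says "about to join this cluster".
     * Returns false while the cluster still remembers the address.
     */
    boolean announce(String address, long nowMs) {
        evictSilent(nowMs);
        if (lastHeardMs.containsKey(address)) {
            return false; // "you're already here"
        }
        lastHeardMs.put(address, nowMs);
        return true;
    }

    /** Forget every client that has been silent for longer than the timeout. */
    private void evictSilent(long nowMs) {
        lastHeardMs.values().removeIf(last -> nowMs - last > fatClientTimeoutMs);
    }

    public static void main(String[] args) {
        FatClientGossipModel cluster = new FatClientGossipModel(30_000);
        String proxy = "192.0.2.10"; // example address
        System.out.println(cluster.announce(proxy, 0));      // first start: true
        System.out.println(cluster.announce(proxy, 5_000));  // restart after 5s: false
        System.out.println(cluster.announce(proxy, 31_000)); // restart after 31s: true
    }
}
```

In this model the quick restart is refused only because the old entry has not yet aged out, which is the situation the startup error reports.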

Random aside: There are relatively few people on earth who run like this,
so I'm super interested in knowing how it's working for you. Does the PHP
client still reconnect on every page load, or does it finally support long
lived connections / pooling if you're using something like php-fpm or a
fastcgi pool? Are the coordinators/proxies here just to handle a ridiculous
number of clients, or is it the cost of connecting that's hurting as you
blow up the native thread pool on connect for expensive auth?
