Posted to user@cassandra.apache.org by Joe Obernberger <jo...@gmail.com> on 2021/12/03 20:51:41 UTC

Added node - now queries time out

Hi all - just added a node to an 11 node cluster (4.0.1) and it synced 
up OK, but now all queries are timing out.
This time I made sure the clocks are synced!  :)

Kinda desperate to get this to work again.  What can I check or do? I just 
added the .34 node.  One item of concern is the amount of load/data on 
it compared to the others.
I'm running a repair on the new node, but even something like select * from 
table, on a table with maybe 100 rows, times out.
Help!

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns  Host ID                               Rack
UN  172.16.100.45   161.81 GiB  250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  172.16.100.251  128.6 GiB   200     ?     660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.252  128.44 GiB  200     ?     e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  128.43 GiB  200     ?     49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   128.79 GiB  200     ?     d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   127.47 GiB  200     ?     93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  2.19 GiB    4       ?     a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  127.74 GiB  200     ?     4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   75.89 GiB   120     ?     08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  128.3 GiB   200     ?     b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  172.16.100.34   29.67 GiB   200     ?     84219e6d-74ac-4d23-89d0-0bd734d0c09e  rack1

-joe


Re: Added node - now queries time out

Posted by Joe Obernberger <jo...@gmail.com>.
Thank you!

Interestingly, as the node was being decommissioned, the load/storage 
increased.  Once it was removed, I bounced the entire cluster and now 
it's working.

-Joe

On 12/3/2021 4:31 PM, Bowen Song wrote:
> The load on the new server looks clearly wrong. Are you sure this node 
> has fully bootstrapped / rebuilt? If not, the large amount of streaming 
> activity triggered by read repair may be enough to cause timeouts. 
> Please check the new server's log and make sure it did not fail any 
> streaming session when it first joined the cluster. If in doubt, 
> remove the node and re-add it, and keep an eye on the log.

Re: Added node - now queries time out

Posted by Joe Obernberger <jo...@gmail.com>.
This worked - I decommissioned the node and then re-added it.

If a drive fails on a Cassandra node, what is the process to bring that 
node back up?

-joe

On 12/3/2021 4:31 PM, Bowen Song wrote:
> The load on the new server looks clearly wrong. Are you sure this node 
> has fully bootstrapped / rebuilt? If not, the large amount of streaming 
> activity triggered by read repair may be enough to cause timeouts. 
> Please check the new server's log and make sure it did not fail any 
> streaming session when it first joined the cluster. If in doubt, 
> remove the node and re-add it, and keep an eye on the log.

Re: Added node - now queries time out

Posted by Bowen Song <bo...@bso.ng>.
The load on the new server looks clearly wrong. Are you sure this node 
has fully bootstrapped / rebuilt? If not, the large amount of streaming 
activity triggered by read repair may be enough to cause timeouts. 
Please check the new server's log and make sure it did not fail any 
streaming session when it first joined the cluster. If in doubt, remove 
the node and re-add it, and keep an eye on the log.
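
One rough way to see why the load on the new node looks wrong is to 
compare load per token across the nodes in the status output posted at 
the start of this thread. The Python sketch below does that. It is only 
an illustration: the function names and the 0.5 threshold are made up for 
this example, and load per token is a crude proxy, since token ranges are 
not all the same size.

```python
import re
from statistics import median

# Rows copied from the `nodetool status` output in the original post.
STATUS = """\
UN  172.16.100.45   161.81 GiB  250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  172.16.100.251  128.6 GiB   200     ?     660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.252  128.44 GiB  200     ?     e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  128.43 GiB  200     ?     49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   128.79 GiB  200     ?     d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   127.47 GiB  200     ?     93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  2.19 GiB    4       ?     a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  127.74 GiB  200     ?     4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   75.89 GiB   120     ?     08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  128.3 GiB   200     ?     b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  172.16.100.34   29.67 GiB   200     ?     84219e6d-74ac-4d23-89d0-0bd734d0c09e  rack1
"""

# Captures: state, address, load value, load unit, token count.
ROW = re.compile(r"^([UD][NLJM])\s+(\S+)\s+([\d.]+)\s+(KiB|MiB|GiB|TiB)\s+(\d+)\s")

UNIT_TO_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}


def load_per_token(status_text):
    """Return {address: GiB of load per token} for every node row."""
    result = {}
    for line in status_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            _, address, value, unit, tokens = match.groups()
            result[address] = float(value) * UNIT_TO_GIB[unit] / int(tokens)
    return result


def underloaded(status_text, threshold=0.5):
    """Addresses whose load per token is below threshold * cluster median.

    The 0.5 default is an arbitrary choice for illustration only.
    """
    ratios = load_per_token(status_text)
    mid = median(ratios.values())
    return sorted(a for a, r in ratios.items() if r < threshold * mid)


if __name__ == "__main__":
    by_ratio = sorted(load_per_token(STATUS).items(), key=lambda kv: kv[1])
    for address, ratio in by_ratio:
        print(f"{address:16s} {ratio:.3f} GiB/token")
    print("suspect:", underloaded(STATUS))
```

With the numbers from this thread, most nodes sit at roughly 0.64 GiB per 
token, while 172.16.100.34 is at about 0.15, so it is the only address 
the sketch flags.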
