Posted to user@cassandra.apache.org by Luca Rondanini <lu...@gmail.com> on 2021/07/19 15:34:11 UTC

R/W timeouts VS number of tables in keyspace

Hi all,

I have a keyspace with almost 900 tables.

Lately I started receiving lots of w/r timeouts (e.g.
com.datastax.driver.core.exceptions.Read/WriteTimeoutException: Cassandra
timeout during write query at consistency LOCAL_ONE (1 replica were
required but only 0 acknowledged the write)).

*I'm even experiencing nodes crashing.*

In the logs I get many warnings like:

WARN  [Service Thread]....GCInspector.java:282 - ConcurrentMarkSweep GC in
4025ms.  CMS Old Gen: 2141569800 -> 2116170568; Par Eden Space: 167772160 -> 0;
Par Survivor Space: 20971520 -> 0

WARN  [GossipTasks:1].....FailureDetector.java:288 - Not marking nodes down
due to local pause
of 5038005208 > 5000000000

I know 900 tables is a design error for C*, but before a super painful
refactoring I'd like to rule out any configuration problem. Any suggestions?

Thanks a lot,
Luca

Re: R/W timeouts VS number of tables in keyspace

Posted by Yakir Gibraltar <ya...@gmail.com>.
In order to tune GC, you need gc.log or JVM metrics. You can upload the logs to
https://gceasy.io/
and compare the results before and after the change.
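If GC logging isn't already enabled, the relevant flags typically live in conf/jvm.options; a minimal sketch for Cassandra 3.x on Java 8 (flag names as in the stock file, but verify against your version and adjust the log path):

```text
# conf/jvm.options — GC logging section (shipped commented out;
# uncomment/adjust as needed):
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
```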


-- 
*Best regards,*
*Yakir Gibraltar*

Re: R/W timeouts VS number of tables in keyspace

Posted by Luca Rondanini <lu...@gmail.com>.
Thanks Yakir,

I can already experience slow repairs and startups but I'd like to
stabilize the system before jumping into refactoring (columns are not a
problem, max 10/cols per table). Do you believe it's a GC problem to cause
the timeouts and crashes? I'll give it a try and update this post.

Thanks,
Luca




Re: R/W timeouts VS number of tables in keyspace

Posted by Yakir Gibraltar <ya...@gmail.com>.
I recommend rethinking this design: it is hard to maintain, and both startup
and repair are slow.
About GC, try replacing CMS with G1; see the doc:
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/operations/opsTuningGcAbout.html
BTW, many columns may also affect performance; see:
https://thelastpickle.com/blog/2020/12/17/impacts-of-many-columns-in-cassandra-table.html
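For anyone following along, in the stock Cassandra 3.x jvm.options the switch roughly amounts to commenting out the CMS section and uncommenting the G1 one (flag names as in the shipped file; verify against your version before restarting the node):

```text
### CMS settings — comment these out:
# -XX:+UseParNewGC
# -XX:+UseConcMarkSweepGC
# -XX:+CMSParallelRemarkEnabled

### G1 settings — uncomment these:
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500
```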

Cheers, Yakir Gibraltar

Re: R/W timeouts VS number of tables in keyspace

Posted by Erick Ramirez <er...@datastax.com>.
I wanted to add a word of warning: switching to G1 won't necessarily give
you breathing space, and in this particular case I'm confident it won't.

In your original post, it looked like the node had a very small heap (2GB).
In my experience, you need to allocate at least 8GB of memory to the heap
for production workloads. You might be able to get away with 4GB for apps
with very low traffic but 8GB should really be the minimum. For real
production workloads, 16-24GB is ideal when using CMS. But once you're in
the 20GB+ territory, I recommend switching to G1, since it performs well for
large heap sizes and it is the collector we recommend for heaps between
20 and 31GB (above ~32GB compressed object pointers are disabled, so a 32GB
heap can actually address fewer objects than a 31GB one).

It's really important to note that G1 doesn't do well with small heap sizes
and you're better off sticking with CMS in that case. As always, YMMV. I'm
sure others will chime in with their own opinions/experiences. Cheers!
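As a concrete illustration of the sizing advice above, heap size is usually pinned in conf/cassandra-env.sh; the 8G/2G values here are only the kind of starting point described above, not a recommendation for any particular cluster:

```text
# conf/cassandra-env.sh — setting these disables the automatic
# heap calculation; both should be set together.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"    # young generation size (relevant for CMS; not for G1)
```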

Re: R/W timeouts VS number of tables in keyspace

Posted by Scott Hirleman <sc...@gmail.com>.
I feel like that calls for an anti-pattern -> success blog post Luca 🤣


-- 
Scott Hirleman
scott.hirleman@gmail.com

Re: R/W timeouts VS number of tables in keyspace

Posted by Luca Rondanini <lu...@gmail.com>.
Thanks Sean,

I'm switching to G1 in order to gain some time while refactoring. I should
be able to go down to 4 tables! Yes, the original design was that poor.

Thanks again


RE: R/W timeouts VS number of tables in keyspace

Posted by "Durity, Sean R" <SE...@homedepot.com>.
Each table in the cluster will have a memtable. This is why you do not want to fracture the memory into 900+ slices. The rule of thumb I have followed is to stay in the low hundreds (maybe 200) tables for the whole cluster. I would be requiring the hard refactoring (or moving tables to different clusters) immediately, since you really need to reduce by at least 700 tables. You are seeing the memory impacts.
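To make the memory impact concrete, here is a rough back-of-envelope sketch in Python; the ~1 MB fixed overhead per table is an often-quoted ballpark for table metadata and memtable machinery, not a measured figure for any particular cluster:

```python
# Back-of-envelope estimate: fixed per-table heap overhead vs. a small heap.
# The 1 MB/table figure is an assumed ballpark, not a measurement.

def table_overhead_mb(num_tables, per_table_mb=1.0):
    """Estimated fixed heap overhead (MB) for table metadata/memtables."""
    return num_tables * per_table_mb

heap_mb = 2048  # ~2 GB heap, roughly what the GC log in the original post shows
overhead = table_overhead_mb(900)
print(f"~{overhead:.0f} MB of fixed overhead on a {heap_mb} MB heap "
      f"({overhead / heap_mb:.0%})")
```

Under these assumptions, 900 tables would consume a large fraction of a 2 GB heap before any actual data is written, which is consistent with the constant CMS pressure in the logs.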

In addition, in my experience, CMS is much harder to tune. G1GC works well in my use cases without much tuning (or Java-guru level knowledge). However, I don’t think that you will be able to engineer around the 900+ tables, no matter which GC you use.

Sean Durity – Staff Systems Engineer, Cassandra

________________________________

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.