Posted to user@cassandra.apache.org by Marco Gasparini <ma...@competitoor.com> on 2019/08/26 09:09:50 UTC
read failures and high read latency
Hi everybody,
I'm experiencing some read failures and high read latency (see the
attached picture for more details).
- I have a cluster of 6 nodes with 1.5TB of occupied disk space on each
node, running Cassandra 3.11.4.
4 nodes have 32GB of RAM; the Cassandra heap allocation is -Xms8G -Xmx8G.
2 nodes have 16GB of RAM; the Cassandra heap allocation is -Xms4G -Xmx4G.
Each node has a spinning disk.
- Some fields from cassandra.yaml configuration:
concurrent_reads: 64
concurrent_writes: 64
concurrent_counter_writes: 64
file_cache_size_in_mb: 2048
memtable_cleanup_threshold: 0.2
memtable_flush_writers: 4
memtable_allocation_type: offheap_objects
- CQL schema and RF:
CREATE KEYSPACE myks WITH replication = {'class':
'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = false;
CREATE TABLE myks.mytable (
id bigint,
type text,
page int,
event_datetime timestamp,
agent text,
portion text,
raw text,
status int,
status_code_pass int,
dom bigint,
reached text,
tt text,
PRIMARY KEY ((id, type), page, event_datetime)
) WITH CLUSTERING ORDER BY (page DESC, event_datetime DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 90000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
- I do queries that read 3 rows at a time, where the total data size is
between 5MB and 20MB.
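For context, a read of that shape against the schema above presumably looks something like this (the exact predicates and the LIMIT are my assumption, not taken from the thread):

```
SELECT id, type, page, event_datetime, agent, portion, raw,
       status, status_code_pass, dom, reached, tt
FROM myks.mytable
WHERE id = ? AND type = ?   -- the full partition key
LIMIT 3;                    -- "3 rows at a time"
```

With rows this large, each such query moves 5-20MB through the coordinator in a single response.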
How can I improve read performance?
I could accept losing some write speed in order to improve read speed.
If you need more information, please ask.
Thanks
Marco
[image: grafana_cassandra.png]
Re: read failures and high read latency
Posted by Ahmed Eljami <ah...@gmail.com>.
> > Not just saturating the drives - note some of those nodes have only
> > 4GB ram for max heap.
> what do you mean?
Not enough for a production workload. It's recommended that you allocate
8GB of heap if you use CMS GC, and 16GB (or more) in the case of G1 GC.
You can find more details in the TLP blog post written by Jon if you plan
to tune your JVM:
https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
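As a rough sketch (assuming Cassandra 3.11's conf/jvm.options layout), those heap sizes map to settings like:

```
# conf/jvm.options (Cassandra 3.11) - heap sizing
-Xms16G
-Xmx16G
# 3.11 ships with CMS flags enabled by default; to use G1 instead,
# comment out the CMS flags and uncomment:
#-XX:+UseG1GC
```

Keeping -Xms and -Xmx equal avoids heap resizing at runtime.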
On Tue, Aug 27, 2019 at 09:11, Marco Gasparini <
marco.gasparini@competitoor.com> wrote:
--
Cordialement;
Ahmed ELJAMI
Re: read failures and high read latency
Posted by Marco Gasparini <ma...@competitoor.com>.
Thank you all for answering.
During the peak of the workload I'm measuring each node's statistics, CPU
and I/O statistics included, and I noticed a lot of time spent in IOWAIT
(30-40% of the total CPU usage during the peak).
It seems that the bottleneck is the spinning disk; I'm wondering if I could
modify Cassandra's configuration in order to improve the RAM
utilisation.
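The IOWAIT percentage above can be derived from two samples of the aggregate "cpu" line in /proc/stat; a minimal sketch (counter order per the Linux proc(5) man page: user, nice, system, idle, iowait, ...):

```python
def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two samples of the
    aggregate 'cpu' line in /proc/stat (jiffy counters in the order:
    user, nice, system, idle, iowait, irq, softirq, ...)."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0

# Two synthetic samples a few seconds apart:
print(iowait_fraction([100, 0, 100, 700, 100, 0, 0],
                      [200, 0, 200, 1100, 500, 0, 0]))  # 0.4
```

Tools like `iostat -x` report the same figure per device, which helps confirm it really is the data disk that is saturated.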
> Not just saturating the drives - note some of those nodes have only 4GB
> ram for max heap.
What do you mean?
> Cassandra performs very poorly with payloads > 1MB. 20MB is WAY too
> big. What you need is a blob / object store, not Cassandra.
Yes, we understood that, but we chose Cassandra for other reasons and now
we need to keep it.
Marco
On Mon, Aug 26, 2019 at 22:21, Jon Haddad <jo...@jonhaddad.com>
wrote:
Re: read failures and high read latency
Posted by Jon Haddad <jo...@jonhaddad.com>.
Not just saturating the drives - note some of those nodes have only 4GB ram
for max heap.
Cassandra performs very poorly with payloads > 1MB. 20MB is WAY too big.
What you need is a blob / object store, not Cassandra.
Jon
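When the large values can't be moved out to an object store, a workaround sometimes used (my sketch, not something suggested in this thread) is to split each big value into sub-1MB pieces stored as separate rows, e.g. with an extra chunk_no clustering column:

```python
CHUNK_SIZE = 1024 * 1024  # stay at/under ~1MB per stored value

def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a large payload into chunk_size pieces; each piece would be
    written as its own row (keyed by a hypothetical chunk_no column)."""
    return [payload[i:i + chunk_size]
            for i in range(0, len(payload), chunk_size)]
```

Reads then fetch and reassemble the chunks, keeping every individual cell within the size range Cassandra handles well.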
On Mon, Aug 26, 2019 at 9:45 AM Marc Selwan <ma...@datastax.com>
wrote:
Re: read failures and high read latency
Posted by Marc Selwan <ma...@datastax.com>.
> I do queries that reads 3 rows at a time where the total data size is
> between 5MB and 20MB
There's a good chance you're saturating those drives with payloads like
that. Do you happen to have dashboards or capture IO metrics?
Best,
Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 |
Twitter <https://twitter.com/MarcSelwan>
Quick links | DataStax <http://www.datastax.com> | Training
<http://www.academy.datastax.com> | Documentation
<http://www.datastax.com/documentation/getting_started/doc/getting_started/gettingStartedIntro_r.html>
| Downloads <http://www.datastax.com/download>
On Mon, Aug 26, 2019 at 2:29 AM Marco Gasparini <
marco.gasparini@competitoor.com> wrote:
Re: read failures and high read latency
Posted by Marco Gasparini <ma...@competitoor.com>.
Hi,
The error is the following:
All host(s) tried for query failed. First host tried,
xxx.xxx.xxx.xxx:9042: Host considered as DOWN.
In system.log I don't see any exceptions, but I do see 4 odd kinds of log
messages:
- every so often StatusLogger prints the thread-pool table ("Pool Name,
Active, Pending, Completed, Blocked, All Time Blocked")
- "Maximum memory usage reached (2147483648), cannot allocate chunk of
1048576"
- DroppedMessages: "READ messages were dropped in last 5000 ms: 0 internal
and 1 cross node. Mean internal dropped latency: 0 ms and Mean cross-node
dropped latency: 5960 ms"
- "Some operations were slow, details available at debug level"
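For what it's worth, the numbers in that "Maximum memory usage reached" message line up exactly with the file_cache_size_in_mb: 2048 setting quoted in the original mail (it appears to be the off-heap buffer pool hitting its configured cap); a quick sanity check:

```python
file_cache_size_in_mb = 2048   # from the cassandra.yaml quoted earlier
limit_in_log = 2147483648      # "Maximum memory usage reached (...)"
chunk_in_log = 1048576         # "cannot allocate chunk of 1048576"
assert file_cache_size_in_mb * 1024 * 1024 == limit_in_log  # exactly 2GB
assert chunk_in_log == 1024 * 1024                          # a 1MB chunk
print("log limit matches file_cache_size_in_mb")
```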
On Mon, Aug 26, 2019 at 11:15, Inquistive allen <
inquiallen@gmail.com> wrote:
Re: read failures and high read latency
Posted by Inquistive allen <in...@gmail.com>.
Hello Marco,
Could you please share the errors and exceptions seen in the system.log
files in your environment?
Thanks
On Mon, 26 Aug, 2019, 2:40 PM Marco Gasparini, <
marco.gasparini@competitoor.com> wrote: