Posted to user@cassandra.apache.org by Marco Gasparini <ma...@competitoor.com> on 2019/08/26 09:09:50 UTC

read failures and high read latency

hi everybody,

I'm experiencing some read failures and high read latency (see the
attached picture for more details).

- I have a cluster of 6 nodes, each with 1.5TB of occupied disk space,
running Cassandra 3.11.4.

4 nodes have 32GB of RAM; the Cassandra heap allocation is Xms8G Xmx8G.
2 nodes have 16GB of RAM; the Cassandra heap allocation is Xms4G Xmx4G.

Each node has spinning disks.

- Some fields from cassandra.yaml configuration:

concurrent_reads: 64
concurrent_writes: 64
concurrent_counter_writes: 64

file_cache_size_in_mb: 2048

memtable_cleanup_threshold: 0.2
memtable_flush_writers: 4
memtable_allocation_type: offheap_objects

- CQL schema and RF:

CREATE KEYSPACE myks WITH replication = {'class':
'NetworkTopologyStrategy', 'DC1': '3'}  AND durable_writes = false;
CREATE TABLE myks.mytable (
    id bigint,
    type text,
    page int,
    event_datetime timestamp,
    agent text,
    portion text,
    raw text,
    status int,
    status_code_pass int,
    dom bigint,
    reached text,
    tt text,
    PRIMARY KEY ((id, type), page, event_datetime)
) WITH CLUSTERING ORDER BY (page DESC, event_datetime DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 90000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';


- My queries read 3 rows at a time, where the total data size is
between 5MB and 20MB (for the query shape, see the sketch below).
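
A hypothetical query of that shape against the schema above (the literal
values 42 and 'audit' are placeholders, not real data):

SELECT page, event_datetime, agent, portion, raw, status
FROM myks.mytable
WHERE id = 42 AND type = 'audit'
LIMIT 3;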


How can I improve read performance?
I could stand to lose some write speed in order to gain read speed.

If you need more information, please ask.


Thanks
Marco
[image: grafana_cassandra.png]

Re: read failures and high read latency

Posted by Ahmed Eljami <ah...@gmail.com>.
> > Not just saturating the drives - note some of those nodes have only
> > 4GB ram for max heap.
> what do you mean?

Not enough for a production workload.
It's recommended that you allocate 8GB of heap if you use the CMS GC, or
16GB (or more) in the case of G1.

You can find more details in the TLP blog post written by Jon if you plan
to tune your JVM:
https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
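
As a sketch only, assuming the stock conf/jvm.options shipped with
Cassandra 3.11 (CMS enabled by default, with a commented-out G1 section),
moving one of the 32GB nodes to G1 with a 16GB heap would look something
like:

# conf/jvm.options (hypothetical excerpt)
-Xms16G
-Xmx16G
# comment out the default CMS flags (-XX:+UseParNewGC,
# -XX:+UseConcMarkSweepGC, ...) and enable G1 instead:
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500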




-- 
Regards,

Ahmed ELJAMI

Re: read failures and high read latency

Posted by Marco Gasparini <ma...@competitoor.com>.
thank you all for answering.

During the workload peak I measured each node's statistics, CPU and
I/O statistics included, and I noticed a lot of time spent in IOWAIT
(30-40% of total CPU usage during the peak).
It seems the bottleneck is the spinning disk; I'm wondering if I could
modify Cassandra's configuration to make better use of RAM (see the
iostat sketch below).
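
For reference, saturation like this is usually visible with iostat from
the sysstat package (a sketch; device names differ per node):

# extended device statistics, sampled every 5 seconds
iostat -x 5
# watch %util (near 100% means the device is saturated) and await
# (average per-request latency in ms) on the data and commitlog disks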

> Not just saturating the drives - note some of those nodes have only 4GB
> ram for max heap.
what do you mean?

> Cassandra performs very poorly with payloads > 1MB.  20MB is WAY too
> big.  What you need is a blob / object store, not Cassandra.
Yes, we understood that, but we chose Cassandra for other reasons and now
we need to keep it.

Marco



Re: read failures and high read latency

Posted by Jon Haddad <jo...@jonhaddad.com>.
Not just saturating the drives - note some of those nodes have only 4GB ram
for max heap.

Cassandra performs very poorly with payloads > 1MB.  20MB is WAY too big.
What you need is a blob / object store, not Cassandra.

Jon
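
For what it's worth, a common pattern when the data has to stay in
Cassandra (a hypothetical sketch, not something prescribed in this
thread) is to split each large payload into fixed-size chunks so that no
single cell approaches the 20MB seen here:

CREATE TABLE myks.mytable_chunks (
    id bigint,
    type text,
    page int,
    event_datetime timestamp,
    chunk_no int,
    chunk blob,
    PRIMARY KEY ((id, type), page, event_datetime, chunk_no)
);
-- the application writes the large payload as e.g. 512KB chunks and
-- reassembles them on read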


Re: read failures and high read latency

Posted by Marc Selwan <ma...@datastax.com>.
> I do queries that read 3 rows at a time where the total data size is
> between 5MB and 20MB

There's a good chance you're saturating those drives with payloads like
that. Do you happen to have dashboards or capture IO metrics?

Best,
Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 | Twitter
<https://twitter.com/MarcSelwan>

Quick links | DataStax <http://www.datastax.com> | Training
<http://www.academy.datastax.com> | Documentation
<http://www.datastax.com/documentation/getting_started/doc/getting_started/gettingStartedIntro_r.html>
| Downloads <http://www.datastax.com/download>




Re: read failures and high read latency

Posted by Marco Gasparini <ma...@competitoor.com>.
Hi,


The error is the following:

All host(s) tried for query failed. First host tried, xxx.xxx.xxx.xxx:9042:
Host considered as DOWN.


In system.log I don't have any exceptions.

I see 4 odd log messages:

- every so often StatusLogger logs the thread pool table ("Pool Name
Active   Pending   Completed   Blocked   All Time Blocked")

- "Maximum memory usage reached (2147483648), cannot allocate chunk of
1048576" (2147483648 bytes is exactly the 2048 MB file_cache_size_in_mb
configured above)

- DroppedMessages: "READ messages were dropped in last 5000 ms: 0 internal
and 1 cross node. Mean internal dropped latency: 0 ms and Mean cross-node
dropped latency: 5960 ms"

- "Some operations were slow, details available at debug level"





Re: read failures and high read latency

Posted by Inquistive allen <in...@gmail.com>.
Hello Marco,

Could you please share the error and exception logs seen in the
system.log files in your environment?

Thanks
