Posted to user@cassandra.apache.org by Octavian Rinciog <oc...@gmail.com> on 2018/01/17 17:40:58 UTC

High read rate on hard-disk

Hello!

I am using Cassandra 3.10 on Ubuntu 14.04, and I have a counter
table (RF=1) with the following schema:

CREATE TABLE edges (
    src_id text,
    src_type text,
    source text,
    weight counter,
    PRIMARY KEY ((src_id, src_type), source)
) WITH compaction = {
    'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
    'max_threshold': '32',
    'min_threshold': '4'
};

The ratio of SELECT to UPDATE requests is about 0.001 (Read Count:
3771000, Write Count: 3401236000, over one month).

We have Counter Cache enabled:

Counter Cache          : entries 1018782, size 256 MiB, capacity 256
MiB, 2799913189 hits, 3469459479 requests, 0.807 recent hit rate, 7200
save period in seconds
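
As a rough cross-check of the figures above, the cumulative counter cache
statistics can be turned into an estimated number of cache misses per
second. This is only a sketch under stated assumptions: "one month" is
taken as 30 days, the update rate is treated as uniform, and every counter
cache request is assumed to be made on behalf of an UPDATE. None of these
assumptions come from the statistics themselves.

```python
# Figures copied from the request counts and cache statistics quoted above.
read_count = 3_771_000
write_count = 3_401_236_000
cache_hits = 2_799_913_189
cache_requests = 3_469_459_479

# Assumption: "one month" is taken as 30 days.
SECONDS_PER_MONTH = 30 * 24 * 3600

select_update_ratio = read_count / write_count
updates_per_second = write_count / SECONDS_PER_MONTH
miss_ratio = 1 - cache_hits / cache_requests
# Assumption: every counter cache miss happens on behalf of one UPDATE.
misses_per_second = updates_per_second * miss_ratio

print(f"SELECT/UPDATE ratio : {select_update_ratio:.4f}")
print(f"updates per second  : {updates_per_second:.0f}")
print(f"cache miss ratio    : {miss_ratio:.3f}")
print(f"misses per second   : {misses_per_second:.0f}")
```

Under these assumptions the estimate comes out at roughly 250 cache misses
per second, so a few hundred counter lookups per second may have to be
served from outside the counter cache even when no SELECT is running.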

The problem is that the read rate on our hard disk is always near
30 MB/s, while the write rate is near 500 KB/s.

An example of "iostat -x" output:

Device:  rrqm/s  wrqm/s     r/s   w/s     rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.06    1.04  263.65  2.04  28832.42  572.53    146.07      0.36   1.35     0.74    81.16   1.27  33.81
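
To make the imbalance in the iostat row easier to see, here is a small
piece of arithmetic on the reported values. It is only a sketch: the
variable names are mine, and the numbers are copied from the single "sdb"
sample above, so they say nothing about how the rates vary over time.

```python
# Values copied from the "sdb" row of the iostat output above.
reads_per_s = 263.65
writes_per_s = 2.04
read_kb_per_s = 28832.42
write_kb_per_s = 572.53

# Average amount of data moved per request, in kB.
kb_per_read = read_kb_per_s / reads_per_s
kb_per_write = write_kb_per_s / writes_per_s

# How much more data is read than written, and the read rate in MB/s.
read_write_byte_ratio = read_kb_per_s / write_kb_per_s
read_mb_per_s = read_kb_per_s / 1024

print(f"kB per read request  : {kb_per_read:.1f}")
print(f"kB per write request : {kb_per_write:.1f}")
print(f"read/write byte ratio: {read_write_byte_ratio:.1f}")
print(f"read rate in MB/s    : {read_mb_per_s:.1f}")
```

In this sample each read request transfers a little over 100 kB on
average, and the device reads about 50 times more data than it writes.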

Also, with iotop we saw that there are about 8 threads, each reading at
around 3 MB/s.

Total DISK READ :      22.73 M/s | Total DISK WRITE :     494.35 K/s
Actual DISK READ:      22.62 M/s | Actual DISK WRITE:     528.57 K/s
  TID  PRIO  USER    DISK READ>  DISK WRITE  SWAPIN      IO    COMMAND
14793 be/4 cassandra 3.061 M/s    0.0010 B/s  0.00 % 93.27 % java -Dcassandra.fd_max_interval_ms=400

The output of strace on one of these threads is:

strace -cp 14793
Process 14793 attached
^CProcess 14793 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.85   32.118518          57    567288    256251 futex
  0.15    0.048822           3     15339           write
  0.00    0.000000           0         1           rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00   32.167340                582628    256251 total
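
The summary above can also be checked mechanically. The following is a
minimal sketch that parses the table as pasted and reports the share of
futex calls and whether any read-type syscall appears. The function name
and the set of syscalls treated as "reads" are my own choices for
illustration, not part of strace.

```python
STRACE_SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.85   32.118518          57    567288    256251 futex
  0.15    0.048822           3     15339           write
  0.00    0.000000           0         1           rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00   32.167340                582628    256251 total
"""

# Illustrative choice of syscalls that would count as explicit reads.
READ_SYSCALLS = {"read", "pread64", "readv", "preadv"}


def parse_strace_summary(text):
    """Return {syscall: (calls, errors)} from `strace -c` summary text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header, separators and the "total" row.
        if (not fields or fields[0] == "%" or fields[0].startswith("-")
                or fields[-1] == "total"):
            continue
        calls = int(fields[3])
        # The errors column is empty when a syscall never failed.
        errors = int(fields[4]) if len(fields) == 6 else 0
        stats[fields[-1]] = (calls, errors)
    return stats


stats = parse_strace_summary(STRACE_SUMMARY)
total_calls = sum(calls for calls, _ in stats.values())
futex_calls, futex_errors = stats["futex"]
read_syscalls_seen = sorted(READ_SYSCALLS & stats.keys())

print(f"futex share of calls: {futex_calls / total_calls:.1%}")
print(f"futex calls failing : {futex_errors / futex_calls:.1%}")
print("read syscalls seen  :", read_syscalls_seen or "none")
```
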


Although iotop shows this thread reading at 3 MB/s, there is no read
syscall in the strace output.

I would like to ask whether the futex calls are actually responsible for
the read rate, and how we can debug this problem further.

Btw, there are no compaction tasks in progress and there are no SELECT
queries in progress.

Also, I know that a lock is obtained for each update [1].

Thank you,

[1]https://apache.googlesource.com/cassandra/+/refs/heads/trunk/src/java/org/apache/cassandra/db/CounterMutation.java#121
-- 
Octavian Rinciog

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Re: High read rate on hard-disk

Posted by Octavian Rinciog <oc...@gmail.com>.
Hi Alain,

Thank you for your response.

> - Other than the 'lock', Counters perform an implicit read before the write
> operation.

As far as I know, there is a counter cache [1] that is used to read
the old values of the counters. According to [2], it is used only for
UPDATE requests.


> I would say what you are seeing is expected with this use case. Also, I have
> never seen a use case where using RF = 1 is a good idea (except for some
> testing, maybe). Be aware that this data is fragile and can easily be lost
> (if it's a deliberate choice, ignore my comment). On the bright side, you
> have no entropy / consistency issues or need for repairs with RF = 1 :D.

Yes, RF=1 is indeed our deliberate choice (basically because we did not
manage to scale the counter writes very well, and we accepted that we
can lose some data).


[1]https://apache.googlesource.com/cassandra/+/refs/heads/trunk/src/java/org/apache/cassandra/db/CounterMutation.java#193
[2]https://issues.apache.org/jira/browse/CASSANDRA-12500?focusedCommentId=15464023&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15464023


2018-01-18 12:51 GMT+02:00 Alain RODRIGUEZ <ar...@gmail.com>:
> Hello Octavian,
>
>> I have a counter table (RF=1)
>>
>> The ratio of SELECT to UPDATE requests is about 0.001 (Read Count:
>> 3771000, Write Count: 3401236000, over one month).
>
>> The problem is that the read rate on our hard disk is always near
>> 30 MB/s, while the write rate is near 500 KB/s.
>
> I did not read all your numbers, but here are the internal details you could
> be missing:
>
> - Other than the 'lock', counters perform an implicit read before the write
> operation. To increment a value, you need to know its previous value. That
> was true the last time I used them; I believe there is no real workaround
> and that it is still the case today.
> - Writes do not hit the disk synchronously. Instead, they are stored in the
> memtable and flushed only once, sequentially and efficiently. Compaction
> then merges partitions later, asynchronously.
>
> I would say what you are seeing is expected with this use case. Also, I have
> never seen a use case where using RF = 1 is a good idea (except for some
> testing, maybe). Be aware that this data is fragile and can easily be lost
> (if it's a deliberate choice, ignore my comment). On the bright side, you
> have no entropy / consistency issues or need for repairs with RF = 1 :D.
>
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - alain@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com



-- 
Octavian Rinciog



Re: High read rate on hard-disk

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hello Octavian,


>  I have a counter table(RF=1)

> The ratio of SELECT to UPDATE requests is about 0.001 (Read Count:
> 3771000, Write Count: 3401236000, over one month).

> The problem is that the read rate on our hard disk is always near
> 30 MB/s, while the write rate is near 500 KB/s.


I did not read all your numbers, but here are the internal details you
could be missing:

- Other than the 'lock', counters perform an implicit read before the write
operation. To increment a value, you need to know its previous value. That
was true the last time I used them; I believe there is no real workaround
and that it is still the case today.
- Writes do not hit the disk synchronously. Instead, they are stored in the
memtable and flushed only once, sequentially and efficiently. Compaction
then merges partitions later, asynchronously.
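
The two points above can be illustrated with a toy model. This is only a
sketch of the general read-before-write idea, not Cassandra's actual
implementation: the class, its fields and the tiny LRU cache are all
hypothetical and exist purely for illustration.

```python
import threading
from collections import OrderedDict


class ToyCounterStore:
    """Toy model of a counter table; NOT Cassandra's actual implementation."""

    def __init__(self, cache_capacity):
        self.lock = threading.Lock()   # stands in for the per-update lock
        self.cache = OrderedDict()     # small LRU cache of current values
        self.cache_capacity = cache_capacity
        self.memtable = {}             # in-memory writes, not yet on disk
        self.disk = {}                 # stands in for flushed data files
        self.disk_reads = 0
        self.disk_flushes = 0

    def _read_current(self, key):
        if key in self.cache:          # cache hit: no disk access
            self.cache.move_to_end(key)
            return self.cache[key]
        if key in self.memtable:       # value is still in memory
            return self.memtable[key]
        self.disk_reads += 1           # cache miss: read before write
        return self.disk.get(key, 0)

    def increment(self, key, delta):
        with self.lock:
            new_value = self._read_current(key) + delta
            self.memtable[key] = new_value   # the write stays in memory
            self.cache[key] = new_value
            self.cache.move_to_end(key)
            if len(self.cache) > self.cache_capacity:
                self.cache.popitem(last=False)  # evict least recently used

    def flush(self):
        with self.lock:
            self.disk.update(self.memtable)  # one batched write to "disk"
            self.memtable.clear()
            self.disk_flushes += 1


store = ToyCounterStore(cache_capacity=2)
for round_number in range(3):
    for key in ("a", "b", "c", "d"):   # more keys than cache slots
        store.increment(key, 1)
    store.flush()

print("disk reads  :", store.disk_reads)
print("disk flushes:", store.disk_flushes)
print("value of 'a':", store.disk["a"])
```

In this run the 12 increments trigger 12 simulated disk reads but only 3
flushes, without a single explicit read request from the client. The
cache is deliberately smaller than the key set, so every update misses.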

I would say what you are seeing is expected with this use case. Also, I
have never seen a use case where using RF = 1 is a good idea (except for
some testing, maybe). Be aware that this data is fragile and can easily be
lost (if it's a deliberate choice, ignore my comment). On the bright side,
you have no entropy / consistency issues or need for repairs with RF = 1 :D.

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
