Posted to user@cassandra.apache.org by Pranay akula <pr...@gmail.com> on 2017/07/07 14:26:39 UTC

READ Queries timing out.

Lately I am seeing some SELECT queries timing out. The data model is likely
to blame, but I am not in a position to redo it.

Will increasing the heap help?

I am currently using a 1 GB new_heap. I analysed the GC logs and am not
seeing any issues with major GCs.

We are using G1GC; will increasing new_heap help?

We currently set JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500". Even if I
increase the heap to, say, 2 GB, is that effective, given that young GCs
will kick in more frequently in order to complete within 500 ms?


Thanks
Pranay.

RE: READ Queries timing out.

Posted by "Durity, Sean R" <SE...@homedepot.com>.
1 GB heap is very small. Why not try increasing it to 50% of RAM and see if it helps you track down the real issue. It is hard to tune around a bad data model, if that is indeed the issue. Seeing your tables and queries would help.


Sean Durity

From: Pranay akula [mailto:pranay.akula2323@gmail.com]
Sent: Friday, July 07, 2017 11:47 AM
To: user@cassandra.apache.org
Cc: ZAIDI, ASAD A <az...@att.com>
Subject: Re: READ Queries timing out.

Thanks ZAIDI,

The C++ driver does not support tracing, so I am executing those queries from cqlsh. When I trace them I get the error below, even after increasing --request-timeout to 3600 in cqlsh.

ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
Statement trace did not complete within 10 seconds
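
For reference, a tracing session like the one above can be reproduced from cqlsh roughly as follows (this is a sketch; the keyspace, table, and column names are placeholders, not from this thread):

```shell
# Start cqlsh with a longer client-side timeout (in seconds)
cqlsh --request-timeout=3600 <hostname>

# Inside cqlsh: enable tracing, then run the slow query
cqlsh> TRACING ON;
cqlsh> SELECT col FROM my_keyspace.my_table WHERE partition_key = 'k';
```

Note that --request-timeout only raises the client-side wait; the coordinator's own read timeout is a separate, server-side setting.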

Below are the cfstats and cfhistograms. I can see that read latency, cell count, and maximum live cells per slice (last five minutes) are high. Is there any way to get around this without changing the data model?

Percentile  SSTables     Write Latency     Read Latency            Partition Size    Cell Count
                              (micros)         (micros)                   (bytes)
50%             1.00             20.00                     NaN               1331            20
75%             2.00             29.00                     NaN               6866            86
95%             8.00             60.00                     NaN             126934          1331
98%            10.00            103.00                     NaN             315852          3973
99%            12.00            149.00                     NaN             545791          8239
Min             0.00              0.00                    0.00                104             0
Max            20.00       12730764.00  9773372036884776000.00           74975550         83457



        Read Count: 44514407
        Read Latency: 82.92876612928933 ms.
        Write Count: 3007585812
        Write Latency: 0.07094456590853208 ms.
        Pending Flushes: 0
                SSTable count: 9
                    Space used (live): 66946214374
                    Space used (total): 66946214374
                    Space used by snapshots (total): 0
                    Off heap memory used (total): 33706492
                    SSTable Compression Ratio: 0.5598380206656697
                    Number of keys (estimate): 2483819
                    Memtable cell count: 15008
                    Memtable data size: 330597
                    Memtable off heap memory used: 518502
                    Memtable switch count: 39915
                    Local read count: 44514407
                    Local read latency: 82.929 ms
                    Local write count: 3007585849
                    Local write latency: 0.071 ms
                    Pending flushes: 0
                    Bloom filter false positives: 0
                    Bloom filter false ratio: 0.00000
                    Bloom filter space used: 12623632
                    Bloom filter off heap memory used: 12623560
                    Index summary off heap memory used: 3285614
                    Compression metadata off heap memory used: 17278816
                    Compacted partition minimum bytes: 104
                    Compacted partition maximum bytes: 74975550
                    Compacted partition mean bytes: 27111
                    Average live cells per slice (last five minutes): 388.7486606077893
                    Maximum live cells per slice (last five minutes): 28983.0
                    Average tombstones per slice (last five minutes): 0.0
                    Maximum tombstones per slice (last five minutes): 0.0


Thanks
Pranay.

On Fri, Jul 7, 2017 at 11:16 AM, Thakrar, Jayesh <jt...@conversantmedia.com> wrote:
Can you provide more details?
E.g. table structure, the app used for the query, the query itself, and the error message.

Also get the output of the following commands from your cluster nodes (note that one command uses a "." and the other a space between the keyspace and table name):

nodetool -h <hostname> tablestats <keyspace>.<tablename>
nodetool -h <hostname> tablehistograms <keyspace> <tablename>

Timeouts can happen at the client/application level (which can be tuned) and at the coordinator node level (which can also be tuned).
But again, those timeouts are a symptom of something else.
They can happen on the client side because the connection pool queue is too full (which is likely due to the response time from the cluster/coordinator nodes).
And the issues on the cluster side can have several causes.
E.g. your query has to scan through too many tombstones, causing the delay, or your query uses filtering (ALLOW FILTERING).
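
For context, the server-side (coordinator) timeouts mentioned above live in cassandra.yaml; the values below are the usual defaults of that era (check your own version's documentation before relying on them):

```yaml
# cassandra.yaml: how long the coordinator waits for replica responses
read_request_timeout_in_ms: 5000      # single-partition reads
range_request_timeout_in_ms: 10000    # range scans
```

Raising these hides the symptom rather than fixing the cause, but it can buy time to trace the slow queries.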

From: "ZAIDI, ASAD A" <az...@att.com>
Date: Friday, July 7, 2017 at 9:45 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: RE: READ Queries timing out.

>> I analysed the GC logs not having any issues with major GC's
            If you don't have issues with GC, then why do you want to tune GC parameters?
Instead, focus on why the SELECT queries are taking time; maybe take a look at their trace?


From: Pranay akula [mailto:pranay.akula2323@gmail.com]
Sent: Friday, July 07, 2017 9:27 AM
To: user@cassandra.apache.org
Subject: READ Queries timing out.

Lately I am seeing some SELECT queries timing out. The data model is likely to blame, but I am not in a position to redo it.

Will increasing the heap help?

I am currently using a 1 GB new_heap. I analysed the GC logs and am not seeing any issues with major GCs.

We are using G1GC; will increasing new_heap help?

We currently set JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500". Even if I increase the heap to, say, 2 GB, is that effective, given that young GCs will kick in more frequently in order to complete within 500 ms?


Thanks
Pranay.


________________________________

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.

Re: READ Queries timing out.

Posted by Pranay akula <pr...@gmail.com>.
Thanks ZAIDI,

The C++ driver does not support tracing, so I am executing those queries
from cqlsh. When I trace them I get the error below, even after increasing
--request-timeout to 3600 in cqlsh.


> ReadTimeout: code=1200 [Coordinator node timed out waiting for replica
> nodes' responses] message="Operation timed out - received only 0
> responses." info={'received_responses': 0, 'required_responses': 1,
> 'consistency': 'ONE'}
> Statement trace did not complete within 10 seconds


Below are the cfstats and cfhistograms. I can see that read latency, cell
count, and maximum live cells per slice (last five minutes) are high. Is
there any way to get around this without changing the data model?

> Percentile  SSTables     Write Latency     Read Latency            Partition Size    Cell Count
>                               (micros)         (micros)                   (bytes)
> 50%             1.00             20.00                     NaN               1331            20
> 75%             2.00             29.00                     NaN               6866            86
> 95%             8.00             60.00                     NaN             126934          1331
> 98%            10.00            103.00                     NaN             315852          3973
> 99%            12.00            149.00                     NaN             545791          8239
> Min             0.00              0.00                    0.00                104             0
> Max            20.00       12730764.00  9773372036884776000.00           74975550         83457




Read Count: 44514407
> Read Latency: 82.92876612928933 ms.
> Write Count: 3007585812
> Write Latency: 0.07094456590853208 ms.
> Pending Flushes: 0
>         SSTable count: 9
> Space used (live): 66946214374
> Space used (total): 66946214374
> Space used by snapshots (total): 0
> Off heap memory used (total): 33706492
> SSTable Compression Ratio: 0.5598380206656697
> Number of keys (estimate): 2483819
> Memtable cell count: 15008
> Memtable data size: 330597
> Memtable off heap memory used: 518502
> Memtable switch count: 39915
> Local read count: 44514407
> Local read latency: 82.929 ms
> Local write count: 3007585849
> Local write latency: 0.071 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.00000
> Bloom filter space used: 12623632
> Bloom filter off heap memory used: 12623560
> Index summary off heap memory used: 3285614
> Compression metadata off heap memory used: 17278816
> Compacted partition minimum bytes: 104
> Compacted partition maximum bytes: 74975550
> Compacted partition mean bytes: 27111
> Average live cells per slice (last five minutes): 388.7486606077893
> Maximum live cells per slice (last five minutes): 28983.0
> Average tombstones per slice (last five minutes): 0.0
> Maximum tombstones per slice (last five minutes): 0.0



Thanks
Pranay.

On Fri, Jul 7, 2017 at 11:16 AM, Thakrar, Jayesh <
jthakrar@conversantmedia.com> wrote:

> Can you provide more details.
>
> E.g. table structure, the app used for the query, the query itself and the
> error message.
>
>
>
> Also get the output of the following commands from your cluster nodes
> (note that one command uses "." and the other "space" between keyspace and
> tablename)
>
>
>
> nodetool -h <hostname> tablestats <keyspace>.<tablename>
>
> nodetool -h <hostname> tablehistograms <keyspace> <tablename>
>
>
>
> Timeouts can happen at the client/application level (which can be tuned)
> and at the coordinator node level (which too can be tuned).
>
> But again those timeouts are a symptom of something.
>
> It can happen at the client side because of connection pool queue too full
> (which is likely due to response time from the cluster/coordinate nodes).
>
> And the issues at the cluster side could be due to several reasons.
>
> E.g. your query has to scan through too many tombstones, causing the delay
> or your query (if using filter).
>
>
>
> *From: *"ZAIDI, ASAD A" <az...@att.com>
> *Date: *Friday, July 7, 2017 at 9:45 AM
> *To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
> *Subject: *RE: READ Queries timing out.
>
>
>
> >> I analysed the GC logs not having any issues with major GC's
>
>             If you don’t have issues on GC , than why do you want to
> [tune] GC parameters ?
>
> Instead focus on why select queries are taking time.. may be take a look
> on their trace?
>
>
>
>
>
> *From:* Pranay akula [mailto:pranay.akula2323@gmail.com]
> *Sent:* Friday, July 07, 2017 9:27 AM
> *To:* user@cassandra.apache.org
> *Subject:* READ Queries timing out.
>
>
>
> Lately i am seeing some select queries timing out, data modelling to blame
> for but not in a situation to redo it.
>
>
>
> Does increasing heap will help ??
>
>
>
> currently using 1GB new_heap, I analysed the GC logs not having any issues
> with major GC's .
>
>
>
> Using G1GC , does increasing new_heap will help ??
>
>
>
> currently using JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500", even if i
> increase heap to lets say 2GB is that effective b/c young GC's will kick in
> more frequently  to complete in 500ms right ??
>
>
>
>
>
> Thanks
>
> Pranay.
>

Re: READ Queries timing out.

Posted by "Thakrar, Jayesh" <jt...@conversantmedia.com>.
Can you provide more details?
E.g. table structure, the app used for the query, the query itself, and the error message.

Also get the output of the following commands from your cluster nodes (note that one command uses a "." and the other a space between the keyspace and table name):

nodetool -h <hostname> tablestats <keyspace>.<tablename>
nodetool -h <hostname> tablehistograms <keyspace> <tablename>

Timeouts can happen at the client/application level (which can be tuned) and at the coordinator node level (which can also be tuned).
But again, those timeouts are a symptom of something else.
They can happen on the client side because the connection pool queue is too full (which is likely due to the response time from the cluster/coordinator nodes).
And the issues on the cluster side can have several causes.
E.g. your query has to scan through too many tombstones, causing the delay, or your query uses filtering (ALLOW FILTERING).

From: "ZAIDI, ASAD A" <az...@att.com>
Date: Friday, July 7, 2017 at 9:45 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: RE: READ Queries timing out.

>> I analysed the GC logs not having any issues with major GC's
            If you don't have issues with GC, then why do you want to tune GC parameters?
Instead, focus on why the SELECT queries are taking time; maybe take a look at their trace?


From: Pranay akula [mailto:pranay.akula2323@gmail.com]
Sent: Friday, July 07, 2017 9:27 AM
To: user@cassandra.apache.org
Subject: READ Queries timing out.

Lately I am seeing some SELECT queries timing out. The data model is likely to blame, but I am not in a position to redo it.

Will increasing the heap help?

I am currently using a 1 GB new_heap. I analysed the GC logs and am not seeing any issues with major GCs.

We are using G1GC; will increasing new_heap help?

We currently set JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500". Even if I increase the heap to, say, 2 GB, is that effective, given that young GCs will kick in more frequently in order to complete within 500 ms?


Thanks
Pranay.

Re: READ Queries timing out.

Posted by Pranay akula <pr...@gmail.com>.
Thanks ZAIDI,

The problem is that the tracing queries are also timing out, so I am not
sure how to troubleshoot.

Will increasing new_heap help reads? What other parameters can I tune so
that I can identify the issue?


Thanks
Pranay.



On Fri, Jul 7, 2017 at 10:45 AM, ZAIDI, ASAD A <az...@att.com> wrote:

> >> I analysed the GC logs not having any issues with major GC's
>
>             If you don’t have issues on GC , than why do you want to
> [tune] GC parameters ?
>
> Instead focus on why select queries are taking time.. may be take a look
> on their trace?
>
>
>
>
>
> *From:* Pranay akula [mailto:pranay.akula2323@gmail.com]
> *Sent:* Friday, July 07, 2017 9:27 AM
> *To:* user@cassandra.apache.org
> *Subject:* READ Queries timing out.
>
>
>
> Lately i am seeing some select queries timing out, data modelling to blame
> for but not in a situation to redo it.
>
>
>
> Does increasing heap will help ??
>
>
>
> currently using 1GB new_heap, I analysed the GC logs not having any issues
> with major GC's .
>
>
>
> Using G1GC , does increasing new_heap will help ??
>
>
>
> currently using JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500", even if i
> increase heap to lets say 2GB is that effective b/c young GC's will kick in
> more frequently  to complete in 500ms right ??
>
>
>
>
>
> Thanks
>
> Pranay.
>

RE: READ Queries timing out.

Posted by "ZAIDI, ASAD A" <az...@att.com>.
>> I analysed the GC logs not having any issues with major GC's
            If you don't have issues with GC, then why do you want to tune GC parameters?
Instead, focus on why the SELECT queries are taking time; maybe take a look at their trace?


From: Pranay akula [mailto:pranay.akula2323@gmail.com]
Sent: Friday, July 07, 2017 9:27 AM
To: user@cassandra.apache.org
Subject: READ Queries timing out.

Lately I am seeing some SELECT queries timing out. The data model is likely to blame, but I am not in a position to redo it.

Will increasing the heap help?

I am currently using a 1 GB new_heap. I analysed the GC logs and am not seeing any issues with major GCs.

We are using G1GC; will increasing new_heap help?

We currently set JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500". Even if I increase the heap to, say, 2 GB, is that effective, given that young GCs will kick in more frequently in order to complete within 500 ms?


Thanks
Pranay.

Re: READ Queries timing out.

Posted by Pranay akula <pr...@gmail.com>.
>
> Increasing new heap size generally helps when you're seeing a lot of
> promotion - if you're not seeing long major GCs, are you seeing a lot of
> promotion from eden to old gen?
> You don't typically set -Xmn (new heap size) when using G1GC


Nope, not really.

> currently using JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500", even if i
> > increase heap to lets say 2GB is that effective b/c young GC's will kick
> in
> > more frequently  to complete in 500ms right ??
> Min heap size for G1 is probably 16G. You were saying "new_heap", now
> you're just saying heap - do you mean new here, or total heap size?


Apologies, what I meant was new_heap.
>
>
> You tablestats show a max partition size of 75M, which isn't nearly as bad
> as I was expecting (or not nearly as bad as we often see when people ask
> this question). You do occasionally scan a lot of sstables (up to 20?), so
> I'm assuming you have STCS - you may benefit from switching to LCS to try
> to limit the number of sstables you touch on read. Also, if your data isn't
> in memory (if your data set is larger than RAM and reads are random), you
> may benefit from a much lower compression_chunk_size - the default is 64k,
> but 4k or 16k is often much better if you do have to read from disk.


I probably have to try decreasing the chunk size, as LCS will not suit us
since we are write heavy.
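
If you do try that, the chunk size is a per-table setting. A sketch (the table name is a placeholder; on Cassandra 2.x the options are 'sstable_compression' and 'chunk_length_kb', on 3.x 'class' and 'chunk_length_in_kb'):

```sql
-- Cassandra 3.x syntax; existing SSTables keep their old chunk size
-- until they are rewritten by compaction (or nodetool upgradesstables -a).
ALTER TABLE my_keyspace.my_table
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 16};
```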

On Fri, Jul 7, 2017 at 7:13 PM, Jeff Jirsa <jj...@apache.org> wrote:

>
>
> On 2017-07-07 07:26 (-0700), Pranay akula <pr...@gmail.com>
> wrote:
> > Lately i am seeing some select queries timing out, data modelling to
> blame
> > for but not in a situation to redo it.
> >
> > Does increasing heap will help ??
> >
> > currently using 1GB new_heap, I analysed the GC logs not having any
> issues
> > with major GC's .
> >
> > Using G1GC , does increasing new_heap will help ??
> >
>
> Increasing new heap size generally helps when you're seeing a lot of
> promotion - if you're not seeing long major GCs, are you seeing a lot of
> promotion from eden to old gen?
>
> You don't typically set -Xmn (new heap size) when using G1GC
>
>
> > currently using JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500", even if i
> > increase heap to lets say 2GB is that effective b/c young GC's will kick
> in
> > more frequently  to complete in 500ms right ??
>
> Min heap size for G1 is probably 16G. You were saying "new_heap", now
> you're just saying heap - do you mean new here, or total heap size?
>
>
> You tablestats show a max partition size of 75M, which isn't nearly as bad
> as I was expecting (or not nearly as bad as we often see when people ask
> this question). You do occasionally scan a lot of sstables (up to 20?), so
> I'm assuming you have STCS - you may benefit from switching to LCS to try
> to limit the number of sstables you touch on read. Also, if your data isn't
> in memory (if your data set is larger than RAM and reads are random), you
> may benefit from a much lower compression_chunk_size - the default is 64k,
> but 4k or 16k is often much better if you do have to read from disk.
>
> - Jeff
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
>
>

Re: READ Queries timing out.

Posted by Jeff Jirsa <jj...@apache.org>.

On 2017-07-07 07:26 (-0700), Pranay akula <pr...@gmail.com> wrote: 
> Lately i am seeing some select queries timing out, data modelling to blame
> for but not in a situation to redo it.
> 
> Does increasing heap will help ??
> 
> currently using 1GB new_heap, I analysed the GC logs not having any issues
> with major GC's .
> 
> Using G1GC , does increasing new_heap will help ??
> 

Increasing new heap size generally helps when you're seeing a lot of promotion - if you're not seeing long major GCs, are you seeing a lot of promotion from eden to old gen? 

You don't typically set -Xmn (new heap size) when using G1GC 
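
In cassandra-env.sh terms, that amounts to something like the following (a sketch using standard HotSpot flags, not a tested configuration):

```shell
# Keep G1 with a pause-time goal, and do NOT pin the young generation
# size (-Xmn): G1 resizes the young gen itself to meet the pause target,
# and a fixed young size defeats that adaptivity.
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
```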


> currently using JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500", even if i
> increase heap to lets say 2GB is that effective b/c young GC's will kick in
> more frequently  to complete in 500ms right ??

Min heap size for G1 is probably 16G. You were saying "new_heap", now you're just saying heap - do you mean new here, or total heap size?


Your tablestats show a max partition size of 75M, which isn't nearly as bad as I was expecting (or not nearly as bad as we often see when people ask this question). You do occasionally scan a lot of sstables (up to 20?), so I'm assuming you have STCS - you may benefit from switching to LCS to try to limit the number of sstables you touch on read. Also, if your data isn't in memory (if your data set is larger than RAM and reads are random), you may benefit from a much lower compression_chunk_size - the default is 64k, but 4k or 16k is often much better if you do have to read from disk.
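
As a back-of-the-envelope illustration of the chunk-size point, using the numbers from the cfstats earlier in the thread (this is editor arithmetic, not Cassandra code): Cassandra reads and decompresses whole chunks, so a random read of a small partition still pays for at least one chunk.

```python
import math

def bytes_read_per_random_read(row_bytes, chunk_length_kb, compression_ratio):
    """Approximate compressed bytes read from disk to serve one random read.

    chunk_length is the UNcompressed chunk size, so disk I/O is roughly
    chunks_touched * chunk * ratio. Ignores rows straddling chunk
    boundaries, so this slightly flatters small chunks.
    """
    chunk = chunk_length_kb * 1024
    chunks_touched = math.ceil(row_bytes / chunk)
    return int(chunks_touched * chunk * compression_ratio)

# Mean compacted partition size from cfstats: ~27111 bytes; ratio ~0.56.
for kb in (64, 16, 4):
    print(kb, bytes_read_per_random_read(27111, kb, 0.56))
```

With a 64k chunk, a ~27 KB partition costs roughly 36 KB of disk read; at 16k it drops to roughly 18 KB, which is the win described above when the working set is larger than RAM.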

- Jeff

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org