Posted to user@cassandra.apache.org by Pranay akula <pr...@gmail.com> on 2017/07/07 14:26:39 UTC
READ Queries timing out.
Lately I am seeing some select queries timing out. The data model is to blame,
but I am not in a position to redo it.
Will increasing the heap help?
I am currently using a 1GB new heap. I analysed the GC logs and am not seeing
any issues with major GCs.
Using G1GC, will increasing new_heap help?
I am currently using JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500". Even if I
increase the heap to, say, 2GB, is that effective, given that young GCs will
kick in more frequently in order to finish within 500ms, right?
Thanks
Pranay.
RE: READ Queries timing out.
Posted by "Durity, Sean R" <SE...@homedepot.com>.
1 GB heap is very small. Why not try increasing it to 50% of RAM and see if that helps you track down the real issue? It is hard to tune around a bad data model, if that is indeed the issue. Seeing your tables and queries would help.
Sean Durity
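For reference, a sketch of where that change would go (the paths and the 16G figure are illustrative, assuming a 32GB node; on newer versions the setting may live in jvm.options instead of cassandra-env.sh):

```shell
# cassandra-env.sh (illustrative): pin the total heap at 50% of a 32GB node.
MAX_HEAP_SIZE="16G"
# With G1GC, leave HEAP_NEWSIZE unset and let G1 size the young gen itself.
```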
Re: READ Queries timing out.
Posted by Pranay akula <pr...@gmail.com>.
Thanks Zaidi,
The C++ driver does not support tracing, so I am executing the queries from
cqlsh. When I run them with tracing I get the error below; I increased
--request-timeout to 3600 in cqlsh.
> ReadTimeout: code=1200 [Coordinator node timed out waiting for replica
> nodes' responses] message="Operation timed out - received only 0
> responses." info={'received_responses': 0, 'required_responses': 1,
> 'consistency': 'ONE'}
> Statement trace did not complete within 10 seconds
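A tracing run from cqlsh looks roughly like this (a sketch; the keyspace, table, and WHERE clause are placeholders, and the timeout value just mirrors the one mentioned above):

```shell
# Raise the client-side timeout and trace one suspect query
# (ks.tbl and the predicate are placeholders).
cqlsh --request-timeout=3600 -e "TRACING ON; SELECT * FROM ks.tbl WHERE id = 42;"
```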
Below are the cfstats and cfhistograms. I can see that the read latency, cell
count, and maximum live cells per slice (last five minutes) are high. Is there
any way to get around this without changing the data model?
> Percentile  SSTables  Write Latency  Read Latency            Partition Size  Cell Count
>                       (micros)       (micros)                (bytes)
> 50%         1.00      20.00          NaN                     1331            20
> 75%         2.00      29.00          NaN                     6866            86
> 95%         8.00      60.00          NaN                     126934          1331
> 98%         10.00     103.00         NaN                     315852          3973
> 99%         12.00     149.00         NaN                     545791          8239
> Min         0.00      0.00           0.00                    104             0
> Max         20.00     12730764.00    9773372036884776000.00  74975550        83457
> Read Count: 44514407
> Read Latency: 82.92876612928933 ms.
> Write Count: 3007585812
> Write Latency: 0.07094456590853208 ms.
> Pending Flushes: 0
> SSTable count: 9
> Space used (live): 66946214374
> Space used (total): 66946214374
> Space used by snapshots (total): 0
> Off heap memory used (total): 33706492
> SSTable Compression Ratio: 0.5598380206656697
> Number of keys (estimate): 2483819
> Memtable cell count: 15008
> Memtable data size: 330597
> Memtable off heap memory used: 518502
> Memtable switch count: 39915
> Local read count: 44514407
> Local read latency: 82.929 ms
> Local write count: 3007585849
> Local write latency: 0.071 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.00000
> Bloom filter space used: 12623632
> Bloom filter off heap memory used: 12623560
> Index summary off heap memory used: 3285614
> Compression metadata off heap memory used: 17278816
> Compacted partition minimum bytes: 104
> Compacted partition maximum bytes: 74975550
> Compacted partition mean bytes: 27111
> Average live cells per slice (last five minutes): 388.7486606077893
> Maximum live cells per slice (last five minutes): 28983.0
> Average tombstones per slice (last five minutes): 0.0
> Maximum tombstones per slice (last five minutes): 0.0
Thanks
Pranay.
Re: READ Queries timing out.
Posted by "Thakrar, Jayesh" <jt...@conversantmedia.com>.
Can you provide more details?
E.g. the table structure, the app used for the query, the query itself, and the error message.
Also get the output of the following commands from your cluster nodes (note that one command uses "." and the other a space between keyspace and table name):
nodetool -h <hostname> tablestats <keyspace>.<tablename>
nodetool -h <hostname> tablehistograms <keyspace> <tablename>
Timeouts can happen at the client/application level (which can be tuned) and at the coordinator node level (which can also be tuned).
But those timeouts are a symptom of something else.
On the client side they can happen because the connection pool queue is too full (which is likely itself due to slow response times from the cluster/coordinator nodes).
Issues on the cluster side can have several causes.
E.g. your query has to scan through too many tombstones, or your query uses filtering, causing the delay.
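The coordinator-side limits live in cassandra.yaml; a sketch of the relevant knobs (the values shown are the usual defaults, quoted from memory, so verify them against your version):

```shell
# cassandra.yaml (illustrative values - check your version's defaults)
# read_request_timeout_in_ms: 5000      # single-partition reads
# range_request_timeout_in_ms: 10000    # range scans
```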
Re: READ Queries timing out.
Posted by Pranay akula <pr...@gmail.com>.
Thanks Zaidi,
The problem is that the tracing queries are also timing out, so I am not sure
how to troubleshoot.
Will increasing new_heap help reads? What other parameters can I tune so that
I can identify the issue?
Thanks
Pranay.
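One way to still get traces when interactive tracing times out is server-side probabilistic tracing, sketched below (the sample rate is illustrative; the captured traces land in the system_traces keyspace):

```shell
# Sample ~1% of requests on this node and record their traces to
# system_traces.sessions / system_traces.events for later inspection.
nodetool settraceprobability 0.01
# Turn it back off once enough traces have been collected:
nodetool settraceprobability 0
```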
RE: READ Queries timing out.
Posted by "ZAIDI, ASAD A" <az...@att.com>.
>> I analysed the GC logs not having any issues with major GC's
If you don't have issues with GC, then why do you want to [tune] GC parameters?
Instead, focus on why the select queries are taking time; maybe take a look at their trace?
Re: READ Queries timing out.
Posted by Pranay akula <pr...@gmail.com>.
>
> Increasing new heap size generally helps when you're seeing a lot of
> promotion - if you're not seeing long major GCs, are you seeing a lot of
> promotion from eden to old gen?
> You don't typically set -Xmn (new heap size) when using G1GC
Nope, not really.
> currently using JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500", even if i
> > increase heap to lets say 2GB is that effective b/c young GC's will kick
> in
> > more frequently to complete in 500ms right ??
> Min heap size for G1 is probably 16G. You were saying "new_heap", now
> you're just saying heap - do you mean new here, or total heap size?
Apologies, what I meant is new_heap.
>
>
> You tablestats show a max partition size of 75M, which isn't nearly as bad
> as I was expecting (or not nearly as bad as we often see when people ask
> this question). You do occasionally scan a lot of sstables (up to 20?), so
> I'm assuming you have STCS - you may benefit from switching to LCS to try
> to limit the number of sstables you touch on read. Also, if your data isn't
> in memory (if your data set is larger than RAM and reads are random), you
> may benefit from a much lower compression_chunk_size - the default is 64k,
> but 4k or 16k is often much better if you do have to read from disk.
Probably I have to try decreasing the chunk size, as LCS will not suit us
because we are write-heavy.
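A sketch of that change (the table name is a placeholder, and the compression option names below are the Cassandra 3.x spelling, so check the CQL docs for your version):

```shell
# Lower the compression chunk size so a random read decompresses 16KB
# instead of the default 64KB (ks.tbl is a placeholder).
cqlsh -e "ALTER TABLE ks.tbl WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 16};"
# Existing sstables keep the old chunk size until they are rewritten, e.g.:
nodetool upgradesstables -a ks tbl
```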
Re: READ Queries timing out.
Posted by Jeff Jirsa <jj...@apache.org>.
On 2017-07-07 07:26 (-0700), Pranay akula <pr...@gmail.com> wrote:
> Lately i am seeing some select queries timing out, data modelling to blame
> for but not in a situation to redo it.
>
> Does increasing heap will help ??
>
> currently using 1GB new_heap, I analysed the GC logs not having any issues
> with major GC's .
>
> Using G1GC , does increasing new_heap will help ??
>
Increasing new heap size generally helps when you're seeing a lot of promotion - if you're not seeing long major GCs, are you seeing a lot of promotion from eden to old gen?
You don't typically set -Xmn (new heap size) when using G1GC
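To answer that question from the logs, promotion can be made visible with options like these (a sketch using pre-Java 9 GC-logging syntax; many cassandra-env.sh installs already set some of them):

```shell
# Illustrative JVM options for cassandra-env.sh: log GC details plus the
# tenuring distribution, which shows survivor ages and promotion volume.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
```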
> currently using JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500", even if i
> increase heap to lets say 2GB is that effective b/c young GC's will kick in
> more frequently to complete in 500ms right ??
Min heap size for G1 is probably 16G. You were saying "new_heap", now you're just saying heap - do you mean new here, or total heap size?
Your tablestats show a max partition size of 75M, which isn't nearly as bad as I was expecting (or as bad as we often see when people ask this question). You do occasionally scan a lot of sstables (up to 20?), so I'm assuming you have STCS - you may benefit from switching to LCS to try to limit the number of sstables you touch on read. Also, if your data isn't in memory (if your data set is larger than RAM and reads are random), you may benefit from a much lower compression chunk size - the default is 64k, but 4k or 16k is often much better if you do have to read from disk.
- Jeff
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org