Posted to user@cassandra.apache.org by Dan Kuebrich <da...@gmail.com> on 2011/02/04 20:13:55 UTC

read latency in cassandra

Hi all,

It often takes more than two seconds to load:

- one row of ~450 events comprising ~600k
- cluster size of 1
- client is pycassa 1.04
- timeout on recv
- cold read (I believe)
- load generally < 0.5 on a 4-core machine, 2 EC2 instance store drives for
cassandra
- cpu wait generally < 1%

Often the following sequence occurs:

1. First attempt times out after 2 sec
2. Second attempt loads fine on immediate retry

So, I assume the issue is a cache miss and a read from disk.  Is 2 seconds
the normal "I went to disk" latency for cassandra?  What should we look to
tune, if anything? I don't think keeping everything in-memory is an option
for us given dataset size and access pattern (hot set is stuff being
currently written, stuff being accessed is likely to be older).
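
For concreteness, here's roughly what our read path looks like from the
client side (a simplified sketch: pycassa 1.x API, made-up keys, and a 2
second socket timeout to match what we see; the keyspace/CF names are ours):

import time
import pycassa

# Sketch of the client-side read + retry pattern described above.
pool = pycassa.ConnectionPool('Tracelytics', server_list=['localhost:9160'],
                              timeout=2.0)          # 2 s socket timeout
backtraces = pycassa.ColumnFamily(pool, 'Backtraces')

def load_row(key):
    for attempt in (1, 2):
        start = time.time()
        try:
            # one row of ~450 events, ~600k of data
            return backtraces.get(key, column_count=1000)
        except Exception as exc:                     # timeout on recv
            print 'attempt %d failed after %.2fs: %r' % (
                attempt, time.time() - start, exc)
    return None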

I didn't notice this problem with cassandra 0.6.8 and pycassa 0.3.

Thanks,
dan

Re: read latency in cassandra

Posted by Robert Coli <rc...@digg.com>.
On Fri, Feb 4, 2011 at 11:13 AM, Dan Kuebrich <da...@gmail.com> wrote:
> Is 2 seconds the normal "I went to disk" latency for cassandra?

Cassandra exposes metrics on a per-CF basis which indicate latency.
This includes both cache hits and misses, as well as requests for rows
which do not exist. It does NOT include an assortment of other latency
causing things, like thrift.

If you see two seconds of latency from the perspective of your
application, you should compare it to the latency numbers Cassandra
reports. If you are getting timed-out exceptions, that does seem
relatively likely to be the cold cache "I went to disk" case, and the
Cassandra latency numbers should reflect that.
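
The simplest way to get that application-side number is to wrap the call in
a timer; a minimal sketch (plain Python, not tied to any particular client
library; the wrapped call is a placeholder):

import time

# Time a client call so the result can be lined up against the per-CF
# latency Cassandra reports over JMX (e.g. RecentReadLatencyMicros).
def timed_call(fn, *args, **kwargs):
    start = time.time()
    try:
        return fn(*args, **kwargs)
    finally:
        print 'client-side latency: %.0f us' % ((time.time() - start) * 1e6)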

=Rob

Re: read latency in cassandra

Posted by Dan Kuebrich <da...@gmail.com>.
Thanks for the probing questions.  Just got around to looking into this a
bit more and discovered it's not exactly what I thought.  It's actually a
single multiget_slice with a fairly slow response time.

I've answered as many of the questions as I can, and have also included a
reply to Rob's statements below.  In short, I don't fully understand the
read latency metrics from JMX but they don't seem to line up well with my
client-side latency measurements.

All numeric stats are from a dev environment which has smaller use+data
volume than production but still exhibits long multiget_slice waits leading
to client timeouts.

Responses inline:

On Fri, Feb 4, 2011 at 3:02 PM, aaron morton <aa...@thelastpickle.com> wrote:

> What operation are you calling ? Are you trying to read the entire row
> back?
>
It's a multiget_slice of 10-200 rows from a single CF.  We read two columns
for each row, which is the entire row entry.
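
In code it's essentially the following (a simplified sketch; the column
names are placeholders for our two real columns):

import pycassa

# The call in question: one multiget_slice over 10-200 row keys, reading
# two columns per row. Keyspace/CF are ours; keys and columns are made up.
pool = pycassa.ConnectionPool('Tracelytics', server_list=['localhost:9160'],
                              timeout=2.0)
backtraces = pycassa.ColumnFamily(pool, 'Backtraces')

keys = ['key-%d' % i for i in range(99)]   # e.g. the 99-item case below
rows = backtraces.multiget(keys, columns=['col_a', 'col_b'])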


> How many SSTables do you have for the CF? Does your data have a lot of
> overwrites ? Have you modified the default compaction settings ?

Compaction settings are default AFAIK.  However, this is currently a 1-node
cluster, so it's running with gc_grace_seconds=0.  If "overwrite" means an
update, then yes--though the data will always be the same, we do write the
same entry repeatedly as a way of refreshing its TTL (sketch below).  For
some entries, this may be multiple times per minute.

LiveDiskSpaceUsed: 1927107766
LiveSSTableCount: 6
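
The refresh-by-rewrite looks roughly like this (sketch; key/column names and
the TTL value are made up):

import pycassa

pool = pycassa.ConnectionPool('Tracelytics', server_list=['localhost:9160'])
backtraces = pycassa.ColumnFamily(pool, 'Backtraces')

def refresh(key, columns, ttl_seconds=3600):
    # Re-insert the same columns with a fresh TTL. Each rewrite is an
    # overwrite, so hot rows can end up spread across several SSTables
    # until compaction merges them.
    backtraces.insert(key, columns, ttl=ttl_seconds)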


> Do you have row cache enabled ?
>
No.  Given the fairly large set of data and occasional random access, I
doubt that row cache would be a good solution to this problem.  Here are the
settings:
ColumnFamily: Backtraces
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period: 0.0/0
      Key cache size / save period: 200000.0/3600
      Memtable thresholds: 0.3/64/60
      GC grace seconds: 0
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Built indexes: []

> Can you use JConsole to check the read latency for the CF?
>

I used a jython JMX script I found on the mailing list.  Output is below,
though I'm not sure how to interpret the histograms.  I take it
RecentReadLatency is the latency of the most recent request?  If so, this was
for a multiget of 99 items.  Or is that for one particular get within the
multiget?
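
Something along these lines (a minimal jython sketch, not the exact script;
it assumes Cassandra's JMX is reachable on localhost:8080, and the MBean
name is the one shown in the output below):

from javax.management import ObjectName
from javax.management.remote import JMXConnectorFactory, JMXServiceURL

url = JMXServiceURL('service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi')
conn = JMXConnectorFactory.connect(url).getMBeanServerConnection()

name = ObjectName('org.apache.cassandra.db:type=ColumnFamilies,'
                  'keyspace=Tracelytics,columnfamily=Backtraces')
for attr in conn.getMBeanInfo(name).getAttributes():
    try:
        print '%s: %s' % (attr.getName(), conn.getAttribute(name, attr.getName()))
    except Exception:
        pass   # skip attributes that are not readable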

objectname=
org.apache.cassandra.db:type=ColumnFamilies,keyspace=Tracelytics,columnfamily=Backtraces
PendingTasks: 0
MemtableFlushAfterMins: 60
MemtableOperationsInMillions: 0.3
MinRowSize: 310
MaxRowSize: 43388628
MeanRowSize: 1848
ColumnFamilyName: Backtraces
MemtableColumnsCount: 53880
MemtableDataSize: 51052251
MemtableSwitchCount: 470
RecentSSTablesPerReadHistogram: array('l',[34L, 0L, 6L, 12L, 10L, 101L,
106L, 160L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
SSTablesPerReadHistogram: array('l',[1747L, 2196L, 1319L, 578L, 1300L,
5815L, 2108L, 1649L, 13L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
ReadCount: 16725
RecentReadLatencyMicros: 133399.0536130536
LifetimeReadLatencyHistogramMicros: array('l',[0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 38L, 52L, 57L, 50L, 92L, 166L, 289L, 283L, 318L, 223L, 137L,
108L, 65L, 57L, 287L, 549L, 408L, 357L, 538L, 734L, 883L, 548L, 478L, 329L,
244L, 226L, 168L, 262L, 291L, 290L, 355L, 611L, 965L, 836L, 694L, 381L,
136L, 137L, 178L, 326L, 181L, 140L, 188L, 151L, 146L, 146L, 105L, 130L,
102L, 106L, 130L, 135L, 144L, 149L, 173L, 201L, 191L, 188L, 208L, 203L,
155L, 122L, 74L, 45L, 26L, 18L, 15L, 5L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L])
RecentReadLatencyHistogramMicros: array('l',[0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 2L, 2L, 2L, 8L, 8L, 5L, 4L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 4L, 0L, 0L, 0L, 4L, 9L, 3L, 4L, 8L, 13L, 24L, 19L, 32L, 51L, 22L,
17L, 7L, 1L, 1L, 1L, 1L, 4L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 5L, 12L, 9L, 11L,
8L, 8L, 15L, 13L, 20L, 12L, 15L, 13L, 7L, 5L, 5L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
TotalReadLatencyMicros: 1121884414
WriteCount: 10944316
TotalWriteLatencyMicros: 386100317
RecentWriteLatencyMicros: 34.614922206506364
LifetimeWriteLatencyHistogramMicros: array('l',[0L, 9L, 1181L, 5950L,
12524L, 19751L, 27521L, 34963L, 96032L, 158925L, 278830L, 615547L, 664807L,
951370L, 1334996L, 1652907L, 1745019L, 1517550L, 1091827L, 520153L, 164867L,
34155L, 6750L, 3239L, 1890L, 909L, 534L, 398L, 285L, 232L, 179L, 159L, 133L,
130L, 85L, 71L, 64L, 42L, 40L, 27L, 40L, 40L, 33L, 15L, 8L, 12L, 8L, 7L, 7L,
11L, 10L, 8L, 5L, 7L, 5L, 12L, 19L, 11L, 5L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L])
RecentWriteLatencyHistogramMicros: array('l',[0L, 0L, 0L, 0L, 0L, 2L, 3L,
3L, 17L, 127L, 228L, 526L, 544L, 967L, 1521L, 1953L, 2177L, 1801L, 1043L,
333L, 47L, 6L, 7L, 1L, 2L, 2L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
TotalDiskSpaceUsed: 12194409539
LiveDiskSpaceUsed: 1936759116
LiveSSTableCount: 7
BloomFilterFalsePositives: 2
RecentBloomFilterFalsePositives: 1
BloomFilterFalseRatio: 7.037297677691766E-4
RecentBloomFilterFalseRatio: 0.0
MinimumCompactionThreshold: 4
MaximumCompactionThreshold: 32
MemtableThroughputInMB: 64
EstimatedRowSizeHistogram: array('l',[0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 2L, 1799L, 1192L, 37786L, 196298L, 346833L, 154421L, 87274L, 33783L,
11927L, 7079L, 4770L, 9197L, 18893L, 8010L, 2984L, 1856L, 1332L, 988L, 719L,
845L, 782L, 504L, 725L, 315L, 257L, 290L, 307L, 205L, 139L, 142L, 92L, 103L,
90L, 84L, 112L, 44L, 76L, 50L, 24L, 37L, 15L, 31L, 23L, 9L, 6L, 17L, 8L, 3L,
5L, 10L, 3L, 0L, 0L, 8L, 2L, 2L, 0L, 1L, 1L, 0L])
EstimatedColumnCountHistogram: array('l',[0L, 0L, 3430L, 350443L, 28549L,
15586L, 9720L, 7111L, 9914L, 8150L, 4278L, 6069L, 2684L, 4458L, 2978L,
1677L, 1516L, 1207L, 1417L, 1528L, 901L, 801L, 765L, 790L, 437L, 627L, 476L,
369L, 309L, 415L, 576L, 516L, 34038L, 105418L, 229613L, 37927L, 38270L,
7787L, 2703L, 1769L, 1128L, 1093L, 1439L, 567L, 514L, 444L, 231L, 193L,
132L, 218L, 251L, 177L, 191L, 40L, 59L, 81L, 86L, 60L, 45L, 52L, 36L, 45L,
33L, 14L, 8L, 21L, 21L, 24L, 10L, 3L, 9L, 5L, 18L, 5L, 3L, 15L, 0L, 1L, 3L,
1L, 1L, 0L, 0L, 8L, 2L, 2L, 0L, 0L, 0L, 0L])
BuiltIndexes: []


Re: Rob's comments:

Cassandra exposes metrics on a per-CF basis which indicate latency.
This includes both cache hits and misses, as well as requests for rows
which do not exist. It does NOT include an assortment of other latency
causing things, like thrift.
If you see two seconds of latency from the perspective of your
application, you should compare it to the latency numbers Cassandra
reports. If you are getting timed-out exceptions, that does seem
relatively likely to be the cold cache "I went to disk" case, and the
Cassandra latency numbers should reflect that.


On the client side, I measured ~2 seconds elapsed for the 99-item multiget,
after which the client (pycassa) timed it out.  The timing is measured at
the thrift client level below pycassa, so it should avoid library internals.
I'm not sure how to compare this to the JMX latency report.
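
One way to line them up, I think, is to use the lifetime counters from the
dump above (rough arithmetic; this assumes each key in a multiget_slice is
counted as a separate CF-level read):

# Rough comparison using the lifetime counters from the JMX dump above.
total_read_latency_us = 1121884414      # TotalReadLatencyMicros
read_count = 16725                      # ReadCount

mean_read_ms = total_read_latency_us / float(read_count) / 1000.0
print 'mean server-side read latency: %.1f ms' % mean_read_ms   # ~67 ms

# At ~67-133 ms per key, a 99-key multiget only needs modest serialization
# on the server before it exceeds a 2 s client timeout.
print 'fully serial worst case for 99 keys: %.1f s' % (99 * mean_read_ms / 1000)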

Apologies for being so verbose!

dan

> Sorry for all the questions, the answer to your initial question is "mmm,
> that does not sound right. It will depend on...."
>
> Aaron
>
> On 5 Feb 2011, at 08:13, Dan Kuebrich wrote:
>
> > Hi all,
> >
> > It often takes more than two seconds to load:
> >
> > - one row of ~450 events comprising ~600k
> > - cluster size of 1
> > - client is pycassa 1.04
> > - timeout on recv
> > - cold read (I believe)
> > - load generally < 0.5 on a 4-core machine, 2 EC2 instance store drives
> > for cassandra
> > - cpu wait generally < 1%
> >
> > Often the following sequence occurs:
> >
> > 1. First attempt times out after 2 sec
> > 2. Second attempt loads fine on immediate retry
> >
> > So, I assume the issue is a cache miss and a read from disk.  Is 2
> > seconds the normal "I went to disk" latency for cassandra?  What should we
> > look to tune, if anything? I don't think keeping everything in-memory is an
> > option for us given dataset size and access pattern (hot set is stuff being
> > currently written, stuff being accessed is likely to be older).
> >
> > I didn't notice this problem with cassandra 0.6.8 and pycassa 0.3.
> >
> > Thanks,
> > dan
>
>

Re: read latency in cassandra

Posted by aaron morton <aa...@thelastpickle.com>.
What operation are you calling ? Are you trying to read the entire row back?

How many SSTables do you have for the CF? Does your data have a lot of overwrites ? Have you modified the default compaction settings ?

Do you have row cache enabled ? 

How long does the second request take ?

Can you use JConsole to check the read latency for the CF?

Sorry for all the questions, the answer to your initial question is "mmm, that does not sound right. It will depend on...."

Aaron

On 5 Feb 2011, at 08:13, Dan Kuebrich wrote:

> Hi all,
> 
> It often takes more than two seconds to load:
> 
> - one row of ~450 events comprising ~600k
> - cluster size of 1
> - client is pycassa 1.04
> - timeout on recv
> - cold read (I believe)
> - load generally < 0.5 on a 4-core machine, 2 EC2 instance store drives for cassandra
> - cpu wait generally < 1%
> 
> Often the following sequence occurs:
> 
> 1. First attempt times out after 2 sec
> 2. Second attempt loads fine on immediate retry
> 
> So, I assume the issue is a cache miss and a read from disk.  Is 2 seconds the normal "I went to disk" latency for cassandra?  What should we look to tune, if anything? I don't think keeping everything in-memory is an option for us given dataset size and access pattern (hot set is stuff being currently written, stuff being accessed is likely to be older).
> 
> I didn't notice this problem with cassandra 0.6.8 and pycassa 0.3.
> 
> Thanks,
> dan