Posted to user@cassandra.apache.org by Edmond Lau <ed...@ooyala.com> on 2009/10/20 02:30:42 UTC

repeated timeouts on quorum reads

Whenever I do a quorum read with get_slice on a row that has a
particularly large supercolumn while the cluster is under high load,
Cassandra throws timeouts.  The reads for that row repeatedly fail
until load decreases, but smaller reads still succeed during that
time.  bin/nodeprobe info shows that the read latency for the column
family spikes to 6-8 seconds.  I've run into this issue since I first
started playing with Cassandra, but thought it might go away with
beefier nodes.  I've since gotten more powerful machines, but the
timeouts still happen.

Some details:
- cassandra 0.4.1
- 5 nodes, each a 12-core 800MHz machine with 8GB RAM and a 5GB heap
- replication factor of 3
- RandomPartitioner
- row that fails has a supercolumn with ~30K subcolumns, ~500 bytes
per cell, ~15MB total
- my failed quorum read lists 500 columns to read in the get_slice
call, but the same happens even when I read 100.
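
For reference, the failing call looks roughly like the sketch below.
The subcolumn names, host, and client setup are illustrative, and I'm
going from the 0.4 Thrift interface (generated classes under
org.apache.cassandra.service), so the exact generated signatures may
differ slightly:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ColumnOrSuperColumn;
    import org.apache.cassandra.service.ColumnParent;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.cassandra.service.SlicePredicate;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class QuorumSliceRead {
        public static void main(String[] args) throws Exception {
            // plain Thrift client; host is illustrative
            TSocket socket = new TSocket("172.16.129.75", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();

            // the parent points at the one large supercolumn
            ColumnParent parent = new ColumnParent();
            parent.column_family = "movie";
            parent.super_column = "all".getBytes("UTF-8");

            // name ~500 of the ~30K subcolumns explicitly (placeholder names)
            List<byte[]> names = new ArrayList<byte[]>();
            for (int i = 0; i < 500; i++) {
                names.add(("subcol-" + i).getBytes("UTF-8"));
            }
            SlicePredicate predicate = new SlicePredicate();
            predicate.column_names = names;

            // this is the read that times out under load
            List<ColumnOrSuperColumn> result = client.get_slice(
                "Analytics", "o:movie", parent, predicate, ConsistencyLevel.QUORUM);
            System.out.println("got " + result.size() + " columns");
        }
    }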

The nodes time out with either 0 or 1 responses (2 of 3 are required
for a quorum read):

ERROR [pool-1-thread-24] 2009-10-20 00:07:43,851 Cassandra.java (line
679) Internal error processing get_slice
java.lang.RuntimeException: java.util.concurrent.TimeoutException:
Operation timed out - received only 0 responses from  .
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.util.concurrent.TimeoutException: Operation timed out
- received only 0 responses from  .
        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
        ... 9 more

ERROR [pool-1-thread-32] 2009-10-19 23:47:21,045 Cassandra.java (line
679) Internal error processing get_slice
java.lang.RuntimeException: java.util.concurrent.TimeoutException:
Operation timed out - received only 1 responses from
172.16.129.75:7000 .
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.util.concurrent.TimeoutException: Operation timed out
- received only 1 responses from 172.16.129.75:7000 .
        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
        ... 9 more

Any ideas what the issue might be?

Edmond

Re: repeated timeouts on quorum reads

Posted by Edmond Lau <ed...@ooyala.com>.
Thanks for the help, Jonathan.  Given that the current implementation
isn't optimized for large supercolumns, and that the current Thrift
API doesn't support slicing a set of columns across multiple
supercolumns of the same row anyway, I agree that I'd be better off
just folding my supercolumns into separate row keys.
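
Concretely, the remapping I have in mind is just to turn each (row
key, supercolumn name) pair into its own row key and let the
subcolumns become ordinary columns.  A minimal sketch (the delimiter
and helper name are made up):

    /**
     * Instead of row "o:movie" holding supercolumn "all" with ~30K
     * subcolumns, each supercolumn becomes its own row in a standard
     * column family, e.g. "o:movie/all", and its subcolumns become
     * plain columns of that row.
     */
    public final class RowKeys {
        private RowKeys() {}

        public static String forSuperColumn(String rowKey, String superColumnName) {
            return rowKey + "/" + superColumnName;   // e.g. "o:movie/all"
        }
    }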

That's actually what my colleague has been doing for his HBase data
model since HBase doesn't have supercolumns; we're currently
evaluating Cassandra and HBase to see which one we should
productionize.

Edmond

On Thu, Oct 22, 2009 at 3:34 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Okay, so the fundamental problem is that deserializing a supercolumn
> with 30k subcolumns is really really slow. (Like we say on
> http://wiki.apache.org/cassandra/CassandraLimitations, "avoid a data
> model that requires large numbers of subcolumns.")
>
> But we were also being needlessly inefficient after deserialization;
> I've attached a patch (against trunk) to
> https://issues.apache.org/jira/browse/CASSANDRA-510.  This gives a
> 30-50% improvement in my tests.
>
> You're looking for more like an order of magnitude improvement though,
> so I would say splitting each supercolumn off into its own row is
> probably the way to go.
>
> -Jonathan
>

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
Okay, so the fundamental problem is that deserializing a supercolumn
with 30k subcolumns is really really slow. (Like we say on
http://wiki.apache.org/cassandra/CassandraLimitations, "avoid a data
model that requires large numbers of subcolumns.")

But we were also being needlessly inefficient after deserialization;
I've attached a patch (against trunk) to
https://issues.apache.org/jira/browse/CASSANDRA-510.  This gives a
30-50% improvement in my tests.

You're looking for more like an order of magnitude improvement though,
so I would say splitting each supercolumn off into its own row is
probably the way to go.
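
With one row per supercolumn, the movie CF would presumably become a
standard (non-super) column family, with the old subcolumn comparator
promoted to the column comparator; something like:

      <ColumnFamily Name="movie" CompareWith="UTF8Type" />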

-Jonathan

Re: repeated timeouts on quorum reads

Posted by Edmond Lau <ed...@ooyala.com>.
      <ColumnFamily Name="movie" ColumnType="Super"
                    CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" />

On Thu, Oct 22, 2009 at 10:05 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> What is the CF definition for movie? the part from your config file like
>      <ColumnFamily CompareWith="BytesType" Name="Movie"/>
>
> On Thu, Oct 22, 2009 at 12:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> On Thu, Oct 22, 2009 at 10:59 AM, Edmond Lau <ed...@ooyala.com> wrote:
>>> Try:
>>>
>>> keyspace: "Analytics"
>>> key: "o:movie"
>>> column family: "movie"
>>> super column: "all"
>>>
>>> I was able to get timeouts with a few concurrent get_slice() calls
>>> (3-4 with debug logging, 6 without) on the super column, with start =
>>> "", finish = "", and count = 1.
>>>
>>> Edmond
>>>
>>> On Thu, Oct 22, 2009 at 7:42 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>> What is your columnfamily definition?  What query should I test with?
>>>>
>>>> On Wed, Oct 21, 2009 at 7:33 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>>> On Tue, Oct 20, 2009 at 2:03 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>> Okay, so we have 2 problems:
>>>>>>
>>>>>>  - the read is simply too slow
>>>>>>  - we're doing unnecessary(?) read repairs
>>>>>>
>>>>>> For the first I think it's going to be easiest if you can export the
>>>>>> row in question so we can do some profiling.  (Eric will have a tool
>>>>>> to do this soon.)
>>>>>
>>>>> Ok - I've created a cassandra setup with just the row in question and
>>>>> tarred up the data + storage-conf.xml for the three nodes:
>>>>>
>>>>> http://demo.ooyala.com/cassandra/cassandra-1.tgz
>>>>> http://demo.ooyala.com/cassandra/cassandra-2.tgz
>>>>> http://demo.ooyala.com/cassandra/cassandra-3.tgz
>>>>>
>>>>>>
>>>>>> For the second, how sure are you that the repair is unnecessary?  Does
>>>>>> it repeat the repair when you do the query again after letting the
>>>>>> callbacks complete?
>>>>>
>>>>> I didn't do any writes as I was debugging the issue whereas I was
>>>>> reading the same data a whole bunch of times, so the repair should've
>>>>> been unnecessary.
>>>>>
>>>>> Thanks for your help,
>>>>> Edmond
>>>>>
>>>>>>
>>>>>> -Jonathan
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 11:38 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>>>>> A single local read with debug logging takes 3-4 secs on the node with
>>>>>>> 3 data files.  It actually takes roughly as long on the other nodes as
>>>>>>> well (I was incorrect in saying that they were fast).
>>>>>>>
>>>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,953 CassandraServer.java
>>>>>>> (line 219) get_slice
>>>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,955 StorageProxy.java
>>>>>>> (line 468) weakreadlocal reading
>>>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>>>> superColumnName='[B@26e00675', columnName='null')', start='',
>>>>>>> finish='', reversed=false, count=1)
>>>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:10,382 SliceQueryFilter.java
>>>>>>> (line 115) collecting 1000036:false:515@1255978226411394
>>>>>>> DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 04:26:10,383
>>>>>>> ConsistencyManager.java (line 168) Reading consistency digest for
>>>>>>> o:movie from 2926@[172.16.129.71:7000, 172.16.129.72:7000]
>>>>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:26:13,332
>>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>>> from 2926@172.16.129.71:7000
>>>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,675
>>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>>> from 2926@172.16.129.72:7000
>>>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,677
>>>>>>> ConsistencyManager.java (line 96) Performing read repair for o:movie
>>>>>>> to 2937@[172.16.129.71:7000, 172.16.129.72:7000, 172.16.129.75:7000]
>>>>>>>
>>>>>>> Also, it's unclear why there's a read repair going on above, since I
>>>>>>> haven't made any updates.
>>>>>>>
>>>>>>> Doing a quorum read on the node with 3 data files takes 7 seconds w/
>>>>>>> debug logging and leads to a DigestMismatchException, again even
>>>>>>> though I haven't made any updates.
>>>>>>>
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,892 CassandraServer.java
>>>>>>> (line 219) get_slice
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>>>>>> (line 374) strongread reading data for
>>>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>>>>> finish='', reversed=false, count=1) from 3675@172.16.129.75:7000
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>>>>>> (line 381) strongread reading digest for
>>>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>>>>> finish='', reversed=false, count=1) from 3676@172.16.129.71:7000
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,895 StorageProxy.java
>>>>>>> (line 381) strongread reading digest for
>>>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>>>>> finish='', reversed=false, count=1) from 3676@172.16.129.72:7000
>>>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:03,994
>>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.71:7000
>>>>>>> DEBUG [RESPONSE-STAGE:2] 2009-10-20 04:30:04,107
>>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.72:7000
>>>>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,787 SliceQueryFilter.java
>>>>>>> (line 115) collecting 1000036:false:515@1255978226411394
>>>>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,788 ReadVerbHandler.java
>>>>>>> (line 100) Read key o:movie; sending response to
>>>>>>> 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>>>>>> DEBUG [RESPONSE-STAGE:4] 2009-10-20 04:30:04,788
>>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,789
>>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>>>>> ms.
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,790
>>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>>>>> ms.
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,791
>>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>>>>>> ms.
>>>>>>>  INFO [pool-1-thread-25] 2009-10-20 04:30:04,792 StorageProxy.java
>>>>>>> (line 410) DigestMismatchException: o:movie
>>>>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:30:07,953
>>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>>> from 3690@172.16.129.71:7000
>>>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:08,130
>>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>>> from 3690@172.16.129.72:7000
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>>>>>> ms.
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>>>>> ms.
>>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>>>>> ReadResponseResolver.java (line 149) resolve: 1 ms.
>>>>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235
>>>>>>> SliceQueryFilter.java (line 115) collecting
>>>>>>> 1000036:false:515@1255978226411394
>>>>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 ReadVerbHandler.java
>>>>>>> (line 100) Read key o:movie; sending response to
>>>>>>> 3690@172.16.129.75:7000
>>>>>>>
>>>>>>> Edmond
>>>>>>>
>>>>>>> On Mon, Oct 19, 2009 at 8:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>>>> On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>>>>>>> 8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.
>>>>>>>>>
>>>>>>>>>> Does it still take ~8s if you direct a CL.ONE query at one of the
>>>>>>>>>> nodes you know has the data (i.e., a local read)?
>>>>>>>>>
>>>>>>>>> Local reads return quickly, but if you look at the logs below, there's
>>>>>>>>> 6 secs from when cassandra prints that it's reading a consistency
>>>>>>>>> digest to when it processes a callback.
>>>>>>>>
>>>>>>>> That's from when it sends the digest command, to when it gets a callback.
>>>>>>>>
>>>>>>>> Something doesn't add up.  If computing &sending a digest response
>>>>>>>> takes 6s, either reading the row or computing the digest must be
>>>>>>>> taking substantial time.  But if local reads are fast, it's not
>>>>>>>> reading the row that is slow, and if resolve is fast (from your logs,
>>>>>>>> it is 2ms) then it's not computing the digest that is slow.
>>>>>>>>
>>>>>>>> Try doing a local read from the node with 3 data files, see if that is slower.
>>>>>>>>
>>>>>>>> If you still see a fast read there, add some more debug logging in
>>>>>>>> ReadVerbHandler.java, where it says something like (I'm going from
>>>>>>>> trunk here but 0.4.1 is similar)
>>>>>>>>
>>>>>>>>            Row row = readCommand.getRow(table);
>>>>>>>>            ReadResponse readResponse;
>>>>>>>>            if (readCommand.isDigestQuery())
>>>>>>>>            {
>>>>>>>>                readResponse = new ReadResponse(row.digest());
>>>>>>>>
>>>>>>>> time the getRow and the row.digest() calls and log how long those take
>>>>>>>> in the quorum scenario.
>>>>>>>>
>>>>>>>> -Jonathan
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
What is the CF definition for movie? the part from your config file like
      <ColumnFamily CompareWith="BytesType" Name="Movie"/>

On Thu, Oct 22, 2009 at 12:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Thu, Oct 22, 2009 at 10:59 AM, Edmond Lau <ed...@ooyala.com> wrote:
>> Try:
>>
>> keyspace: "Analytics"
>> key: "o:movie"
>> column family: "movie"
>> super column: "all"
>>
>> I was able to get timeouts with a few concurrent get_slice() calls
>> (3-4 with debug logging, 6 without) on the super column, with start =
>> "", finish = "", and count = 1.
>>
>> Edmond
>>
>> On Thu, Oct 22, 2009 at 7:42 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> What is your columnfamily definition?  What query should I test with?
>>>
>>> On Wed, Oct 21, 2009 at 7:33 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>> On Tue, Oct 20, 2009 at 2:03 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>> Okay, so we have 2 problems:
>>>>>
>>>>>  - the read is simply too slow
>>>>>  - we're doing unnecessary(?) read repairs
>>>>>
>>>>> For the first I think it's going to be easiest if you can export the
>>>>> row in question so we can do some profiling.  (Eric will have a tool
>>>>> to do this soon.)
>>>>
>>>> Ok - I've created a cassandra setup with just the row in question and
>>>> tarred up the data + storage-conf.xml for the three nodes:
>>>>
>>>> http://demo.ooyala.com/cassandra/cassandra-1.tgz
>>>> http://demo.ooyala.com/cassandra/cassandra-2.tgz
>>>> http://demo.ooyala.com/cassandra/cassandra-3.tgz
>>>>
>>>>>
>>>>> For the second, how sure are you that the repair is unnecessary?  Does
>>>>> it repeat the repair when you do the query again after letting the
>>>>> callbacks complete?
>>>>
>>>> I didn't do any writes as I was debugging the issue whereas I was
>>>> reading the same data a whole bunch of times, so the repair should've
>>>> been unnecessary.
>>>>
>>>> Thanks for your help,
>>>> Edmond
>>>>
>>>>>
>>>>> -Jonathan
>>>>>
>>>>> On Mon, Oct 19, 2009 at 11:38 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>>>> A single local read with debug logging takes 3-4 secs on the node with
>>>>>> 3 data files.  It actually takes roughly as long on the other nodes as
>>>>>> well (I was incorrect in saying that they were fast).
>>>>>>
>>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,953 CassandraServer.java
>>>>>> (line 219) get_slice
>>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,955 StorageProxy.java
>>>>>> (line 468) weakreadlocal reading
>>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>>> superColumnName='[B@26e00675', columnName='null')', start='',
>>>>>> finish='', reversed=false, count=1)
>>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:10,382 SliceQueryFilter.java
>>>>>> (line 115) collecting 1000036:false:515@1255978226411394
>>>>>> DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 04:26:10,383
>>>>>> ConsistencyManager.java (line 168) Reading consistency digest for
>>>>>> o:movie from 2926@[172.16.129.71:7000, 172.16.129.72:7000]
>>>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:26:13,332
>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>> from 2926@172.16.129.71:7000
>>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,675
>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>> from 2926@172.16.129.72:7000
>>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,677
>>>>>> ConsistencyManager.java (line 96) Performing read repair for o:movie
>>>>>> to 2937@[172.16.129.71:7000, 172.16.129.72:7000, 172.16.129.75:7000]
>>>>>>
>>>>>> Also, it's unclear why there's a read repair going on above, since I
>>>>>> haven't made any updates.
>>>>>>
>>>>>> Doing a quorum read on the node with 3 data files takes 7 seconds w/
>>>>>> debug logging and leads to a DigestMismatchException, again even
>>>>>> though I haven't made any updates.
>>>>>>
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,892 CassandraServer.java
>>>>>> (line 219) get_slice
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>>>>> (line 374) strongread reading data for
>>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>>>> finish='', reversed=false, count=1) from 3675@172.16.129.75:7000
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>>>>> (line 381) strongread reading digest for
>>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>>>> finish='', reversed=false, count=1) from 3676@172.16.129.71:7000
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,895 StorageProxy.java
>>>>>> (line 381) strongread reading digest for
>>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>>>> finish='', reversed=false, count=1) from 3676@172.16.129.72:7000
>>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:03,994
>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.71:7000
>>>>>> DEBUG [RESPONSE-STAGE:2] 2009-10-20 04:30:04,107
>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.72:7000
>>>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,787 SliceQueryFilter.java
>>>>>> (line 115) collecting 1000036:false:515@1255978226411394
>>>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,788 ReadVerbHandler.java
>>>>>> (line 100) Read key o:movie; sending response to
>>>>>> 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>>>>> DEBUG [RESPONSE-STAGE:4] 2009-10-20 04:30:04,788
>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,789
>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>>>> ms.
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,790
>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>>>> ms.
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,791
>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>>>>> ms.
>>>>>>  INFO [pool-1-thread-25] 2009-10-20 04:30:04,792 StorageProxy.java
>>>>>> (line 410) DigestMismatchException: o:movie
>>>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:30:07,953
>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>> from 3690@172.16.129.71:7000
>>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:08,130
>>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>>> from 3690@172.16.129.72:7000
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>>>>> ms.
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>>>> ms.
>>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>>>> ReadResponseResolver.java (line 149) resolve: 1 ms.
>>>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235
>>>>>> SliceQueryFilter.java (line 115) collecting
>>>>>> 1000036:false:515@1255978226411394
>>>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 ReadVerbHandler.java
>>>>>> (line 100) Read key o:movie; sending response to
>>>>>> 3690@172.16.129.75:7000
>>>>>>
>>>>>> Edmond
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 8:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>>> On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>>>>>> 8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.
>>>>>>>>
>>>>>>>>> Does it still take ~8s if you direct a CL.ONE query at one of the
>>>>>>>>> nodes you know has the data (i.e., a local read)?
>>>>>>>>
>>>>>>>> Local reads return quickly, but if you look at the logs below, there's
>>>>>>>> 6 secs from when cassandra prints that it's reading a consistency
>>>>>>>> digest to when it processes a callback.
>>>>>>>
>>>>>>> That's from when it sends the digest command, to when it gets a callback.
>>>>>>>
>>>>>>> Something doesn't add up.  If computing &sending a digest response
>>>>>>> takes 6s, either reading the row or computing the digest must be
>>>>>>> taking substantial time.  But if local reads are fast, it's not
>>>>>>> reading the row that is slow, and if resolve is fast (from your logs,
>>>>>>> it is 2ms) then it's not computing the digest that is slow.
>>>>>>>
>>>>>>> Try doing a local read from the node with 3 data files, see if that is slower.
>>>>>>>
>>>>>>> If you still see a fast read there, add some more debug logging in
>>>>>>> ReadVerbHandler.java, where it says something like (I'm going from
>>>>>>> trunk here but 0.4.1 is similar)
>>>>>>>
>>>>>>>            Row row = readCommand.getRow(table);
>>>>>>>            ReadResponse readResponse;
>>>>>>>            if (readCommand.isDigestQuery())
>>>>>>>            {
>>>>>>>                readResponse = new ReadResponse(row.digest());
>>>>>>>
>>>>>>> time the getRow and the row.digest() calls and log how long those take
>>>>>>> in the quorum scenario.
>>>>>>>
>>>>>>> -Jonathan
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
On Thu, Oct 22, 2009 at 10:59 AM, Edmond Lau <ed...@ooyala.com> wrote:
> Try:
>
> keyspace: "Analytics"
> key: "o:movie"
> column family: "movie"
> super column: "all"
>
> I was able to get timeouts with a few concurrent get_slice() calls
> (3-4 with debug logging, 6 without) on the super column, with start =
> "", finish = "", and count = 1.
>
> Edmond
>
> On Thu, Oct 22, 2009 at 7:42 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>> What is your columnfamily definition?  What query should I test with?
>>
>> On Wed, Oct 21, 2009 at 7:33 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>> On Tue, Oct 20, 2009 at 2:03 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>> Okay, so we have 2 problems:
>>>>
>>>>  - the read is simply too slow
>>>>  - we're doing unnecessary(?) read repairs
>>>>
>>>> For the first I think it's going to be easiest if you can export the
>>>> row in question so we can do some profiling.  (Eric will have a tool
>>>> to do this soon.)
>>>
>>> Ok - I've created a cassandra setup with just the row in question and
>>> tarred up the data + storage-conf.xml for the three nodes:
>>>
>>> http://demo.ooyala.com/cassandra/cassandra-1.tgz
>>> http://demo.ooyala.com/cassandra/cassandra-2.tgz
>>> http://demo.ooyala.com/cassandra/cassandra-3.tgz
>>>
>>>>
>>>> For the second, how sure are you that the repair is unnecessary?  Does
>>>> it repeat the repair when you do the query again after letting the
>>>> callbacks complete?
>>>
>>> I didn't do any writes as I was debugging the issue whereas I was
>>> reading the same data a whole bunch of times, so the repair should've
>>> been unnecessary.
>>>
>>> Thanks for your help,
>>> Edmond
>>>
>>>>
>>>> -Jonathan
>>>>
>>>> On Mon, Oct 19, 2009 at 11:38 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>>> A single local read with debug logging takes 3-4 secs on the node with
>>>>> 3 data files.  It actually takes roughly as long on the other nodes as
>>>>> well (I was incorrect in saying that they were fast).
>>>>>
>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,953 CassandraServer.java
>>>>> (line 219) get_slice
>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,955 StorageProxy.java
>>>>> (line 468) weakreadlocal reading
>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>> superColumnName='[B@26e00675', columnName='null')', start='',
>>>>> finish='', reversed=false, count=1)
>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:10,382 SliceQueryFilter.java
>>>>> (line 115) collecting 1000036:false:515@1255978226411394
>>>>> DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 04:26:10,383
>>>>> ConsistencyManager.java (line 168) Reading consistency digest for
>>>>> o:movie from 2926@[172.16.129.71:7000, 172.16.129.72:7000]
>>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:26:13,332
>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>> from 2926@172.16.129.71:7000
>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,675
>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>> from 2926@172.16.129.72:7000
>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,677
>>>>> ConsistencyManager.java (line 96) Performing read repair for o:movie
>>>>> to 2937@[172.16.129.71:7000, 172.16.129.72:7000, 172.16.129.75:7000]
>>>>>
>>>>> Also, it's unclear why there's a read repair going on above, since I
>>>>> haven't made any updates.
>>>>>
>>>>> Doing a quorum read on the node with 3 data files takes 7 seconds w/
>>>>> debug logging and leads to a DigestMismatchException, again even
>>>>> though I haven't made any updates.
>>>>>
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,892 CassandraServer.java
>>>>> (line 219) get_slice
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>>>> (line 374) strongread reading data for
>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>>> finish='', reversed=false, count=1) from 3675@172.16.129.75:7000
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>>>> (line 381) strongread reading digest for
>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>>> finish='', reversed=false, count=1) from 3676@172.16.129.71:7000
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,895 StorageProxy.java
>>>>> (line 381) strongread reading digest for
>>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>>> finish='', reversed=false, count=1) from 3676@172.16.129.72:7000
>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:03,994
>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.71:7000
>>>>> DEBUG [RESPONSE-STAGE:2] 2009-10-20 04:30:04,107
>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.72:7000
>>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,787 SliceQueryFilter.java
>>>>> (line 115) collecting 1000036:false:515@1255978226411394
>>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,788 ReadVerbHandler.java
>>>>> (line 100) Read key o:movie; sending response to
>>>>> 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>>>> DEBUG [RESPONSE-STAGE:4] 2009-10-20 04:30:04,788
>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,789
>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>>> ms.
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,790
>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>>> ms.
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,791
>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>>>> ms.
>>>>>  INFO [pool-1-thread-25] 2009-10-20 04:30:04,792 StorageProxy.java
>>>>> (line 410) DigestMismatchException: o:movie
>>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:30:07,953
>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>> from 3690@172.16.129.71:7000
>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:08,130
>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>>> from 3690@172.16.129.72:7000
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>>>> ms.
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>>> ms.
>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>>> ReadResponseResolver.java (line 149) resolve: 1 ms.
>>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235
>>>>> SliceQueryFilter.java (line 115) collecting
>>>>> 1000036:false:515@1255978226411394
>>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 ReadVerbHandler.java
>>>>> (line 100) Read key o:movie; sending response to
>>>>> 3690@172.16.129.75:7000
>>>>>
>>>>> Edmond
>>>>>
>>>>> On Mon, Oct 19, 2009 at 8:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>> On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>>>>> 8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.
>>>>>>>
>>>>>>>> Does it still take ~8s if you direct a CL.ONE query at one of the
>>>>>>>> nodes you know has the data (i.e., a local read)?
>>>>>>>
>>>>>>> Local reads return quickly, but if you look at the logs below, there's
>>>>>>> 6 secs from when cassandra prints that it's reading a consistency
>>>>>>> digest to when it processes a callback.
>>>>>>
>>>>>> That's from when it sends the digest command, to when it gets a callback.
>>>>>>
>>>>>> Something doesn't add up.  If computing &sending a digest response
>>>>>> takes 6s, either reading the row or computing the digest must be
>>>>>> taking substantial time.  But if local reads are fast, it's not
>>>>>> reading the row that is slow, and if resolve is fast (from your logs,
>>>>>> it is 2ms) then it's not computing the digest that is slow.
>>>>>>
>>>>>> Try doing a local read from the node with 3 data files, see if that is slower.
>>>>>>
>>>>>> If you still see a fast read there, add some more debug logging in
>>>>>> ReadVerbHandler.java, where it says something like (I'm going from
>>>>>> trunk here but 0.4.1 is similar)
>>>>>>
>>>>>>            Row row = readCommand.getRow(table);
>>>>>>            ReadResponse readResponse;
>>>>>>            if (readCommand.isDigestQuery())
>>>>>>            {
>>>>>>                readResponse = new ReadResponse(row.digest());
>>>>>>
>>>>>> time the getRow and the row.digest() calls and log how long those take
>>>>>> in the quorum scenario.
>>>>>>
>>>>>> -Jonathan
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: repeated timeouts on quorum reads

Posted by Edmond Lau <ed...@ooyala.com>.
Try:

keyspace: "Analytics"
key: "o:movie"
column family: "movie"
super column: "all"

I was able to get timeouts with a few concurrent get_slice() calls
(3-4 with debug logging, 6 without) on the super column, with start =
"", finish = "", and count = 1.

Edmond

On Thu, Oct 22, 2009 at 7:42 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> What is your columnfamily definition?  What query should I test with?
>
> On Wed, Oct 21, 2009 at 7:33 PM, Edmond Lau <ed...@ooyala.com> wrote:
>> On Tue, Oct 20, 2009 at 2:03 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> Okay, so we have 2 problems:
>>>
>>>  - the read is simply too slow
>>>  - we're doing unnecessary(?) read repairs
>>>
>>> For the first I think it's going to be easiest if you can export the
>>> row in question so we can do some profiling.  (Eric will have a tool
>>> to do this soon.)
>>
>> Ok - I've created a cassandra setup with just the row in question and
>> tarred up the data + storage-conf.xml for the three nodes:
>>
>> http://demo.ooyala.com/cassandra/cassandra-1.tgz
>> http://demo.ooyala.com/cassandra/cassandra-2.tgz
>> http://demo.ooyala.com/cassandra/cassandra-3.tgz
>>
>>>
>>> For the second, how sure are you that the repair is unnecessary?  Does
>>> it repeat the repair when you do the query again after letting the
>>> callbacks complete?
>>
>> I didn't do any writes as I was debugging the issue whereas I was
>> reading the same data a whole bunch of times, so the repair should've
>> been unnecessary.
>>
>> Thanks for your help,
>> Edmond
>>
>>>
>>> -Jonathan
>>>
>>> On Mon, Oct 19, 2009 at 11:38 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>> A single local read with debug logging takes 3-4 secs on the node with
>>>> 3 data files.  It actually takes roughly as long on the other nodes as
>>>> well (I was incorrect in saying that they were fast).
>>>>
>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,953 CassandraServer.java
>>>> (line 219) get_slice
>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,955 StorageProxy.java
>>>> (line 468) weakreadlocal reading
>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>> superColumnName='[B@26e00675', columnName='null')', start='',
>>>> finish='', reversed=false, count=1)
>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:10,382 SliceQueryFilter.java
>>>> (line 115) collecting 1000036:false:515@1255978226411394
>>>> DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 04:26:10,383
>>>> ConsistencyManager.java (line 168) Reading consistency digest for
>>>> o:movie from 2926@[172.16.129.71:7000, 172.16.129.72:7000]
>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:26:13,332
>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>> from 2926@172.16.129.71:7000
>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,675
>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>> from 2926@172.16.129.72:7000
>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,677
>>>> ConsistencyManager.java (line 96) Performing read repair for o:movie
>>>> to 2937@[172.16.129.71:7000, 172.16.129.72:7000, 172.16.129.75:7000]
>>>>
>>>> Also, it's unclear why there's a read repair going on above, since I
>>>> haven't made any updates.
>>>>
>>>> Doing a quorum read on the node with 3 data files takes 7 seconds w/
>>>> debug logging and leads to a DigestMismatchException, again even
>>>> though I haven't made any updates.
>>>>
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,892 CassandraServer.java
>>>> (line 219) get_slice
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>>> (line 374) strongread reading data for
>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>> finish='', reversed=false, count=1) from 3675@172.16.129.75:7000
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>>> (line 381) strongread reading digest for
>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>> finish='', reversed=false, count=1) from 3676@172.16.129.71:7000
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,895 StorageProxy.java
>>>> (line 381) strongread reading digest for
>>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>>> column_parent='QueryPath(columnFamilyName='movie',
>>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>>> finish='', reversed=false, count=1) from 3676@172.16.129.72:7000
>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:03,994
>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.71:7000
>>>> DEBUG [RESPONSE-STAGE:2] 2009-10-20 04:30:04,107
>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.72:7000
>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,787 SliceQueryFilter.java
>>>> (line 115) collecting 1000036:false:515@1255978226411394
>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,788 ReadVerbHandler.java
>>>> (line 100) Read key o:movie; sending response to
>>>> 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>>> DEBUG [RESPONSE-STAGE:4] 2009-10-20 04:30:04,788
>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,789
>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>> ms.
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,790
>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>> ms.
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,791
>>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>>> ms.
>>>>  INFO [pool-1-thread-25] 2009-10-20 04:30:04,792 StorageProxy.java
>>>> (line 410) DigestMismatchException: o:movie
>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:30:07,953
>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>> from 3690@172.16.129.71:7000
>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:08,130
>>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>>> from 3690@172.16.129.72:7000
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>>> ms.
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>>> ms.
>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>>> ReadResponseResolver.java (line 149) resolve: 1 ms.
>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235
>>>> SliceQueryFilter.java (line 115) collecting
>>>> 1000036:false:515@1255978226411394
>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 ReadVerbHandler.java
>>>> (line 100) Read key o:movie; sending response to
>>>> 3690@172.16.129.75:7000
>>>>
>>>> Edmond
>>>>
>>>> On Mon, Oct 19, 2009 at 8:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>> On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>>>> 8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.
>>>>>>
>>>>>>> Does it still take ~8s if you direct a CL.ONE query at one of the
>>>>>>> nodes you know has the data (i.e., a local read)?
>>>>>>
>>>>>> Local reads return quickly, but if you look at the logs below, there's
>>>>>> 6 secs from when cassandra prints that it's reading a consistency
>>>>>> digest to when it processes a callback.
>>>>>
>>>>> That's from when it sends the digest command, to when it gets a callback.
>>>>>
>>>>> Something doesn't add up.  If computing &sending a digest response
>>>>> takes 6s, either reading the row or computing the digest must be
>>>>> taking substantial time.  But if local reads are fast, it's not
>>>>> reading the row that is slow, and if resolve is fast (from your logs,
>>>>> it is 2ms) then it's not computing the digest that is slow.
>>>>>
>>>>> Try doing a local read from the node with 3 data files, see if that is slower.
>>>>>
>>>>> If you still see a fast read there, add some more debug logging in
>>>>> ReadVerbHandler.java, where it says something like (I'm going from
>>>>> trunk here but 0.4.1 is similar)
>>>>>
>>>>>            Row row = readCommand.getRow(table);
>>>>>            ReadResponse readResponse;
>>>>>            if (readCommand.isDigestQuery())
>>>>>            {
>>>>>                readResponse = new ReadResponse(row.digest());
>>>>>
>>>>> time the getRow and the row.digest() calls and log how long those take
>>>>> in the quorum scenario.
>>>>>
>>>>> -Jonathan
>>>>>
>>>>
>>>
>>
>

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
What is your columnfamily definition?  What query should I test with?

On Wed, Oct 21, 2009 at 7:33 PM, Edmond Lau <ed...@ooyala.com> wrote:
> On Tue, Oct 20, 2009 at 2:03 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> Okay, so we have 2 problems:
>>
>>  - the read is simply too slow
>>  - we're doing unnecessary(?) read repairs
>>
>> For the first I think it's going to be easiest if you can export the
>> row in question so we can do some profiling.  (Eric will have a tool
>> to do this soon.)
>
> Ok - I've created a cassandra setup with just the row in question and
> tarred up the data + storage-conf.xml for the three nodes:
>
> http://demo.ooyala.com/cassandra/cassandra-1.tgz
> http://demo.ooyala.com/cassandra/cassandra-2.tgz
> http://demo.ooyala.com/cassandra/cassandra-3.tgz
>
>>
>> For the second, how sure are you that the repair is unnecessary?  Does
>> it repeat the repair when you do the query again after letting the
>> callbacks complete?
>
> I didn't do any writes as I was debugging the issue whereas I was
> reading the same data a whole bunch of times, so the repair should've
> been unnecessary.
>
> Thanks for your help,
> Edmond
>
>>
>> -Jonathan
>>
>> On Mon, Oct 19, 2009 at 11:38 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>> A single local read with debug logging takes 3-4 secs on the node with
>>> 3 data files.  It actually takes roughly as long on the other nodes as
>>> well (I was incorrect in saying that they were fast).
>>>
>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,953 CassandraServer.java
>>> (line 219) get_slice
>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,955 StorageProxy.java
>>> (line 468) weakreadlocal reading
>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>> column_parent='QueryPath(columnFamilyName='movie',
>>> superColumnName='[B@26e00675', columnName='null')', start='',
>>> finish='', reversed=false, count=1)
>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:10,382 SliceQueryFilter.java
>>> (line 115) collecting 1000036:false:515@1255978226411394
>>> DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 04:26:10,383
>>> ConsistencyManager.java (line 168) Reading consistency digest for
>>> o:movie from 2926@[172.16.129.71:7000, 172.16.129.72:7000]
>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:26:13,332
>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>> from 2926@172.16.129.71:7000
>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,675
>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>> from 2926@172.16.129.72:7000
>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,677
>>> ConsistencyManager.java (line 96) Performing read repair for o:movie
>>> to 2937@[172.16.129.71:7000, 172.16.129.72:7000, 172.16.129.75:7000]
>>>
>>> Also, it's unclear why there's a read repair going on above, since I
>>> haven't made any updates.
>>>
>>> Doing a quorum read on the node with 3 data files takes 7 seconds w/
>>> debug logging and leads to a DigestMismatchException, again even
>>> though I haven't made any updates.
>>>
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,892 CassandraServer.java
>>> (line 219) get_slice
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>> (line 374) strongread reading data for
>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>> column_parent='QueryPath(columnFamilyName='movie',
>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>> finish='', reversed=false, count=1) from 3675@172.16.129.75:7000
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>>> (line 381) strongread reading digest for
>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>> column_parent='QueryPath(columnFamilyName='movie',
>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>> finish='', reversed=false, count=1) from 3676@172.16.129.71:7000
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,895 StorageProxy.java
>>> (line 381) strongread reading digest for
>>> SliceFromReadCommand(table='Analytics', key='o:movie',
>>> column_parent='QueryPath(columnFamilyName='movie',
>>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>>> finish='', reversed=false, count=1) from 3676@172.16.129.72:7000
>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:03,994
>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.71:7000
>>> DEBUG [RESPONSE-STAGE:2] 2009-10-20 04:30:04,107
>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.72:7000
>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,787 SliceQueryFilter.java
>>> (line 115) collecting 1000036:false:515@1255978226411394
>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,788 ReadVerbHandler.java
>>> (line 100) Read key o:movie; sending response to
>>> 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>> DEBUG [RESPONSE-STAGE:4] 2009-10-20 04:30:04,788
>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,789
>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>> ms.
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,790
>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>> ms.
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,791
>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>> ms.
>>>  INFO [pool-1-thread-25] 2009-10-20 04:30:04,792 StorageProxy.java
>>> (line 410) DigestMismatchException: o:movie
>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:30:07,953
>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>> from 3690@172.16.129.71:7000
>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:08,130
>>> ResponseVerbHandler.java (line 34) Processing response on a callback
>>> from 3690@172.16.129.72:7000
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>>> ms.
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>>> ms.
>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>>> ReadResponseResolver.java (line 149) resolve: 1 ms.
>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235
>>> SliceQueryFilter.java (line 115) collecting
>>> 1000036:false:515@1255978226411394
>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 ReadVerbHandler.java
>>> (line 100) Read key o:movie; sending response to
>>> 3690@172.16.129.75:7000
>>>
>>> Edmond
>>>
>>> On Mon, Oct 19, 2009 at 8:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>> On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>>> 8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.
>>>>>
>>>>>> Does it still take ~8s if you direct a CL.ONE query at one of the
>>>>>> nodes you know has the data (i.e., a local read)?
>>>>>
>>>>> Local reads return quickly, but if you look at the logs below, there's
>>>>> 6 secs from when cassandra prints that it's reading a consistency
>>>>> digest to when it processes a callback.
>>>>
>>>> That's from when it sends the digest command, to when it gets a callback.
>>>>
>>>> Something doesn't add up.  If computing &sending a digest response
>>>> takes 6s, either reading the row or computing the digest must be
>>>> taking substantial time.  But if local reads are fast, it's not
>>>> reading the row that is slow, and if resolve is fast (from your logs,
>>>> it is 2ms) then it's not computing the digest that is slow.
>>>>
>>>> Try doing a local read from the node with 3 data files, see if that is slower.
>>>>
>>>> If you still see a fast read there, add some more debug logging in
>>>> ReadVerbHandler.java, where it says something like (I'm going from
>>>> trunk here but 0.4.1 is similar)
>>>>
>>>>            Row row = readCommand.getRow(table);
>>>>            ReadResponse readResponse;
>>>>            if (readCommand.isDigestQuery())
>>>>            {
>>>>                readResponse = new ReadResponse(row.digest());
>>>>
>>>> time the getRow and the row.digest() calls and log how long those take
>>>> in the quorum scenario.
>>>>
>>>> -Jonathan
>>>>
>>>
>>
>

Re: repeated timeouts on quorum reads

Posted by Edmond Lau <ed...@ooyala.com>.
On Tue, Oct 20, 2009 at 2:03 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Okay, so we have 2 problems:
>
>  - the read is simply too slow
>  - we're doing unnecessary(?) read repairs
>
> For the first I think it's going to be easiest if you can export the
> row in question so we can do some profiling.  (Eric will have a tool
> to do this soon.)

Ok - I've created a cassandra setup with just the row in question and
tarred up the data + storage-conf.xml for the three nodes:

http://demo.ooyala.com/cassandra/cassandra-1.tgz
http://demo.ooyala.com/cassandra/cassandra-2.tgz
http://demo.ooyala.com/cassandra/cassandra-3.tgz

>
> For the second, how sure are you that the repair is unnecessary?  Does
> it repeat the repair when you do the query again after letting the
> callbacks complete?

I didn't do any writes while I was debugging the issue; I was just
reading the same data a whole bunch of times, so the repair should
have been unnecessary.

Thanks for your help,
Edmond

>
> -Jonathan
>
> On Mon, Oct 19, 2009 at 11:38 PM, Edmond Lau <ed...@ooyala.com> wrote:
>> A single local read with debug logging takes 3-4 secs on the node with
>> 3 data files.  It actually takes roughly as long on the other nodes as
>> well (I was incorrect in saying that they were fast).
>>
>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,953 CassandraServer.java
>> (line 219) get_slice
>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,955 StorageProxy.java
>> (line 468) weakreadlocal reading
>> SliceFromReadCommand(table='Analytics', key='o:movie',
>> column_parent='QueryPath(columnFamilyName='movie',
>> superColumnName='[B@26e00675', columnName='null')', start='',
>> finish='', reversed=false, count=1)
>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:10,382 SliceQueryFilter.java
>> (line 115) collecting 1000036:false:515@1255978226411394
>> DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 04:26:10,383
>> ConsistencyManager.java (line 168) Reading consistency digest for
>> o:movie from 2926@[172.16.129.71:7000, 172.16.129.72:7000]
>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:26:13,332
>> ResponseVerbHandler.java (line 34) Processing response on a callback
>> from 2926@172.16.129.71:7000
>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,675
>> ResponseVerbHandler.java (line 34) Processing response on a callback
>> from 2926@172.16.129.72:7000
>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,677
>> ConsistencyManager.java (line 96) Performing read repair for o:movie
>> to 2937@[172.16.129.71:7000, 172.16.129.72:7000, 172.16.129.75:7000]
>>
>> Also, it's unclear why there's a read repair going on above, since I
>> haven't made any updates.
>>
>> Doing a quorum read on the node with 3 data files takes 7 seconds w/
>> debug logging and leads to a DigestMismatchException, again even
>> though I haven't made any updates.
>>
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,892 CassandraServer.java
>> (line 219) get_slice
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>> (line 374) strongread reading data for
>> SliceFromReadCommand(table='Analytics', key='o:movie',
>> column_parent='QueryPath(columnFamilyName='movie',
>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>> finish='', reversed=false, count=1) from 3675@172.16.129.75:7000
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
>> (line 381) strongread reading digest for
>> SliceFromReadCommand(table='Analytics', key='o:movie',
>> column_parent='QueryPath(columnFamilyName='movie',
>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>> finish='', reversed=false, count=1) from 3676@172.16.129.71:7000
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,895 StorageProxy.java
>> (line 381) strongread reading digest for
>> SliceFromReadCommand(table='Analytics', key='o:movie',
>> column_parent='QueryPath(columnFamilyName='movie',
>> superColumnName='[B@2cf50c5f', columnName='null')', start='',
>> finish='', reversed=false, count=1) from 3676@172.16.129.72:7000
>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:03,994
>> ResponseVerbHandler.java (line 34) Processing response on a callback
>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.71:7000
>> DEBUG [RESPONSE-STAGE:2] 2009-10-20 04:30:04,107
>> ResponseVerbHandler.java (line 34) Processing response on a callback
>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.72:7000
>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,787 SliceQueryFilter.java
>> (line 115) collecting 1000036:false:515@1255978226411394
>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,788 ReadVerbHandler.java
>> (line 100) Read key o:movie; sending response to
>> 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>> DEBUG [RESPONSE-STAGE:4] 2009-10-20 04:30:04,788
>> ResponseVerbHandler.java (line 34) Processing response on a callback
>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,789
>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>> ms.
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,790
>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>> ms.
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,791
>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>> ms.
>>  INFO [pool-1-thread-25] 2009-10-20 04:30:04,792 StorageProxy.java
>> (line 410) DigestMismatchException: o:movie
>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:30:07,953
>> ResponseVerbHandler.java (line 34) Processing response on a callback
>> from 3690@172.16.129.71:7000
>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:08,130
>> ResponseVerbHandler.java (line 34) Processing response on a callback
>> from 3690@172.16.129.72:7000
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>> ReadResponseResolver.java (line 84) Response deserialization time : 1
>> ms.
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>> ReadResponseResolver.java (line 84) Response deserialization time : 0
>> ms.
>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
>> ReadResponseResolver.java (line 149) resolve: 1 ms.
>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235
>> SliceQueryFilter.java (line 115) collecting
>> 1000036:false:515@1255978226411394
>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 ReadVerbHandler.java
>> (line 100) Read key o:movie; sending response to
>> 3690@172.16.129.75:7000
>>
>> Edmond
>>
>> On Mon, Oct 19, 2009 at 8:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>> 8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.
>>>>
>>>>> Does it still take ~8s if you direct a CL.ONE query at one of the
>>>>> nodes you know has the data (i.e., a local read)?
>>>>
>>>> Local reads return quickly, but if you look at the logs below, there's
>>>> 6 secs from when cassandra prints that it's reading a consistency
>>>> digest to when it processes a callback.
>>>
>>> That's from when it sends the digest command, to when it gets a callback.
>>>
>>> Something doesn't add up.  If computing & sending a digest response
>>> takes 6s, either reading the row or computing the digest must be
>>> taking substantial time.  But if local reads are fast, it's not
>>> reading the row that is slow, and if resolve is fast (from your logs,
>>> it is 2ms) then it's not computing the digest that is slow.
>>>
>>> Try doing a local read from the node with 3 data files, see if that is slower.
>>>
>>> If you still see a fast read there, add some more debug logging in
>>> ReadVerbHandler.java, where it says something like (I'm going from
>>> trunk here but 0.4.1 is similar)
>>>
>>>            Row row = readCommand.getRow(table);
>>>            ReadResponse readResponse;
>>>            if (readCommand.isDigestQuery())
>>>            {
>>>                readResponse = new ReadResponse(row.digest());
>>>
>>> time the getRow and the row.digest() calls and log how long those take
>>> in the quorum scenario.
>>>
>>> -Jonathan
>>>
>>
>

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
Okay, so we have 2 problems:

 - the read is simply too slow
 - we're doing unnecessary(?) read repairs

For the first I think it's going to be easiest if you can export the
row in question so we can do some profiling.  (Eric will have a tool
to do this soon.)

For the second, how sure are you that the repair is unnecessary?  Does
it repeat the repair when you do the query again after letting the
callbacks complete?
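
For context on why that question matters: on a quorum read the
coordinator compares a digest (MD5 in this illustration) of the data
replica's result against the digests returned by the other replicas.
Any mismatch raises the DigestMismatchException seen in the quoted
logs and forces full data reads plus a repair write, and once that
repair lands, a repeated query should come back with matching digests.
A toy illustration of the check (not the actual 0.4.1
StorageProxy/ReadResponseResolver code; the class and method names
here are invented):

    import java.security.MessageDigest;
    import java.util.Arrays;
    import java.util.List;

    // Illustrative coordinator-side digest check, not Cassandra source.
    public class DigestCheckSketch
    {
        // true when every digest replica agrees with the serialized data row
        static boolean digestsMatch(byte[] serializedRow, List<byte[]> replicaDigests)
                throws Exception
        {
            byte[] dataDigest = MessageDigest.getInstance("MD5").digest(serializedRow);
            for (byte[] replicaDigest : replicaDigests)
                if (!Arrays.equals(dataDigest, replicaDigest))
                    return false;  // mismatch: re-read full rows everywhere and read-repair
            return true;           // match: a repeated query should land here, no repair
        }
    }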

-Jonathan

On Mon, Oct 19, 2009 at 11:38 PM, Edmond Lau <ed...@ooyala.com> wrote:
> A single local read with debug logging takes 3-4 secs on the node with
> 3 data files.  It actually takes roughly as long on the other nodes as
> well (I was incorrect in saying that they were fast).
>
> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,953 CassandraServer.java
> (line 219) get_slice
> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,955 StorageProxy.java
> (line 468) weakreadlocal reading
> SliceFromReadCommand(table='Analytics', key='o:movie',
> column_parent='QueryPath(columnFamilyName='movie',
> superColumnName='[B@26e00675', columnName='null')', start='',
> finish='', reversed=false, count=1)
> DEBUG [pool-1-thread-22] 2009-10-20 04:26:10,382 SliceQueryFilter.java
> (line 115) collecting 1000036:false:515@1255978226411394
> DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 04:26:10,383
> ConsistencyManager.java (line 168) Reading consistency digest for
> o:movie from 2926@[172.16.129.71:7000, 172.16.129.72:7000]
> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:26:13,332
> ResponseVerbHandler.java (line 34) Processing response on a callback
> from 2926@172.16.129.71:7000
> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,675
> ResponseVerbHandler.java (line 34) Processing response on a callback
> from 2926@172.16.129.72:7000
> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,677
> ConsistencyManager.java (line 96) Performing read repair for o:movie
> to 2937@[172.16.129.71:7000, 172.16.129.72:7000, 172.16.129.75:7000]
>
> Also, it's unclear why there's a read repair going on above, since I
> haven't made any updates.
>
> Doing a quorum read on the node with 3 data files takes 7 seconds w/
> debug logging and leads to a DigestMismatchException, again even
> though I haven't made any updates.
>
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,892 CassandraServer.java
> (line 219) get_slice
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
> (line 374) strongread reading data for
> SliceFromReadCommand(table='Analytics', key='o:movie',
> column_parent='QueryPath(columnFamilyName='movie',
> superColumnName='[B@2cf50c5f', columnName='null')', start='',
> finish='', reversed=false, count=1) from 3675@172.16.129.75:7000
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
> (line 381) strongread reading digest for
> SliceFromReadCommand(table='Analytics', key='o:movie',
> column_parent='QueryPath(columnFamilyName='movie',
> superColumnName='[B@2cf50c5f', columnName='null')', start='',
> finish='', reversed=false, count=1) from 3676@172.16.129.71:7000
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,895 StorageProxy.java
> (line 381) strongread reading digest for
> SliceFromReadCommand(table='Analytics', key='o:movie',
> column_parent='QueryPath(columnFamilyName='movie',
> superColumnName='[B@2cf50c5f', columnName='null')', start='',
> finish='', reversed=false, count=1) from 3676@172.16.129.72:7000
> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:03,994
> ResponseVerbHandler.java (line 34) Processing response on a callback
> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.71:7000
> DEBUG [RESPONSE-STAGE:2] 2009-10-20 04:30:04,107
> ResponseVerbHandler.java (line 34) Processing response on a callback
> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.72:7000
> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,787 SliceQueryFilter.java
> (line 115) collecting 1000036:false:515@1255978226411394
> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,788 ReadVerbHandler.java
> (line 100) Read key o:movie; sending response to
> 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
> DEBUG [RESPONSE-STAGE:4] 2009-10-20 04:30:04,788
> ResponseVerbHandler.java (line 34) Processing response on a callback
> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,789
> ReadResponseResolver.java (line 84) Response deserialization time : 0
> ms.
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,790
> ReadResponseResolver.java (line 84) Response deserialization time : 0
> ms.
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,791
> ReadResponseResolver.java (line 84) Response deserialization time : 1
> ms.
>  INFO [pool-1-thread-25] 2009-10-20 04:30:04,792 StorageProxy.java
> (line 410) DigestMismatchException: o:movie
> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:30:07,953
> ResponseVerbHandler.java (line 34) Processing response on a callback
> from 3690@172.16.129.71:7000
> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:08,130
> ResponseVerbHandler.java (line 34) Processing response on a callback
> from 3690@172.16.129.72:7000
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
> ReadResponseResolver.java (line 84) Response deserialization time : 1
> ms.
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
> ReadResponseResolver.java (line 84) Response deserialization time : 0
> ms.
> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
> ReadResponseResolver.java (line 149) resolve: 1 ms.
> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235
> SliceQueryFilter.java (line 115) collecting
> 1000036:false:515@1255978226411394
> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 ReadVerbHandler.java
> (line 100) Read key o:movie; sending response to
> 3690@172.16.129.75:7000
>
> Edmond
>
> On Mon, Oct 19, 2009 at 8:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>> 8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.
>>>
>>>> Does it still take ~8s if you direct a CL.ONE query at one of the
>>>> nodes you know has the data (i.e., a local read)?
>>>
>>> Local reads return quickly, but if you look at the logs below, there's
>>> 6 secs from when cassandra prints that it's reading a consistency
>>> digest to when it processes a callback.
>>
>> That's from when it sends the digest command, to when it gets a callback.
>>
>> Something doesn't add up.  If computing & sending a digest response
>> takes 6s, either reading the row or computing the digest must be
>> taking substantial time.  But if local reads are fast, it's not
>> reading the row that is slow, and if resolve is fast (from your logs,
>> it is 2ms) then it's not computing the digest that is slow.
>>
>> Try doing a local read from the node with 3 data files, see if that is slower.
>>
>> If you still see a fast read there, add some more debug logging in
>> ReadVerbHandler.java, where it says something like (I'm going from
>> trunk here but 0.4.1 is similar)
>>
>>            Row row = readCommand.getRow(table);
>>            ReadResponse readResponse;
>>            if (readCommand.isDigestQuery())
>>            {
>>                readResponse = new ReadResponse(row.digest());
>>
>> time the getRow and the row.digest() calls and log how long those take
>> in the quorum scenario.
>>
>> -Jonathan
>>
>

Re: repeated timeouts on quorum reads

Posted by Edmond Lau <ed...@ooyala.com>.
A single local read with debug logging takes 3-4 secs on the node with
3 data files.  It actually takes roughly as long on the other nodes as
well (I was incorrect in saying that they were fast).

DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,953 CassandraServer.java
(line 219) get_slice
DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,955 StorageProxy.java
(line 468) weakreadlocal reading
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@26e00675', columnName='null')', start='',
finish='', reversed=false, count=1)
DEBUG [pool-1-thread-22] 2009-10-20 04:26:10,382 SliceQueryFilter.java
(line 115) collecting 1000036:false:515@1255978226411394
DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 04:26:10,383
ConsistencyManager.java (line 168) Reading consistency digest for
o:movie from 2926@[172.16.129.71:7000, 172.16.129.72:7000]
DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:26:13,332
ResponseVerbHandler.java (line 34) Processing response on a callback
from 2926@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,675
ResponseVerbHandler.java (line 34) Processing response on a callback
from 2926@172.16.129.72:7000
DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,677
ConsistencyManager.java (line 96) Performing read repair for o:movie
to 2937@[172.16.129.71:7000, 172.16.129.72:7000, 172.16.129.75:7000]

Also, it's unclear why there's a read repair going on above, since I
haven't made any updates.

Doing a quorum read on the node with 3 data files takes 7 seconds w/
debug logging and leads to a DigestMismatchException, again even
though I haven't made any updates.

DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,892 CassandraServer.java
(line 219) get_slice
DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
(line 374) strongread reading data for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@2cf50c5f', columnName='null')', start='',
finish='', reversed=false, count=1) from 3675@172.16.129.75:7000
DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java
(line 381) strongread reading digest for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@2cf50c5f', columnName='null')', start='',
finish='', reversed=false, count=1) from 3676@172.16.129.71:7000
DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,895 StorageProxy.java
(line 381) strongread reading digest for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@2cf50c5f', columnName='null')', start='',
finish='', reversed=false, count=1) from 3676@172.16.129.72:7000
DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:03,994
ResponseVerbHandler.java (line 34) Processing response on a callback
from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:2] 2009-10-20 04:30:04,107
ResponseVerbHandler.java (line 34) Processing response on a callback
from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.72:7000
DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,787 SliceQueryFilter.java
(line 115) collecting 1000036:false:515@1255978226411394
DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,788 ReadVerbHandler.java
(line 100) Read key o:movie; sending response to
15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
DEBUG [RESPONSE-STAGE:4] 2009-10-20 04:30:04,788
ResponseVerbHandler.java (line 34) Processing response on a callback
from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000
DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,789
ReadResponseResolver.java (line 84) Response deserialization time : 0
ms.
DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,790
ReadResponseResolver.java (line 84) Response deserialization time : 0
ms.
DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,791
ReadResponseResolver.java (line 84) Response deserialization time : 1
ms.
 INFO [pool-1-thread-25] 2009-10-20 04:30:04,792 StorageProxy.java
(line 410) DigestMismatchException: o:movie
DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:30:07,953
ResponseVerbHandler.java (line 34) Processing response on a callback
from 3690@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:08,130
ResponseVerbHandler.java (line 34) Processing response on a callback
from 3690@172.16.129.72:7000
DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
ReadResponseResolver.java (line 84) Response deserialization time : 1
ms.
DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
ReadResponseResolver.java (line 84) Response deserialization time : 0
ms.
DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132
ReadResponseResolver.java (line 149) resolve: 1 ms.
DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235
SliceQueryFilter.java (line 115) collecting
1000036:false:515@1255978226411394
DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 ReadVerbHandler.java
(line 100) Read key o:movie; sending response to
3690@172.16.129.75:7000

Edmond

On Mon, Oct 19, 2009 at 8:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau <ed...@ooyala.com> wrote:
>> 8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.
>>
>>> Does it still take ~8s if you direct a CL.ONE query at one of the
>>> nodes you know has the data (i.e., a local read)?
>>
>> Local reads return quickly, but if you look at the logs below, there's
>> 6 secs from when cassandra prints that it's reading a consistency
>> digest to when it processes a callback.
>
> That's from when it sends the digest command, to when it gets a callback.
>
> Something doesn't add up.  If computing & sending a digest response
> takes 6s, either reading the row or computing the digest must be
> taking substantial time.  But if local reads are fast, it's not
> reading the row that is slow, and if resolve is fast (from your logs,
> it is 2ms) then it's not computing the digest that is slow.
>
> Try doing a local read from the node with 3 data files, see if that is slower.
>
> If you still see a fast read there, add some more debug logging in
> ReadVerbHandler.java, where it says something like (I'm going from
> trunk here but 0.4.1 is similar)
>
>            Row row = readCommand.getRow(table);
>            ReadResponse readResponse;
>            if (readCommand.isDigestQuery())
>            {
>                readResponse = new ReadResponse(row.digest());
>
> time the getRow and the row.digest() calls and log how long those take
> in the quorum scenario.
>
> -Jonathan
>

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau <ed...@ooyala.com> wrote:
> 8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.
>
>> Does it still take ~8s if you direct a CL.ONE query at one of the
>> nodes you know has the data (i.e., a local read)?
>
> Local reads return quickly, but if you look at the logs below, there's
> 6 secs from when cassandra prints that it's reading a consistency
> digest to when it processes a callback.

That's from when it sends the digest command, to when it gets a callback.

Something doesn't add up.  If computing & sending a digest response
takes 6s, either reading the row or computing the digest must be
taking substantial time.  But if local reads are fast, it's not
reading the row that is slow, and if resolve is fast (from your logs,
it is 2ms) then it's not computing the digest that is slow.

Try doing a local read from the node with 3 data files, see if that is slower.

If you still see a fast read there, add some more debug logging in
ReadVerbHandler.java, where it says something like (I'm going from
trunk here but 0.4.1 is similar)

            Row row = readCommand.getRow(table);
            ReadResponse readResponse;
            if (readCommand.isDigestQuery())
            {
                readResponse = new ReadResponse(row.digest());

time the getRow and the row.digest() calls and log how long those take
in the quorum scenario.
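
A minimal sketch of that instrumentation, written against the snippet
above; the logger_ field and the surrounding variable names are
assumptions about ReadVerbHandler rather than verified 0.4.1 source:

    long start = System.currentTimeMillis();
    Row row = readCommand.getRow(table);
    logger_.debug("getRow took " + (System.currentTimeMillis() - start) + " ms");

    ReadResponse readResponse;
    if (readCommand.isDigestQuery())
    {
        start = System.currentTimeMillis();
        readResponse = new ReadResponse(row.digest());
        logger_.debug("digest took " + (System.currentTimeMillis() - start) + " ms");
    }

If the getRow line dominates, the time is going into deserializing the
large supercolumn; if the digest line dominates, it's the hashing.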

-Jonathan

Re: repeated timeouts on quorum reads

Posted by Edmond Lau <ed...@ooyala.com>.
On Mon, Oct 19, 2009 at 7:23 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Mon, Oct 19, 2009 at 9:08 PM, Edmond Lau <ed...@ooyala.com> wrote:
>> Why do we need 2 nodes to compute hashes in addition to another node
>> for retrieving the data?  Then the system is reading all three copies
>> instead of just a quorum of 2.  Why not just one node for a hash and
>> one node for hash + data?
>
> That way you can take whichever hash finishes first.

Ah.

>
>> Here are my debug logs from the node with 3 concurrent reads.  5
>> seconds after the get_slice, the rpc times out.  3 seconds after the
>> timeout, all the callbacks return.  I know that the entire supercolumn
>> (~15MB) has to be read into memory, but 8 seconds for 3 reads seems
>> extremely slow.
>
> I agree.
>
> A couple more questions:
>
> How many data files are there across all the directories (on a single
> node) for the CF in question?
>

8 data files total.  3 nodes have 1, 1 has 2, the 3rd has 3.

> Does it still take ~8s if you direct a CL.ONE query at one of the
> nodes you know has the data (i.e., a local read)?

Local reads return quickly, but if you look at the logs below, there's
6 secs from when cassandra prints that it's reading a consistency
digest to when it processes a callback.

DEBUG [pool-1-thread-58] 2009-10-20 02:36:19,780 CassandraServer.java
(line 219) get_slice
DEBUG [pool-1-thread-58] 2009-10-20 02:36:19,781 StorageProxy.java
(line 468) weakreadlocal reading
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@7ac0e349', columnName='null')', start='',
finish='', reversed=false, count=1)
DEBUG [pool-1-thread-59] 2009-10-20 02:36:19,789 CassandraServer.java
(line 219) get_slice
DEBUG [pool-1-thread-59] 2009-10-20 02:36:19,790 StorageProxy.java
(line 468) weakreadlocal reading
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@d8a0c95', columnName='null')', start='',
finish='', reversed=false, count=1)
DEBUG [pool-1-thread-60] 2009-10-20 02:36:19,798 CassandraServer.java
(line 219) get_slice
DEBUG [pool-1-thread-60] 2009-10-20 02:36:19,800 StorageProxy.java
(line 468) weakreadlocal reading
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@62401124', columnName='null')', start='',
finish='', reversed=false, count=1)
DEBUG [pool-1-thread-59] 2009-10-20 02:36:24,842 SliceQueryFilter.java
(line 115) collecting 1000002:false:592@1255996277427491
DEBUG [CONSISTENCY-MANAGER:4] 2009-10-20 02:36:24,843
ConsistencyManager.java (line 168) Reading consistency digest for
o:movie from 14629@[172.16.129.75:7000, 172.16.129.72:7000]
DEBUG [pool-1-thread-58] 2009-10-20 02:36:25,319 SliceQueryFilter.java
(line 115) collecting 1000002:false:592@1255996277427491
DEBUG [CONSISTENCY-MANAGER:1] 2009-10-20 02:36:25,324
ConsistencyManager.java (line 168) Reading consistency digest for
o:movie from 14632@[172.16.129.75:7000, 172.16.129.72:7000]
DEBUG [pool-1-thread-60] 2009-10-20 02:36:25,327 SliceQueryFilter.java
(line 115) collecting 1000002:false:592@1255996277427491
DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 02:36:25,328
ConsistencyManager.java (line 168) Reading consistency digest for
o:movie from 14633@[172.16.129.75:7000, 172.16.129.72:7000]
DEBUG [RESPONSE-STAGE:3] 2009-10-20 02:36:31,501
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14629@172.16.129.72:7000
DEBUG [RESPONSE-STAGE:4] 2009-10-20 02:36:31,541
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14633@172.16.129.72:7000
DEBUG [RESPONSE-STAGE:1] 2009-10-20 02:36:31,610
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14632@172.16.129.72:7000
DEBUG [RESPONSE-STAGE:2] 2009-10-20 02:36:31,711
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14629@172.16.129.75:7000
DEBUG [RESPONSE-STAGE:2] 2009-10-20 02:36:31,713
ConsistencyManager.java (line 96) Performing read repair for o:movie
to 14653@[172.16.129.75:7000, 172.16.129.72:7000, 172.16.129.71:7000]
DEBUG [RESPONSE-STAGE:3] 2009-10-20 02:36:32,326
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14633@172.16.129.75:7000
DEBUG [RESPONSE-STAGE:3] 2009-10-20 02:36:32,328
ConsistencyManager.java (line 96) Performing read repair for o:movie
to 14657@[172.16.129.75:7000, 172.16.129.72:7000, 172.16.129.71:7000]
DEBUG [RESPONSE-STAGE:4] 2009-10-20 02:36:34,024
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14632@172.16.129.75:7000
DEBUG [RESPONSE-STAGE:4] 2009-10-20 02:36:34,026
ConsistencyManager.java (line 96) Performing read repair for o:movie
to 14665@[172.16.129.75:7000, 172.16.129.72:7000, 172.16.129.71:7000]
DEBUG [RESPONSE-STAGE:1] 2009-10-20 02:36:37,082
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14653@172.16.129.72:7000
DEBUG [RESPONSE-STAGE:1] 2009-10-20 02:36:37,083
ConsistencyManager.java (line 116) Received responses in
DataRepairHandler : ID:14653
FROM:172.16.129.72:7000
TYPE:RESPONSE-STAGE
VERB:RESPONSE

DEBUG [ROW-READ-STAGE:13] 2009-10-20 02:36:37,119
SliceQueryFilter.java (line 115) collecting
1000002:false:592@1255996277427491
DEBUG [ROW-READ-STAGE:13] 2009-10-20 02:36:37,119 ReadVerbHandler.java
(line 100) Read key o:movie; sending response to
14653@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:2] 2009-10-20 02:36:37,162
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14653@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:2] 2009-10-20 02:36:37,163
ConsistencyManager.java (line 116) Received responses in
DataRepairHandler : ID:14653
FROM:172.16.129.71:7000
TYPE:RESPONSE-STAGE
VERB:RESPONSE

DEBUG [RESPONSE-STAGE:3] 2009-10-20 02:36:37,415
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14657@172.16.129.72:7000
DEBUG [RESPONSE-STAGE:3] 2009-10-20 02:36:37,416
ConsistencyManager.java (line 116) Received responses in
DataRepairHandler : ID:14657
FROM:172.16.129.72:7000
TYPE:RESPONSE-STAGE
VERB:RESPONSE

DEBUG [CACHETABLE-TIMER-4] 2009-10-20 02:36:37,992
ReadResponseResolver.java (line 84) Response deserialization time : 0
ms.
DEBUG [CACHETABLE-TIMER-4] 2009-10-20 02:36:37,993
ReadResponseResolver.java (line 84) Response deserialization time : 0
ms.
DEBUG [CACHETABLE-TIMER-4] 2009-10-20 02:36:37,994
ReadResponseResolver.java (line 149) resolve: 2 ms.
DEBUG [ROW-READ-STAGE:15] 2009-10-20 02:36:38,051
SliceQueryFilter.java (line 115) collecting
1000002:false:592@1255996277427491
DEBUG [ROW-READ-STAGE:15] 2009-10-20 02:36:38,052 ReadVerbHandler.java
(line 100) Read key o:movie; sending response to
14665@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:4] 2009-10-20 02:36:38,052
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14665@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:4] 2009-10-20 02:36:38,053
ConsistencyManager.java (line 116) Received responses in
DataRepairHandler : ID:14665
FROM:172.16.129.71:7000
TYPE:RESPONSE-STAGE
VERB:RESPONSE

DEBUG [ROW-READ-STAGE:17] 2009-10-20 02:36:38,146
SliceQueryFilter.java (line 115) collecting
1000002:false:592@1255996277427491
DEBUG [ROW-READ-STAGE:17] 2009-10-20 02:36:38,146 ReadVerbHandler.java
(line 100) Read key o:movie; sending response to
14657@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:1] 2009-10-20 02:36:38,146
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14657@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:1] 2009-10-20 02:36:38,147
ConsistencyManager.java (line 116) Received responses in
DataRepairHandler : ID:14657
FROM:172.16.129.71:7000
TYPE:RESPONSE-STAGE
VERB:RESPONSE

DEBUG [RESPONSE-STAGE:2] 2009-10-20 02:36:38,957
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14665@172.16.129.72:7000
DEBUG [RESPONSE-STAGE:2] 2009-10-20 02:36:38,958
ConsistencyManager.java (line 116) Received responses in
DataRepairHandler : ID:14665
FROM:172.16.129.72:7000
TYPE:RESPONSE-STAGE
VERB:RESPONSE

DEBUG [CACHETABLE-TIMER-4] 2009-10-20 02:36:39,193
ReadResponseResolver.java (line 84) Response deserialization time : 0
ms.
DEBUG [CACHETABLE-TIMER-4] 2009-10-20 02:36:39,194
ReadResponseResolver.java (line 84) Response deserialization time : 0
ms.
DEBUG [CACHETABLE-TIMER-4] 2009-10-20 02:36:39,194
ReadResponseResolver.java (line 149) resolve: 1 ms.
DEBUG [CACHETABLE-TIMER-4] 2009-10-20 02:36:39,792
ReadResponseResolver.java (line 84) Response deserialization time : 0
ms.
DEBUG [CACHETABLE-TIMER-4] 2009-10-20 02:36:39,793
ReadResponseResolver.java (line 84) Response deserialization time : 0
ms.
DEBUG [CACHETABLE-TIMER-4] 2009-10-20 02:36:39,793
ReadResponseResolver.java (line 149) resolve: 1 ms.
DEBUG [RESPONSE-STAGE:3] 2009-10-20 02:36:40,294
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14657@172.16.129.75:7000
DEBUG [RESPONSE-STAGE:3] 2009-10-20 02:36:40,295
ConsistencyManager.java (line 116) Received responses in
DataRepairHandler : ID:14657
FROM:172.16.129.75:7000
TYPE:RESPONSE-STAGE
VERB:RESPONSE

DEBUG [RESPONSE-STAGE:4] 2009-10-20 02:36:40,357
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14653@172.16.129.75:7000
DEBUG [RESPONSE-STAGE:4] 2009-10-20 02:36:40,358
ConsistencyManager.java (line 116) Received responses in
DataRepairHandler : ID:14653
FROM:172.16.129.75:7000
TYPE:RESPONSE-STAGE
VERB:RESPONSE

DEBUG [RESPONSE-STAGE:1] 2009-10-20 02:36:41,208
ResponseVerbHandler.java (line 34) Processing response on a callback
from 14665@172.16.129.75:7000
DEBUG [RESPONSE-STAGE:1] 2009-10-20 02:36:41,209
ConsistencyManager.java (line 116) Received responses in
DataRepairHandler : ID:14665
FROM:172.16.129.75:7000
TYPE:RESPONSE-STAGE
VERB:RESPONSE

>
> What does JConsole show about jvm memory usage during the query?  Is
> it right up against the heap size limit?
>

jvm heap memory usage goes from 50MB to < 200MB.  Nothing major.


Edmond

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Oct 19, 2009 at 9:08 PM, Edmond Lau <ed...@ooyala.com> wrote:
> Why do we need 2 nodes to compute hashes in addition to another node
> for retrieving the data?  Then the system is reading all three copies
> instead of just a quorum of 2.  Why not just one node for a hash and
> one node for hash + data?

That way you can take whichever hash finishes first.
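
To make that concrete, here is a toy model of the RF=3 QUORUM read
path being described: one replica is asked for the full row, the other
two are asked only for a digest, and the coordinator unblocks as soon
as any two of the three responses arrive.  It is purely illustrative
(not the StorageProxy/QuorumResponseHandler code) and every name in it
is invented:

    import java.security.MessageDigest;
    import java.util.concurrent.Callable;
    import java.util.concurrent.CompletionService;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Toy model of a QUORUM read at RF=3; not Cassandra code.
    public class QuorumReadSketch
    {
        public static void main(String[] args) throws Exception
        {
            final byte[] row = "row contents on every replica".getBytes("UTF-8");
            ExecutorService pool = Executors.newFixedThreadPool(3);
            CompletionService<byte[]> responses =
                new ExecutorCompletionService<byte[]>(pool);

            // replica 1: full data read
            responses.submit(new Callable<byte[]>() {
                public byte[] call() { return row; }
            });
            // replicas 2 and 3: digest-only reads; whichever answers first is
            // the one that ends up counting toward the quorum
            for (int i = 0; i < 2; i++)
            {
                responses.submit(new Callable<byte[]>() {
                    public byte[] call() throws Exception {
                        return MessageDigest.getInstance("MD5").digest(row);
                    }
                });
            }

            // QUORUM at RF=3 blocks for 2 of the 3 responses; the third is only
            // needed later if a digest mismatch forces read repair
            byte[] first = responses.take().get();
            byte[] second = responses.take().get();
            System.out.println("quorum reached with responses of "
                               + first.length + " and " + second.length + " bytes");
            pool.shutdown();
        }
    }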

> Here are my debug logs from the node with 3 concurrent reads.  5
> seconds after the get_slice, the rpc times out.  3 seconds after the
> timeout, all the callbacks return.  I know that the entire supercolumn
> (~15MB) has to be read into memory, but 8 seconds for 3 reads seems
> extremely slow.

I agree.

A couple more questions:

How many data files are there across all the directories (on a single
node) for the CF in question?

Does it still take ~8s if you direct a CL.ONE query at one of the
nodes you know has the data (i.e., a local read)?

What does JConsole show about jvm memory usage during the query?  Is
it right up against the heap size limit?

Re: repeated timeouts on quorum reads

Posted by Edmond Lau <ed...@ooyala.com>.
Comments inline.

On Mon, Oct 19, 2009 at 6:33 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Mon, Oct 19, 2009 at 8:20 PM, Edmond Lau <ed...@ooyala.com> wrote:
>> On Mon, Oct 19, 2009 at 6:01 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> are there many rows like this?
>>
>> No - just a handful.  I'm able to repro by just launching 5 or 6
>> threads that all read the same key.
>
> Does it work if you just read one column? 2? 10?  If it does, how long
> do those take?

I've simplified my repro case.  There are no other cassandra reads or
writes; just one client sending 3 concurrent get_slice() calls to the
same node.  Each get_slice() now asks for 1 column with start and
finish set to "", and all of them target the same row I've been
talking about.  Debug logging is enabled for the node, and I've
attached log snippets at the end of the email.  All 3 reads end up
failing.
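
For reference, here is roughly what each of those three reading
threads is doing, sketched against the 0.4-era Thrift interface
(org.apache.cassandra.service.Cassandra, Thrift port 9160).  The
generated struct and method shapes are recalled from that API
generation and are assumptions rather than something verified against
0.4.1, and the supercolumn name below is just a placeholder (the logs
only print it as [B@...):

    import java.util.List;
    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ColumnOrSuperColumn;
    import org.apache.cassandra.service.ColumnParent;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.cassandra.service.SlicePredicate;
    import org.apache.cassandra.service.SliceRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    // One of the three concurrent single-column QUORUM reads described above.
    public class SingleColumnQuorumRead
    {
        public static void main(String[] args) throws Exception
        {
            TTransport transport = new TSocket("172.16.129.71", 9160);
            transport.open();
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));

            // placeholder; the real supercolumn name is elided in the logs
            byte[] superColumnName = "the-large-supercolumn".getBytes("UTF-8");

            ColumnParent parent = new ColumnParent();
            parent.column_family = "movie";
            parent.super_column = superColumnName;

            // start='', finish='', reversed=false, count=1, as in the logged command
            SlicePredicate predicate = new SlicePredicate();
            predicate.slice_range = new SliceRange(new byte[0], new byte[0], false, 1);

            List<ColumnOrSuperColumn> result =
                client.get_slice("Analytics", "o:movie", parent, predicate,
                                 ConsistencyLevel.QUORUM);
            System.out.println("read " + result.size() + " column(s)");
            transport.close();
        }
    }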

>
> Is the system busy with other reads or writes?

No.

>
> What resource do you see being used on the queried nodes?  cpu? (top)
> disk? (iostat)

cpu on the node with the data and on the nodes responsible for the
digests spikes up to 250-450%.  Barely any disk activity on iostat.

>
> (Turn on debug logging to see which nodes are getting the queries, if
> necessary.)
>
>> I did find one point of note though.  Most of the quorum reads fail
>> with 0 or 1 responses, but I see some that fail with 2 responses,
>> which is odd given that only 2 responses were needed:
>
> That means it got an answer from the 2 nodes computing hashes of the
> answer for quorum purposes, but not the node that was tasked with
> retrieving the actual data.

Why do we need 2 nodes to compute hashes in addition to another node
for retrieving the data?  Then the system is reading all three copies
instead of just a quorum of 2.  Why not just one node for a hash and
one node for hash + data?

Here are my debug logs from the node with 3 concurrent reads.  5
seconds after the get_slice, the rpc times out.  3 seconds after the
timeout, all the callbacks return.  I know that the entire supercolumn
(~15MB) has to be read into memory, but 8 seconds for 3 reads seems
extremely slow.

DEBUG [pool-1-thread-48] 2009-10-20 01:56:38,258 CassandraServer.java
(line 219) get_slice
DEBUG [pool-1-thread-48] 2009-10-20 01:56:38,260 StorageProxy.java
(line 374) strongread reading data for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@68eda583', columnName='null')', start='',
finish='', reversed=false, count=1) from 5613@172.16.129.71:7000
DEBUG [pool-1-thread-48] 2009-10-20 01:56:38,285 StorageProxy.java
(line 381) strongread reading digest for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@68eda583', columnName='null')', start='',
finish='', reversed=false, count=1) from 5614@172.16.129.75:7000
DEBUG [pool-1-thread-48] 2009-10-20 01:56:38,286 StorageProxy.java
(line 381) strongread reading digest for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@68eda583', columnName='null')', start='',
finish='', reversed=false, count=1) from 5614@172.16.129.72:7000
DEBUG [pool-1-thread-50] 2009-10-20 01:56:38,287 CassandraServer.java
(line 219) get_slice
DEBUG [pool-1-thread-49] 2009-10-20 01:56:38,277 CassandraServer.java
(line 219) get_slice
DEBUG [pool-1-thread-50] 2009-10-20 01:56:38,289 StorageProxy.java
(line 374) strongread reading data for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@4a0d7146', columnName='null')', start='',
finish='', reversed=false, count=1) from 5615@172.16.129.71:7000
DEBUG [pool-1-thread-50] 2009-10-20 01:56:38,290 StorageProxy.java
(line 381) strongread reading digest for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@4a0d7146', columnName='null')', start='',
finish='', reversed=false, count=1) from 5616@172.16.129.75:7000
DEBUG [pool-1-thread-49] 2009-10-20 01:56:38,291 StorageProxy.java
(line 374) strongread reading data for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@5c931da1', columnName='null')', start='',
finish='', reversed=false, count=1) from 5617@172.16.129.71:7000
DEBUG [pool-1-thread-50] 2009-10-20 01:56:38,291 StorageProxy.java
(line 381) strongread reading digest for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@4a0d7146', columnName='null')', start='',
finish='', reversed=false, count=1) from 5616@172.16.129.72:7000
DEBUG [pool-1-thread-49] 2009-10-20 01:56:38,291 StorageProxy.java
(line 381) strongread reading digest for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@5c931da1', columnName='null')', start='',
finish='', reversed=false, count=1) from 5618@172.16.129.75:7000
DEBUG [pool-1-thread-49] 2009-10-20 01:56:38,292 StorageProxy.java
(line 381) strongread reading digest for
SliceFromReadCommand(table='Analytics', key='o:movie',
column_parent='QueryPath(columnFamilyName='movie',
superColumnName='[B@5c931da1', columnName='null')', start='',
finish='', reversed=false, count=1) from 5618@172.16.129.72:7000
ERROR [pool-1-thread-50] 2009-10-20 01:56:43,296 Cassandra.java (line
679) Internal error processing get_slice
java.lang.RuntimeException: java.util.concurrent.TimeoutException:
Operation timed out - received only 0 responses from  .
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.util.concurrent.TimeoutException: Operation timed out
- received only 0 responses from  .
        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
        ... 9 more
ERROR [pool-1-thread-49] 2009-10-20 01:56:43,296 Cassandra.java (line
679) Internal error processing get_slice
java.lang.RuntimeException: java.util.concurrent.TimeoutException:
Operation timed out - received only 0 responses from  .
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.util.concurrent.TimeoutException: Operation timed out
- received only 0 responses from  .
        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
        ... 9 more
ERROR [pool-1-thread-48] 2009-10-20 01:56:43,325 Cassandra.java (line
679) Internal error processing get_slice
java.lang.RuntimeException: java.util.concurrent.TimeoutException:
Operation timed out - received only 0 responses from  .
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.util.concurrent.TimeoutException: Operation timed out
- received only 0 responses from  .
        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
        ... 9 more
DEBUG [ROW-READ-STAGE:2] 2009-10-20 01:56:44,365 SliceQueryFilter.java
(line 115) collecting 1000002:false:592@1255996277427491
DEBUG [ROW-READ-STAGE:2] 2009-10-20 01:56:44,366 ReadVerbHandler.java
(line 100) Read key o:movie; sending response to
20CFAD41-BB15-EDF4-FC0F-CD2E5DE3B7B0@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:4] 2009-10-20 01:56:44,366
ResponseVerbHandler.java (line 34) Processing response on a callback
from 20CFAD41-BB15-EDF4-FC0F-CD2E5DE3B7B0@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:1] 2009-10-20 01:56:44,896
ResponseVerbHandler.java (line 34) Processing response on a callback
from 20CFAD41-BB15-EDF4-FC0F-CD2E5DE3B7B0@172.16.129.72:7000
DEBUG [ROW-READ-STAGE:3] 2009-10-20 01:56:44,903 SliceQueryFilter.java
(line 115) collecting 1000002:false:592@1255996277427491
DEBUG [ROW-READ-STAGE:3] 2009-10-20 01:56:44,904 ReadVerbHandler.java
(line 100) Read key o:movie; sending response to
8FAC5892-1A6D-7948-9F9F-EAE5D570F10F@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:2] 2009-10-20 01:56:44,904
ResponseVerbHandler.java (line 34) Processing response on a callback
from 8FAC5892-1A6D-7948-9F9F-EAE5D570F10F@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:3] 2009-10-20 01:56:44,976
ResponseVerbHandler.java (line 34) Processing response on a callback
from B7129ABF-B49D-F5E9-4E0A-CD2F46FEC31A@172.16.129.72:7000
DEBUG [RESPONSE-STAGE:4] 2009-10-20 01:56:45,150
ResponseVerbHandler.java (line 34) Processing response on a callback
from 8FAC5892-1A6D-7948-9F9F-EAE5D570F10F@172.16.129.72:7000
DEBUG [ROW-READ-STAGE:4] 2009-10-20 01:56:45,283 SliceQueryFilter.java
(line 115) collecting 1000002:false:592@1255996277427491
DEBUG [ROW-READ-STAGE:4] 2009-10-20 01:56:45,284 ReadVerbHandler.java
(line 100) Read key o:movie; sending response to
B7129ABF-B49D-F5E9-4E0A-CD2F46FEC31A@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:1] 2009-10-20 01:56:45,284
ResponseVerbHandler.java (line 34) Processing response on a callback
from B7129ABF-B49D-F5E9-4E0A-CD2F46FEC31A@172.16.129.71:7000
DEBUG [RESPONSE-STAGE:2] 2009-10-20 01:56:45,371
ResponseVerbHandler.java (line 34) Processing response on a callback
from B7129ABF-B49D-F5E9-4E0A-CD2F46FEC31A@172.16.129.75:7000
DEBUG [RESPONSE-STAGE:3] 2009-10-20 01:56:45,800
ResponseVerbHandler.java (line 34) Processing response on a callback
from 8FAC5892-1A6D-7948-9F9F-EAE5D570F10F@172.16.129.75:7000
DEBUG [RESPONSE-STAGE:4] 2009-10-20 01:56:46,005
ResponseVerbHandler.java (line 34) Processing response on a callback
from 20CFAD41-BB15-EDF4-FC0F-CD2E5DE3B7B0@172.16.129.75:7000

Edmond

>
> -Jonathan
>

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Oct 19, 2009 at 8:20 PM, Edmond Lau <ed...@ooyala.com> wrote:
> On Mon, Oct 19, 2009 at 6:01 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> are there many rows like this?
>
> No - just a handful.  I'm able to repro by just launching 5 or 6
> threads that all read the same key.

Does it work if you just read one column? 2? 10?  If it does, how long
do those take?

Is the system busy with other reads or writes?

What resource do you see being used on the queried nodes?  cpu? (top)
disk? (iostat)

(Turn on debug logging to see which nodes are getting the queries, if
necessary.)

> I did find one point of note though.  Most of the quorum reads fail
> with 0 or 1 responses, but I see some that fail with 2 responses,
> which is odd given that only 2 responses were needed:

That means it got an answer from the 2 nodes computing hashes of the
answer for quorum purposes, but not the node that was tasked with
retrieving the actual data.

-Jonathan

Re: repeated timeouts on quorum reads

Posted by Edmond Lau <ed...@ooyala.com>.
On Mon, Oct 19, 2009 at 6:01 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> are there many rows like this?

No - just a handful.  I'm able to repro by just launching 5 or 6
threads that all read the same key.

>
> did you check the logs on the other nodes for exceptions?

Yes - no exceptions.

I did find one point of note though.  Most of the quorum reads fail
with 0 or 1 responses, but I see some that fail with 2 responses,
which is odd given that only 2 responses were needed:

ERROR [pool-1-thread-18] 2009-10-20 01:17:26,266 Cassandra.java (line
679) Internal error processing get_slice
java.lang.RuntimeException: java.util.concurrent.TimeoutException:
Operation timed out - received only 2 responses from
172.16.129.75:7000172.16.129.72:7000 .
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.util.concurrent.TimeoutException: Operation timed out
- received only 2 responses from 172.16.129.75:7000172.16.129.72:7000
.
        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
        ... 9 more

>
> On Mon, Oct 19, 2009 at 7:40 PM, Edmond Lau <ed...@ooyala.com> wrote:
>> Usually I'm trying to read 500 columns (~250KB) out of the 30K columns
>> (~15MB) of the supercolumn.  But the same issues happen when I drop
>> down to 100 (~50KB) columns.  The columns I request from get_slice()
>> are named, i.e. I'm not reading 500 consecutive columns.
>>
>> Edmond
>>
>> On Mon, Oct 19, 2009 at 5:36 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> How much of the row that fails are you trying to read at once?
>>>
>>> On Mon, Oct 19, 2009 at 7:30 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>>> Whenever I try to do a quorum read on a row with a particularly large
>>>> supercolumn with get_slice under high load, cassandra throws timeouts.
>>>>  The reads for that row repeatedly fail until load decreases, but
>>>> smaller reads still succeed during that time.  bin/nodeprobe info
>>>> shows that the read latency for the column family spikes up to 6-8
>>>> seconds.  I've run into this issue since I started to play with
>>>> cassandra, but thought that it might go away with beefier nodes.  I've
>>>> since gotten more powerful machines, but the timeouts still happen.
>>>>
>>>> Some details:
>>>> - cassandra 0.4.1
>>>> - 5 nodes, each with 12-core 800MHz with 8GB RAM, 5GB heap size
>>>> - replication factor of 3
>>>> - RandomPartitioner
>>>> - row that fails has a supercolumn with ~30K subcolumns, ~500 bytes
>>>> per cell, ~15MB total
>>>> - my failed quorum read lists 500 columns to read in the get_slice
>>>> call, but the same happens even when I read 100.
>>>>
>>>> The nodes either timeout with 0 or 1 responses (2 of 3 required for a
>>>> quorum read):
>>>>
>>>> ERROR [pool-1-thread-24] 2009-10-20 00:07:43,851 Cassandra.java (line
>>>> 679) Internal error processing get_slice
>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
>>>> Operation timed out - received only 0 responses from  .
>>>>        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
>>>>        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
>>>>        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
>>>>        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
>>>>        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
>>>>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>        at java.lang.Thread.run(Thread.java:619)
>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out
>>>> - received only 0 responses from  .
>>>>        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>>        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
>>>>        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
>>>>        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
>>>>        ... 9 more
>>>>
>>>> ERROR [pool-1-thread-32] 2009-10-19 23:47:21,045 Cassandra.java (line
>>>> 679) Internal error processing get_slice
>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
>>>> Operation timed out - received only 1 responses from
>>>> 172.16.129.75:7000 .
>>>>        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
>>>>        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
>>>>        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
>>>>        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
>>>>        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
>>>>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>        at java.lang.Thread.run(Thread.java:619)
>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out
>>>> - received only 1 responses from 172.16.129.75:7000 .
>>>>        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>>        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
>>>>        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
>>>>        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
>>>>        ... 9 more
>>>>
>>>> Any ideas what the issue might be?
>>>>
>>>> Edmond
>>>>
>>>
>>
>

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
are there many rows like this?

did you check the logs on the other nodes for exceptions?

On Mon, Oct 19, 2009 at 7:40 PM, Edmond Lau <ed...@ooyala.com> wrote:
> Usually I'm trying to read 500 columns (~250KB) out of the 30K columns
> (~15MB) of the supercolumn.  But the same issues happen when I drop
> down to 100 (~50KB) columns.  The columns I request from get_slice()
> are named, i.e. I'm not reading 500 consecutive columns.
>
> Edmond
>
> On Mon, Oct 19, 2009 at 5:36 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> How much of the row that fails are you trying to read at once?
>>
>> On Mon, Oct 19, 2009 at 7:30 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>> Whenever I try to do a quorum read on a row with a particularly large
>>> supercolumn with get_slice under high load, cassandra throws timeouts.
>>>  The reads for that row repeatedly fail until load decreases, but
>>> smaller reads still succeed during that time.  bin/nodeprobe info
>>> shows that the read latency for the column family spikes up to 6-8
>>> seconds.  I've run into this issue since I started to play with
>>> cassandra, but thought that it might go away with beefier nodes.  I've
>>> since gotten more powerful machines, but the timeouts still happen.
>>>
>>> Some details:
>>> - cassandra 0.4.1
>>> - 5 nodes, each with 12-core 800MHz with 8GB RAM, 5GB heap size
>>> - replication factor of 3
>>> - RandomPartitioner
>>> - row that fails has a supercolumn with ~30K subcolumns, ~500 bytes
>>> per cell, ~15MB total
>>> - my failed quorum read lists 500 columns to read in the get_slice
>>> call, but the same happens even when I read 100.
>>>
>>> Any ideas what the issue might be?
>>>
>>> Edmond
>>>
>>
>

Re: repeated timeouts on quorum reads

Posted by Edmond Lau <ed...@ooyala.com>.
Usually I'm trying to read 500 columns (~250KB) out of the ~30K columns
(~15MB) in the supercolumn, but the same issue happens when I drop
down to 100 columns (~50KB).  The columns I request from get_slice()
are explicitly named, i.e. I'm not reading 500 consecutive columns.

Edmond
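
For reference, a call of the shape being described here -- explicitly named
subcolumns inside a single supercolumn, read at QUORUM through the
Cassandra 0.4 Thrift bindings -- looks roughly like the sketch below. The
host, keyspace, column family, row key, and column names are invented for
illustration, and the generated-class details (public struct fields,
get_slice parameter order) are recalled from the 0.4-era bindings rather
than taken from this thread, so treat it as a sketch, not the exact client
code in question.

import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.service.Cassandra;
import org.apache.cassandra.service.ColumnOrSuperColumn;
import org.apache.cassandra.service.ColumnParent;
import org.apache.cassandra.service.ConsistencyLevel;
import org.apache.cassandra.service.SlicePredicate;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class NamedColumnQuorumRead {
    public static void main(String[] args) throws Exception {
        // Connect to the Thrift RPC port (9160 by default), not the
        // inter-node storage port (7000) that shows up in the error above.
        TTransport transport = new TSocket("cassandra-node-1", 9160);  // hypothetical host
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // ~500 explicitly named subcolumns -- not a contiguous SliceRange.
        List<byte[]> columnNames = new ArrayList<byte[]>();
        for (int i = 0; i < 500; i++) {
            columnNames.add(("subcolumn-" + i).getBytes("UTF-8"));  // hypothetical names
        }
        SlicePredicate predicate = new SlicePredicate();
        predicate.column_names = columnNames;  // slice_range left unset

        // The slice is taken inside the one large (~30K subcolumn) supercolumn
        // of the problem row.
        ColumnParent parent = new ColumnParent();
        parent.column_family = "Stats";                              // hypothetical CF name
        parent.super_column = "big_supercolumn".getBytes("UTF-8");   // hypothetical supercolumn

        // QUORUM with RF=3 needs 2 of the 3 replicas to answer before the
        // rpc timeout; the errors above show only 0 or 1 arriving in time.
        List<ColumnOrSuperColumn> result = client.get_slice(
                "MyKeyspace",        // hypothetical keyspace
                "problem-row-key",   // hypothetical row key
                parent, predicate, ConsistencyLevel.QUORUM);

        System.out.println("read " + result.size() + " subcolumns");
        transport.close();
    }
}

Swapping the column_names list for a SliceRange would read consecutive
columns instead, which is explicitly not what is happening in this case.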

On Mon, Oct 19, 2009 at 5:36 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> How much of the row that fails are you trying to read at once?
>

Re: repeated timeouts on quorum reads

Posted by Jonathan Ellis <jb...@gmail.com>.
How much of the row that fails are you trying to read at once?

On Mon, Oct 19, 2009 at 7:30 PM, Edmond Lau <ed...@ooyala.com> wrote:
> Whenever I try to do a quorum read on a row with a particularly large
> supercolumn with get_slice under high load, cassandra throws timeouts.
>  The reads for that row repeatedly fail until load decreases, but
> smaller reads still succeed during that time.  bin/nodeprobe info
> shows that the read latency for the column family spikes to up 6-8
> seconds.  I've run into this issue since I started to play with
> cassandra, but thought that it might go away with beefier nodes.  I've
> since gotten more powerful machines, but the timeouts still happen.
>
> Some details:
> - cassandra 0.4.1
> - 5 nodes, each with 12-core 800MHz with 8GB RAM, 5GB heap size
> - replication factor of 3
> - RandomPartitioner
> - row that fails has a supercolumn with ~30K subcolumns, ~500 bytes
> per cell, ~15MB total
> - my failed quorum read lists 500 columns to read in the get_slice
> call, but the same happens even when I read 100.
>
> Any ideas what the issue might be?
>
> Edmond
>