Posted to user@cassandra.apache.org by Yiming Sun <yi...@gmail.com> on 2012/11/28 20:20:44 UTC

strange row cache behavior

Hi,

I am trying to understand some strange behavior of the Cassandra row cache.  We
have a 6-node Cassandra cluster in a single data center on 2 racks, and the
neighboring nodes on the ring are from alternating racks.  Each node has a
1GB row cache, with the key cache disabled.  The cluster uses
PropertyFileSnitch, and the ColumnFamily I fetch from uses
NetworkTopologyStrategy with a replication factor of 2.  My client code uses
Hector to fetch a fixed set of rows from Cassandra.
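
For reference, this cache setup corresponds roughly to the following
cassandra.yaml fragment from the 1.x line; the option names and values here
are assumptions on my part, so check them against your own version:

    row_cache_size_in_mb: 1024     # 1GB row cache per node
    key_cache_size_in_mb: 0        # key cache disabled
    row_cache_save_period: 0       # matches the "0 save period" shown below
    key_cache_save_period: 14400   # default; matches the "14400 save period" below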

What I don't quite understand is that even after I ran the client code several
times, there are always some nodes with 0 row cache hits, even though the
row caches on all nodes are filled and all nodes receive requests.

Which nodes have 0 hits seems to be strongly related to the following:

 - the set of row keys to fetch
 - the order of the set of row keys to fetch
 - the list of hosts passed to Hector's CassandraHostConfigurator
 - the order of the list of hosts passed to Hector

Can someone shed some light on how exactly the row cache works and
hopefully also explain the behavior I have been seeing?  I thought if the
fixed set of row keys is the only thing I am fetching (each row
should be on the order of 10's of MBs, no more than 100MB), and each node
gets requests, and its row cache is filled, there's gotta be some hits.
 Apparently this is not the case.   Thanks.
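
For context, the fetch pattern looks roughly like the sketch below. The host
list, keyspace, and column family names are placeholders, and the multiget
shape is just one plausible way to read a fixed key set with Hector:

    import java.util.Arrays;
    import java.util.List;

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.Rows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSliceQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    // The host list (and its order) handed to Hector, as mentioned above.
    CassandraHostConfigurator hostConfig =
        new CassandraHostConfigurator("x.x.x.1:9160,x.x.x.2:9160,x.x.x.3:9160");
    Cluster cluster = HFactory.getOrCreateCluster("MyCluster", hostConfig);
    Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

    // The same fixed set of row keys, in the same order, on every run.
    List<String> fixedRowKeys = Arrays.asList("row-001", "row-002", "row-003");

    MultigetSliceQuery<String, String, String> query =
        HFactory.createMultigetSliceQuery(keyspace, StringSerializer.get(),
            StringSerializer.get(), StringSerializer.get());
    query.setColumnFamily("MyColumnFamily");
    query.setKeys(fixedRowKeys.toArray(new String[0]));
    query.setRange(null, null, false, Integer.MAX_VALUE);  // all columns
    QueryResult<Rows<String, String, String>> result = query.execute();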

cluster information:

Address         DC          Rack        Status State   Load            Effective-Ownership Token
                                                                                           141784319550391026443072753096570088105
x.x.x.1    DC1         r1          Up     Normal  587.46 GB       33.33%              0
x.x.x.2    DC1         r2          Up     Normal  591.21 GB       33.33%              28356863910078205288614550619314017621
x.x.x.3    DC1         r1          Up     Normal  594.97 GB       33.33%              56713727820156410577229101238628035242
x.x.x.4    DC1         r2          Up     Normal  587.15 GB       33.33%              85070591730234615865843651857942052863
x.x.x.5    DC1         r1          Up     Normal  590.26 GB       33.33%              113427455640312821154458202477256070484
x.x.x.6    DC1         r2          Up     Normal  583.21 GB       33.33%              141784319550391026443072753096570088105
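
The tokens above are evenly spaced across the 0..2^127 range, which suggests
RandomPartitioner. Assuming that, a key's first replica can be computed by
hand from the ring; the tokens below come from the output above, the rest is
an illustrative sketch:

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public final class RingSketch {
        // Node tokens from the ring output, in ring order (x.x.x.1 .. x.x.x.6).
        static final BigInteger[] TOKENS = {
            BigInteger.ZERO,
            new BigInteger("28356863910078205288614550619314017621"),
            new BigInteger("56713727820156410577229101238628035242"),
            new BigInteger("85070591730234615865843651857942052863"),
            new BigInteger("113427455640312821154458202477256070484"),
            new BigInteger("141784319550391026443072753096570088105"),
        };

        // RandomPartitioner token: abs(MD5(key)) interpreted as a BigInteger.
        static BigInteger token(String key) throws Exception {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(md5).abs();
        }

        // A node owns the range (previous token, its token], so the primary
        // replica is the first node whose token is >= the key's token,
        // wrapping past the largest token back to the first node.
        static int primaryReplica(BigInteger t) {
            for (int i = 0; i < TOKENS.length; i++) {
                if (t.compareTo(TOKENS[i]) <= 0) return i;
            }
            return 0;
        }

        public static void main(String[] args) throws Exception {
            // With NetworkTopologyStrategy and RF 2 on alternating racks, the
            // second replica is simply the next node along the ring.
            int primary = primaryReplica(token("some-row-key"));
            System.out.println("replicas: node " + (primary + 1)
                    + " and node " + ((primary + 1) % TOKENS.length + 1));
        }
    }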


[user@node]$ ./checkinfo.sh
*************** x.x.x.4
Token            : 85070591730234615865843651857942052863
Gossip active    : true
Thrift active    : true
Load             : 587.15 GB
Generation No    : 1354074048
Uptime (seconds) : 36957
Heap Memory (MB) : 2027.29 / 3948.00
Data Center      : DC1
Rack             : r2
Exceptions       : 0

Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache        : size 1072651974 (bytes), capacity 1073741824 (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds

*************** x.x.x.6
Token            : 141784319550391026443072753096570088105
Gossip active    : true
Thrift active    : true
Load             : 583.21 GB
Generation No    : 1354074461
Uptime (seconds) : 36535
Heap Memory (MB) : 828.71 / 3948.00
Data Center      : DC1
Rack             : r2
Exceptions       : 0

Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache        : size 1072602906 (bytes), capacity 1073741824 (bytes), 0 hits, 3194 requests, NaN recent hit rate, 0 save period in seconds

Re: strange row cache behavior

Posted by Yiming Sun <yi...@gmail.com>.
Got it.  Thanks again, Aaron.

-- Y.


On Tue, Dec 4, 2012 at 3:07 PM, aaron morton <aa...@thelastpickle.com> wrote:

>  Does this mean we should not enable row caches until we are absolutely
> sure about what's hot (I think there is a reason why row caches are
> disabled by default)?
>
> Yes and Yes.
> The row cache takes memory and CPU; unless you know you are getting a
> benefit from it, leave it off. The key cache and OS disk cache will help.
> If you find latency is an issue, then start poking around.
>
> Cheers
>
>    -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/12/2012, at 4:23 AM, Yiming Sun <yi...@gmail.com> wrote:
>
> Hi Aaron,
>
> Thank you, and your explanation makes sense.  At the time, I thought having
> 1GB of row cache on each node was plenty, because there was an aggregate 6GB
> of cache, but you are right: with each row in the 10's of MBs, some of the
> nodes can go into a constant load-and-evict cycle, which would have a
> negative effect on performance.  I will try your suggestions to 1) reduce
> the requested entry set and 2) increase the row cache size and see if they
> get better hits, and also 3) reverse the requested entry list in alternate
> runs.
>
> Our data space has close to 3 million rows, but we haven't gathered enough
> usage statistics to know which rows are hot.  Does this mean we should not
> enable row caches until we are absolutely sure about what's hot (I think
> there is a reason why row caches are disabled by default)?  It also seems
> from my tests that the OS page cache works much better, but that could be
> because the OS page cache can utilize all the available memory, so it is
> essentially larger -- I guess I will find out by doing 2) above.
>
> best,
>
> -- Y.
>
>
>
On Tue, Dec 4, 2012 at 4:47 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> > Row Cache        : size 1072651974 (bytes), capacity 1073741824
>> (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in
>> seconds
>>
>> So the cache is pretty much full; there is only about 1 MB free.
>>
>> There were 2,576 read requests that tried to get a row from the cache.
>> Zero of those had a hit. If you have 6 nodes and RF 2, each node has one
>> third of the data in the cluster (from the effective ownership info). So
>> depending on the read workload, the number of read requests on each node
>> may be different.
>>
>> What I think is happening is reads are populating the row cache, then
>> subsequent reads are evicting items from the row cache before you get back
>> to reading the original rows. So if you read rows 1 to 5, they are put in
>> the cache, when you read rows 6 to 10 they are put in and evict rows 1 to
>> 5. Then you read rows 1 to 5 again they are not in the cache.
>>
>> Try testing with a lower number of hot rows, and/or a bigger row cache.
>>
>> But to be honest, with rows in the 10's of MB you will probably only get
>> good cache performance with a small set of hot rows.
>>
>> Hope that helps.
>>
>>
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 1/12/2012, at 5:11 AM, Yiming Sun <yi...@gmail.com> wrote:
>>
>> > Does anyone have any comments/suggestions for me regarding this?  Thanks
>> >
>> >
>> > I am trying to understand some strange behavior of the Cassandra row
>> cache.  We have a 6-node Cassandra cluster in a single data center on 2
>> racks, and the neighboring nodes on the ring are from alternating racks.
>> Each node has a 1GB row cache, with the key cache disabled.  The cluster
>> uses PropertyFileSnitch, and the ColumnFamily I fetch from uses
>> NetworkTopologyStrategy with a replication factor of 2.  My client code
>> uses Hector to fetch a fixed set of rows from Cassandra.
>> >
>> > What I don't quite understand is that even after I ran the client code
>> several times, there are always some nodes with 0 row cache hits, even
>> though the row caches on all nodes are filled and all nodes receive
>> requests.
>> >
>> > Which nodes have 0 hits seems to be strongly related to the following:
>> >
>> >  - the set of row keys to fetch
>> >  - the order of the set of row keys to fetch
>> >  - the list of hosts passed to Hector's CassandraHostConfigurator
>> >  - the order of the list of hosts passed to Hector
>> >
>> > Can someone shed some light on how exactly the row cache works and
>> hopefully also explain the behavior I have been seeing?  I thought if the
>> fixed set of row keys is the only thing I am fetching (each row should be
>> on the order of 10's of MBs, no more than 100MB), and each node gets
>> requests, and its row cache is filled, there's gotta be some hits.
>>  Apparently this is not the case.   Thanks.
>> >
>> > cluster information:
>> >
>> > Address         DC          Rack        Status State   Load            Effective-Ownership Token
>> >                                                                                        141784319550391026443072753096570088105
>> > x.x.x.1    DC1         r1          Up     Normal  587.46 GB       33.33%              0
>> > x.x.x.2    DC1         r2          Up     Normal  591.21 GB       33.33%              28356863910078205288614550619314017621
>> > x.x.x.3    DC1         r1          Up     Normal  594.97 GB       33.33%              56713727820156410577229101238628035242
>> > x.x.x.4    DC1         r2          Up     Normal  587.15 GB       33.33%              85070591730234615865843651857942052863
>> > x.x.x.5    DC1         r1          Up     Normal  590.26 GB       33.33%              113427455640312821154458202477256070484
>> > x.x.x.6    DC1         r2          Up     Normal  583.21 GB       33.33%              141784319550391026443072753096570088105
>> >
>> >
>> > [user@node]$ ./checkinfo.sh
>> > *************** x.x.x.4
>> > Token            : 85070591730234615865843651857942052863
>> > Gossip active    : true
>> > Thrift active    : true
>> > Load             : 587.15 GB
>> > Generation No    : 1354074048
>> > Uptime (seconds) : 36957
>> > Heap Memory (MB) : 2027.29 / 3948.00
>> > Data Center      : DC1
>> > Rack             : r2
>> > Exceptions       : 0
>> >
>> > Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0
>> requests, NaN recent hit rate, 14400 save period in seconds
>> > Row Cache        : size 1072651974 (bytes), capacity 1073741824
>> (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in
>> seconds
>> >
>> > *************** x.x.x.6
>> > Token            : 141784319550391026443072753096570088105
>> > Gossip active    : true
>> > Thrift active    : true
>> > Load             : 583.21 GB
>> > Generation No    : 1354074461
>> > Uptime (seconds) : 36535
>> > Heap Memory (MB) : 828.71 / 3948.00
>> > Data Center      : DC1
>> > Rack             : r2
>> > Exceptions       : 0
>> >
>> > Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0
>> requests, NaN recent hit rate, 14400 save period in seconds
>> > Row Cache        : size 1072602906 (bytes), capacity 1073741824
>> (bytes), 0 hits, 3194 requests, NaN recent hit rate, 0 save period in
>> seconds
>> >
>> >
>>
>>
>
>

Re: strange row cache behavior

Posted by aaron morton <aa...@thelastpickle.com>.
>  Does this mean we should not enable row caches until we are absolutely sure about what's hot (I think there is a reason why row caches are disabled by default)?
Yes and Yes.
The row cache takes memory and CPU; unless you know you are getting a benefit from it, leave it off. The key cache and OS disk cache will help. If you find latency is an issue, then start poking around.
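
If you do want to check whether a cache is earning its keep, the lifetime
hit rate is easy to compute from the "hits" and "requests" numbers in the
output quoted in this thread. A small sketch (the line format is taken from
the nodetool output above; newer versions print this differently):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public final class CacheHitRate {
        // Matches the "... 0 hits, 2576 requests ..." part of a cache line.
        private static final Pattern HITS_REQUESTS =
                Pattern.compile("(\\d+) hits, (\\d+) requests");

        public static void main(String[] args) {
            String line = "Row Cache        : size 1072651974 (bytes), "
                    + "capacity 1073741824 (bytes), 0 hits, 2576 requests, "
                    + "NaN recent hit rate, 0 save period in seconds";
            Matcher m = HITS_REQUESTS.matcher(line);
            if (m.find()) {
                double hits = Double.parseDouble(m.group(1));
                double requests = Double.parseDouble(m.group(2));
                double rate = requests == 0 ? 0 : 100.0 * hits / requests;
                System.out.printf("lifetime hit rate: %.2f%%%n", rate);  // 0.00%
            }
        }
    }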

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/12/2012, at 4:23 AM, Yiming Sun <yi...@gmail.com> wrote:

> Hi Aaron,
> 
> Thank you, and your explanation makes sense.  At the time, I thought having 1GB of row cache on each node was plenty, because there was an aggregate 6GB of cache, but you are right: with each row in the 10's of MBs, some of the nodes can go into a constant load-and-evict cycle, which would have a negative effect on performance.  I will try your suggestions to 1) reduce the requested entry set and 2) increase the row cache size and see if they get better hits, and also 3) reverse the requested entry list in alternate runs.
> 
> Our data space has close to 3 million rows, but we haven't gathered enough usage statistics to know which rows are hot.  Does this mean we should not enable row caches until we are absolutely sure about what's hot (I think there is a reason why row caches are disabled by default)?  It also seems from my tests that the OS page cache works much better, but that could be because the OS page cache can utilize all the available memory, so it is essentially larger -- I guess I will find out by doing 2) above.
> 
> best,
> 
> -- Y.
> 
> 
> 
> On Tue, Dec 4, 2012 at 4:47 AM, aaron morton <aa...@thelastpickle.com> wrote:
> > Row Cache        : size 1072651974 (bytes), capacity 1073741824 (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
> 
> So the cache is pretty much full; there is only about 1 MB free.
> 
> There were 2,576 read requests that tried to get a row from the cache. Zero of those had a hit. If you have 6 nodes and RF 2, each node has one third of the data in the cluster (from the effective ownership info). So depending on the read workload, the number of read requests on each node may be different.
> 
> What I think is happening is reads are populating the row cache, then subsequent reads are evicting items from the row cache before you get back to reading the original rows. So if you read rows 1 to 5, they are put in the cache, when you read rows 6 to 10 they are put in and evict rows 1 to 5. Then you read rows 1 to 5 again they are not in the cache.
> 
> Try testing with a lower number of hot rows, and/or a bigger row cache.
> 
> But to be honest, with rows in the 10's of MB you will probably only get good cache performance with a small set of hot rows.
> 
> Hope that helps.
> 
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 1/12/2012, at 5:11 AM, Yiming Sun <yi...@gmail.com> wrote:
> 
> > Does anyone have any comments/suggestions for me regarding this?  Thanks
> >
> >
> > I am trying to understand some strange behavior of the Cassandra row cache.  We have a 6-node Cassandra cluster in a single data center on 2 racks, and the neighboring nodes on the ring are from alternating racks.  Each node has a 1GB row cache, with the key cache disabled.  The cluster uses PropertyFileSnitch, and the ColumnFamily I fetch from uses NetworkTopologyStrategy with a replication factor of 2.  My client code uses Hector to fetch a fixed set of rows from Cassandra.
> >
> > What I don't quite understand is that even after I ran the client code several times, there are always some nodes with 0 row cache hits, even though the row caches on all nodes are filled and all nodes receive requests.
> >
> > Which nodes have 0 hits seems to be strongly related to the following:
> >
> >  - the set of row keys to fetch
> >  - the order of the set of row keys to fetch
> >  - the list of hosts passed to Hector's CassandraHostConfigurator
> >  - the order of the list of hosts passed to Hector
> >
> > Can someone shed some light on how exactly the row cache works and hopefully also explain the behavior I have been seeing?  I thought if the fixed set of row keys is the only thing I am fetching (each row should be on the order of 10's of MBs, no more than 100MB), and each node gets requests, and its row cache is filled, there's gotta be some hits.  Apparently this is not the case.   Thanks.
> >
> > cluster information:
> >
> > Address         DC          Rack        Status State   Load            Effective-Ownership Token
> >                                                                                            141784319550391026443072753096570088105
> > x.x.x.1    DC1         r1          Up     Normal  587.46 GB       33.33%              0
> > x.x.x.2    DC1         r2          Up     Normal  591.21 GB       33.33%              28356863910078205288614550619314017621
> > x.x.x.3    DC1         r1          Up     Normal  594.97 GB       33.33%              56713727820156410577229101238628035242
> > x.x.x.4    DC1         r2          Up     Normal  587.15 GB       33.33%              85070591730234615865843651857942052863
> > x.x.x.5    DC1         r1          Up     Normal  590.26 GB       33.33%              113427455640312821154458202477256070484
> > x.x.x.6    DC1         r2          Up     Normal  583.21 GB       33.33%              141784319550391026443072753096570088105
> >
> >
> > [user@node]$ ./checkinfo.sh
> > *************** x.x.x.4
> > Token            : 85070591730234615865843651857942052863
> > Gossip active    : true
> > Thrift active    : true
> > Load             : 587.15 GB
> > Generation No    : 1354074048
> > Uptime (seconds) : 36957
> > Heap Memory (MB) : 2027.29 / 3948.00
> > Data Center      : DC1
> > Rack             : r2
> > Exceptions       : 0
> >
> > Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
> > Row Cache        : size 1072651974 (bytes), capacity 1073741824 (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
> >
> > *************** x.x.x.6
> > Token            : 141784319550391026443072753096570088105
> > Gossip active    : true
> > Thrift active    : true
> > Load             : 583.21 GB
> > Generation No    : 1354074461
> > Uptime (seconds) : 36535
> > Heap Memory (MB) : 828.71 / 3948.00
> > Data Center      : DC1
> > Rack             : r2
> > Exceptions       : 0
> >
> > Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
> > Row Cache        : size 1072602906 (bytes), capacity 1073741824 (bytes), 0 hits, 3194 requests, NaN recent hit rate, 0 save period in seconds
> >
> >
> 
> 


Re: strange row cache behavior

Posted by Yiming Sun <yi...@gmail.com>.
Hi Aaron,

Thank you, and your explanation makes sense.  At the time, I thought having
1GB of row cache on each node was plenty, because there was an aggregate 6GB
of cache, but you are right: with each row in the 10's of MBs, some of the
nodes can go into a constant load-and-evict cycle, which would have a
negative effect on performance.  I will try your suggestions to 1) reduce
the requested entry set and 2) increase the row cache size and see if they
get better hits, and also 3) reverse the requested entry list in alternate
runs.
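
As a back-of-envelope check on "plenty" (all numbers assumed; the thread only
says rows are in the 10's of MBs):

    long cacheBytes  = 1L << 30;   // 1 GiB row cache per node
    long avgRowBytes = 30L << 20;  // assume ~30 MB per row
    System.out.println(cacheBytes / avgRowBytes);  // 34 rows fit per node

So each node can cache only a few dozen of these rows; any fixed request set
much larger than that will cycle the cache completely between repeats.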

Our data space has close to 3 million rows, but we haven't gathered enough
usage statistics to know which rows are hot.  Does this mean we should not
enable row caches until we are absolutely sure about what's hot (I think
there is a reason why row caches are disabled by default)?  It also seems
from my tests that the OS page cache works much better, but that could be
because the OS page cache can utilize all the available memory, so it is
essentially larger -- I guess I will find out by doing 2) above.

best,

-- Y.



On Tue, Dec 4, 2012 at 4:47 AM, aaron morton <aa...@thelastpickle.com> wrote:

> > Row Cache        : size 1072651974 (bytes), capacity 1073741824 (bytes),
> 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
>
> So the cache is pretty much full; there is only about 1 MB free.
>
> There were 2,576 read requests that tried to get a row from the cache.
> Zero of those had a hit. If you have 6 nodes and RF 2, each node has one
> third of the data in the cluster (from the effective ownership info). So
> depending on the read workload, the number of read requests on each node
> may be different.
>
> What I think is happening is reads are populating the row cache, then
> subsequent reads are evicting items from the row cache before you get back
> to reading the original rows. So if you read rows 1 to 5, they are put in
> the cache, when you read rows 6 to 10 they are put in and evict rows 1 to
> 5. Then you read rows 1 to 5 again they are not in the cache.
>
> Try testing with a lower number of hot rows, and/or a bigger row cache.
>
> But to be honest, with rows in the 10's of MB you will probably only get
> good cache performance with a small set of hot rows.
>
> Hope that helps.
>
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/12/2012, at 5:11 AM, Yiming Sun <yi...@gmail.com> wrote:
>
> > Does anyone have any comments/suggestions for me regarding this?  Thanks
> >
> >
> > I am trying to understand some strange behavior of the Cassandra row
> cache.  We have a 6-node Cassandra cluster in a single data center on 2
> racks, and the neighboring nodes on the ring are from alternating racks.
> Each node has a 1GB row cache, with the key cache disabled.  The cluster
> uses PropertyFileSnitch, and the ColumnFamily I fetch from uses
> NetworkTopologyStrategy with a replication factor of 2.  My client code
> uses Hector to fetch a fixed set of rows from Cassandra.
> >
> > What I don't quite understand is that even after I ran the client code
> several times, there are always some nodes with 0 row cache hits, even
> though the row caches on all nodes are filled and all nodes receive
> requests.
> >
> > Which nodes have 0 hits seems to be strongly related to the following:
> >
> >  - the set of row keys to fetch
> >  - the order of the set of row keys to fetch
> >  - the list of hosts passed to Hector's CassandraHostConfigurator
> >  - the order of the list of hosts passed to Hector
> >
> > Can someone shed some light on how exactly the row cache works and
> hopefully also explain the behavior I have been seeing?  I thought if the
> fixed set of row keys is the only thing I am fetching (each row should be
> on the order of 10's of MBs, no more than 100MB), and each node gets
> requests, and its row cache is filled, there's gotta be some hits.
>  Apparently this is not the case.   Thanks.
> >
> > cluster information:
> >
> > Address         DC          Rack        Status State   Load            Effective-Ownership Token
> >                                                                                         141784319550391026443072753096570088105
> > x.x.x.1    DC1         r1          Up     Normal  587.46 GB       33.33%              0
> > x.x.x.2    DC1         r2          Up     Normal  591.21 GB       33.33%              28356863910078205288614550619314017621
> > x.x.x.3    DC1         r1          Up     Normal  594.97 GB       33.33%              56713727820156410577229101238628035242
> > x.x.x.4    DC1         r2          Up     Normal  587.15 GB       33.33%              85070591730234615865843651857942052863
> > x.x.x.5    DC1         r1          Up     Normal  590.26 GB       33.33%              113427455640312821154458202477256070484
> > x.x.x.6    DC1         r2          Up     Normal  583.21 GB       33.33%              141784319550391026443072753096570088105
> >
> >
> > [user@node]$ ./checkinfo.sh
> > *************** x.x.x.4
> > Token            : 85070591730234615865843651857942052863
> > Gossip active    : true
> > Thrift active    : true
> > Load             : 587.15 GB
> > Generation No    : 1354074048
> > Uptime (seconds) : 36957
> > Heap Memory (MB) : 2027.29 / 3948.00
> > Data Center      : DC1
> > Rack             : r2
> > Exceptions       : 0
> >
> > Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0
> requests, NaN recent hit rate, 14400 save period in seconds
> > Row Cache        : size 1072651974 (bytes), capacity 1073741824 (bytes),
> 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
> >
> > *************** x.x.x.6
> > Token            : 141784319550391026443072753096570088105
> > Gossip active    : true
> > Thrift active    : true
> > Load             : 583.21 GB
> > Generation No    : 1354074461
> > Uptime (seconds) : 36535
> > Heap Memory (MB) : 828.71 / 3948.00
> > Data Center      : DC1
> > Rack             : r2
> > Exceptions       : 0
> >
> > Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0
> requests, NaN recent hit rate, 14400 save period in seconds
> > Row Cache        : size 1072602906 (bytes), capacity 1073741824 (bytes),
> 0 hits, 3194 requests, NaN recent hit rate, 0 save period in seconds
> >
> >
>
>

Re: strange row cache behavior

Posted by aaron morton <aa...@thelastpickle.com>.
> Row Cache        : size 1072651974 (bytes), capacity 1073741824 (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds

So the cache is pretty much full; there is only about 1 MB free.

There were 2,576 read requests that tried to get a row from the cache. Zero of those had a hit. If you have 6 nodes and RF 2, each node has one third of the data in the cluster (from the effective ownership info). So depending on the read workload, the number of read requests on each node may be different.

What I think is happening is reads are populating the row cache, then subsequent reads are evicting items from the row cache before you get back to reading the original rows. So if you read rows 1 to 5, they are put in the cache, when you read rows 6 to 10 they are put in and evict rows 1 to 5. Then you read rows 1 to 5 again they are not in the cache. 
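
That evict-before-reuse cycle is easy to reproduce outside Cassandra: a toy
LRU cache cycled, in a fixed order, over a working set larger than its
capacity never gets a single hit. An illustrative sketch, not Cassandra code:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public final class LruThrash {
        public static void main(String[] args) {
            final int capacity = 5;  // cache holds 5 rows; working set is 10
            Map<Integer, byte[]> cache =
                    new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
                        protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> e) {
                            return size() > capacity;  // evict least recently used
                        }
                    };
            int hits = 0, requests = 0;
            for (int pass = 0; pass < 10; pass++) {
                for (int row = 1; row <= 10; row++) {  // same keys, same order
                    requests++;
                    if (cache.get(row) != null) hits++;
                    else cache.put(row, new byte[0]);  // miss: "read disk", populate
                }
            }
            // Prints "0 hits, 100 requests": by the time a row comes around
            // again, it has already been evicted by the rows read after it.
            System.out.println(hits + " hits, " + requests + " requests");
        }
    }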

Try testing with a lower number of hot rows, and/or a bigger row cache. 

But to be honest, with rows in the 10's of MB you will probably only get good cache performance with a small set of hot rows. 

Hope that helps. 



-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/12/2012, at 5:11 AM, Yiming Sun <yi...@gmail.com> wrote:

> Does anyone have any comments/suggestions for me regarding this?  Thanks
> 
> 
> I am trying to understand some strange behavior of the Cassandra row cache.  We have a 6-node Cassandra cluster in a single data center on 2 racks, and the neighboring nodes on the ring are from alternating racks.  Each node has a 1GB row cache, with the key cache disabled.  The cluster uses PropertyFileSnitch, and the ColumnFamily I fetch from uses NetworkTopologyStrategy with a replication factor of 2.  My client code uses Hector to fetch a fixed set of rows from Cassandra.
> 
> What I don't quite understand is that even after I ran the client code several times, there are always some nodes with 0 row cache hits, even though the row caches on all nodes are filled and all nodes receive requests.
> 
> Which nodes have 0 hits seems to be strongly related to the following:
> 
>  - the set of row keys to fetch
>  - the order of the set of row keys to fetch
>  - the list of hosts passed to Hector's CassandraHostConfigurator
>  - the order of the list of hosts passed to Hector
> 
> Can someone shed some light on how exactly the row cache works and hopefully also explain the behavior I have been seeing?  I thought if the fixed set of row keys is the only thing I am fetching (each row should be on the order of 10's of MBs, no more than 100MB), and each node gets requests, and its row cache is filled, there's gotta be some hits.  Apparently this is not the case.   Thanks.
> 
> cluster information:
> 
> Address         DC          Rack        Status State   Load            Effective-Ownership Token                                       
>                                                                                            141784319550391026443072753096570088105     
> x.x.x.1    DC1         r1          Up     Normal  587.46 GB       33.33%              0                                           
> x.x.x.2    DC1         r2          Up     Normal  591.21 GB       33.33%              28356863910078205288614550619314017621      
> x.x.x.3    DC1         r1          Up     Normal  594.97 GB       33.33%              56713727820156410577229101238628035242      
> x.x.x.4    DC1         r2          Up     Normal  587.15 GB       33.33%              85070591730234615865843651857942052863      
> x.x.x.5    DC1         r1          Up     Normal  590.26 GB       33.33%              113427455640312821154458202477256070484     
> x.x.x.6    DC1         r2          Up     Normal  583.21 GB       33.33%              141784319550391026443072753096570088105    
> 
> 
> [user@node]$ ./checkinfo.sh   
> *************** x.x.x.4
> Token            : 85070591730234615865843651857942052863
> Gossip active    : true
> Thrift active    : true
> Load             : 587.15 GB
> Generation No    : 1354074048
> Uptime (seconds) : 36957
> Heap Memory (MB) : 2027.29 / 3948.00
> Data Center      : DC1
> Rack             : r2
> Exceptions       : 0
> 
> Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
> Row Cache        : size 1072651974 (bytes), capacity 1073741824 (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
> 
> *************** x.x.x.6
> Token            : 141784319550391026443072753096570088105
> Gossip active    : true
> Thrift active    : true
> Load             : 583.21 GB
> Generation No    : 1354074461
> Uptime (seconds) : 36535
> Heap Memory (MB) : 828.71 / 3948.00
> Data Center      : DC1
> Rack             : r2
> Exceptions       : 0
> 
> Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
> Row Cache        : size 1072602906 (bytes), capacity 1073741824 (bytes), 0 hits, 3194 requests, NaN recent hit rate, 0 save period in seconds
> 
> 


Re: strange row cache behavior

Posted by Yiming Sun <yi...@gmail.com>.
Does anyone have any comments/suggestions for me regarding this?  Thanks


> I am trying to understand some strange behavior of the Cassandra row cache.
> We have a 6-node Cassandra cluster in a single data center on 2 racks, and
> the neighboring nodes on the ring are from alternating racks.  Each node
> has a 1GB row cache, with the key cache disabled.  The cluster uses
> PropertyFileSnitch, and the ColumnFamily I fetch from uses
> NetworkTopologyStrategy with a replication factor of 2.  My client code
> uses Hector to fetch a fixed set of rows from Cassandra.
>
> What I don't quite understand is that even after I ran the client code
> several times, there are always some nodes with 0 row cache hits, even
> though the row caches on all nodes are filled and all nodes receive
> requests.
>
> Which nodes have 0 hits seems to be strongly related to the following:
>
>  - the set of row keys to fetch
>  - the order of the set of row keys to fetch
>  - the list of hosts passed to Hector's CassandraHostConfigurator
>  - the order of the list of hosts passed to Hector
>
> Can someone shed some light on how exactly the row cache works and
> hopefully also explain the behavior I have been seeing?  I thought if the
> fixed set of row keys is the only thing I am fetching (each row should be
> on the order of 10's of MBs, no more than 100MB), and each node gets
> requests, and its row cache is filled, there's gotta be some hits.
>  Apparently this is not the case.   Thanks.
>
> cluster information:
>
> Address         DC          Rack        Status State   Load            Effective-Ownership Token
>                                                                                          141784319550391026443072753096570088105
> x.x.x.1    DC1         r1          Up     Normal  587.46 GB       33.33%              0
> x.x.x.2    DC1         r2          Up     Normal  591.21 GB       33.33%              28356863910078205288614550619314017621
> x.x.x.3    DC1         r1          Up     Normal  594.97 GB       33.33%              56713727820156410577229101238628035242
> x.x.x.4    DC1         r2          Up     Normal  587.15 GB       33.33%              85070591730234615865843651857942052863
> x.x.x.5    DC1         r1          Up     Normal  590.26 GB       33.33%              113427455640312821154458202477256070484
> x.x.x.6    DC1         r2          Up     Normal  583.21 GB       33.33%              141784319550391026443072753096570088105
>
>
> [user@node]$ ./checkinfo.sh
> *************** x.x.x.4
> Token            : 85070591730234615865843651857942052863
> Gossip active    : true
> Thrift active    : true
> Load             : 587.15 GB
> Generation No    : 1354074048
> Uptime (seconds) : 36957
> Heap Memory (MB) : 2027.29 / 3948.00
> Data Center      : DC1
> Rack             : r2
> Exceptions       : 0
>
> Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
> NaN recent hit rate, 14400 save period in seconds
> Row Cache        : size 1072651974 (bytes), capacity 1073741824 (bytes), 0
> hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
>
> *************** x.x.x.6
> Token            : 141784319550391026443072753096570088105
> Gossip active    : true
> Thrift active    : true
> Load             : 583.21 GB
> Generation No    : 1354074461
> Uptime (seconds) : 36535
> Heap Memory (MB) : 828.71 / 3948.00
> Data Center      : DC1
> Rack             : r2
> Exceptions       : 0
>
> Key Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
> NaN recent hit rate, 14400 save period in seconds
> Row Cache        : size 1072602906 (bytes), capacity 1073741824 (bytes), 0
> hits, 3194 requests, NaN recent hit rate, 0 save period in seconds
>