
Posted to user@cassandra.apache.org by Saket Joshi <sj...@touchcommerce.com> on 2011/01/13 19:47:41 UTC

cassandra row cache

Hi, 

I am running a 15-node cluster, version 0.6.8, on a 64-bit Linux OS, using
mmap I/O, with 6 GB of RAM allocated. I have the row cache set to 80000 keys
(mean row size is 2 KB). I am observing strange behaviour: I query for
1.6 million rows across the cluster and it takes around 40 minutes. I
query the same data again and it now takes 25 minutes to fetch (I am
expecting the cache to be warm by now), but I see a row cache hit rate of
around 30%. When I request the same data a third time, the fetch takes
under 4 minutes and the cache hit ratio is 99%. Does anyone have an idea
why this may be happening?

 

 

Thanks,

Saket

Re: phpcassa never return(infinite loop)?!!!

Posted by Tyler Hobbs <ty...@riptano.com>.

Answered in the phpcassa ML here:

http://groups.google.com/group/phpcassa/browse_thread/thread/2771112a323860f7

- Tyler

On Fri, Jan 14, 2011 at 12:36 PM, kh jo <jo...@yahoo.com> wrote:

> I am trying to use phpcassa.
>
> I use the following example:
>
> CassandraConn::add_node('localhost', 9160);
> $users = new CassandraCF('rhg', 'Users'); // ColumnFamily
> $users->insert('1', array('email' => 't...@example.com', 'password' => 'test'));
>
> When I run it, it never returns, and the Apache process eats 100% CPU.
>
> I am using Cassandra 0.7.
>
> Any idea why this happens?
>
> thanks
>
>

Re: cassandra row cache

Posted by Jonathan Ellis <jb...@gmail.com>.

That's possible, yes.  He'd want to make sure there aren't any of
those WARN messages in the logs.

On Fri, Jan 14, 2011 at 11:46 AM, Mike Malone <mi...@simplegeo.com> wrote:
> Digest reads could be being dropped..?
>
> On Thu, Jan 13, 2011 at 4:11 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> On Thu, Jan 13, 2011 at 2:00 PM, Edward Capriolo <ed...@gmail.com>
>> wrote:
>> > Is it possible that your are reading at READ.ONE and that READ.ONE
>> > only warms cache on 1 of your three nodes= 20. 2nd read warms another
>> > 60%, and by the third read all the replicas are warm? 99% ?
>> >
>> > This would be true if digest reads were not warming caches.
>>
>> Digest reads do go through the cache path.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

phpcassa never return(infinite loop)?!!!

Posted by kh jo <jo...@yahoo.com>.

I am trying to use phpcassa.

I use the following example:

 CassandraConn::add_node('localhost', 9160);
 $users = new CassandraCF('rhg', 'Users'); // ColumnFamily
 $users->insert('1', array('email' => 't...@example.com', 'password' => 'test'));

When I run it, it never returns, and the Apache process eats 100% CPU.

I am using Cassandra 0.7.

Any idea why this happens?

thanks

Re: cassandra row cache

Posted by Mike Malone <mi...@simplegeo.com>.

Could the digest reads be getting dropped?

On Thu, Jan 13, 2011 at 4:11 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Thu, Jan 13, 2011 at 2:00 PM, Edward Capriolo <ed...@gmail.com>
> wrote:
> > Is it possible that your are reading at READ.ONE and that READ.ONE
> > only warms cache on 1 of your three nodes= 20. 2nd read warms another
> > 60%, and by the third read all the replicas are warm? 99% ?
> >
> > This would be true if digest reads were not warming caches.
>
> Digest reads do go through the cache path.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: cassandra row cache

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Jan 13, 2011 at 2:00 PM, Edward Capriolo <ed...@gmail.com> wrote:
> Is it possible that your are reading at READ.ONE and that READ.ONE
> only warms cache on 1 of your three nodes= 20. 2nd read warms another
> 60%, and by the third read all the replicas are warm? 99% ?
>
> This would be true if digest reads were not warming caches.

Digest reads do go through the cache path.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: limiting columns in a row

Posted by Sylvain Lebresne <sy...@riptano.com>.

Hi,

> does this seem like a generally useful feature?

I do think this could be a useful feature, if only because I don't think
there is any satisfactory/efficient way to do this client side.

> if so, would it be hard to implement (maybe it could be done at compaction
> time like the TTL feature)?

Off the top of my head (i.e., I haven't really thought this through, but I'll
still give my opinion), I see the following difficulties:
  1) You can only do this limiting during major compaction, or in the same
     cases as CASSANDRA-1074 for minor compaction, since you need to make
     sure the x columns you are keeping are not deleted ones. Otherwise
     you'll want to disable deletes altogether on the cf with this 'limit'
     option (I feel like this last option would really simplify things).
  2) Even if the removal of the columns exceeding the limit is eventual (and
     it will be), you'll want queries to only ever return columns inside the
     limit (otherwise the feature would be too unpredictable). But I think
     this will be quite challenging. That is, slice queries from the start
     of the row are easy. Everything else is harder (at least if you want to
     make it efficient).
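
The first difficulty can be illustrated with a toy model. This is only a
sketch, not Cassandra's compaction code: each "sstable" is a plain dict
mapping a column name to (timestamp, value), a value of None stands for a
tombstone, and the names merge, live_columns and trim are hypothetical
helpers invented for the illustration.

```python
def merge(*sstables):
    """Merge fragments; per column, the entry with the highest timestamp wins."""
    merged = {}
    for table in sstables:
        for name, entry in table.items():
            if name not in merged or entry[0] > merged[name][0]:
                merged[name] = entry
    return merged


def live_columns(table):
    """Sorted names of columns that are not tombstones."""
    return sorted(name for name, (_, value) in table.items() if value is not None)


def trim(table, limit):
    """Keep the `limit` highest-named live columns; tombstones are kept as-is."""
    assert limit > 0
    keep = set(live_columns(table)[-limit:])
    return {name: entry for name, entry in table.items()
            if entry[1] is None or name in keep}


# Hypothetical data: four samples in an older fragment, and a newer
# fragment that deletes the latest sample, c4.
older = {"c1": (1, "v1"), "c2": (2, "v2"), "c3": (3, "v3"), "c4": (4, "v4")}
newer = {"c4": (5, None)}

# Trimming after seeing every fragment (like a major compaction).
full = live_columns(trim(merge(older, newer), 2))
# Trimming one fragment alone, then merging later.
partial = live_columns(merge(trim(older, 2), newer))

print(full)     # ['c2', 'c3']
print(partial)  # ['c3']
```

In this toy case, trimming the older fragment alone keeps c4, which is
already deleted elsewhere, and drops c2, so the row ends up with one live
column instead of two.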

That was my 2 cents. Anyway, you can always open a JIRA ticket.

--
Sylvain


On Fri, Jan 14, 2011 at 7:38 AM, mike dooley <do...@apple.com> wrote:

> hi,
>
> the time-to-live feature in 0.7 is very nice and it made me want to ask
> about
> a somewhat similar feature.
>
> i have a stream of data consisting of entities and associated samples.  so
> i create
> a row for each entity and the columns in each row contain the samples for
> that entity.
> when i get around to processing  an entity i only care about the most
> recent N samples.
> so i read the most recent N columns and delete all the rest.
>
> what i would like would be a column family property that allows me to
> specify a maximum number of columns per row.  then i could just keep
> writing
> and not have to do the deletes.
>
> in my case it would be fine if the limit is only 'eventually' applied (so
> that
> sometimes there might be extra columns).
>
> does this seem like a generally useful feature?  if so, would it be hard to
> implement (maybe it could be done at compaction time like the TTL feature)?
>
> thanks,
> -mike

limiting columns in a row

Posted by mike dooley <do...@apple.com>.

hi,

the time-to-live feature in 0.7 is very nice and it made me want to ask about
a somewhat similar feature.  

i have a stream of data consisting of entities and associated samples.  so i create 
a row for each entity and the columns in each row contain the samples for that entity.  
when i get around to processing  an entity i only care about the most recent N samples. 
so i read the most recent N columns and delete all the rest.
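
A minimal sketch of that read-then-delete step, run against a plain
in-memory dict rather than a real Cassandra client. It assumes column
names sort by sample time; the function name trim_to_most_recent and the
sample data are made up for illustration.

```python
def trim_to_most_recent(row, n):
    """Keep the n highest-sorting columns of `row` and delete the rest.

    `row` is a dict of column name -> value, where names are assumed to
    sort by sample time. Returns (kept_names, deleted_names).
    """
    names = sorted(row, reverse=True)   # most recent first
    kept, deleted = names[:n], names[n:]
    for name in deleted:
        del row[name]                   # stands in for a client-side delete
    return kept, deleted


# Hypothetical entity row with five samples keyed by an integer timestamp.
row = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
kept, deleted = trim_to_most_recent(row, 3)
print(kept)     # [5, 4, 3]
print(deleted)  # [2, 1]
```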

what i would like would be a column family property that allows me to
specify a maximum number of columns per row.  then i could just keep writing
and not have to do the deletes.

in my case it would be fine if the limit is only 'eventually' applied (so that
sometimes there might be extra columns).

does this seem like a generally useful feature?  if so, would it be hard to
implement (maybe it could be done at compaction time like the TTL feature)?

thanks,
-mike

Re: cassandra row cache

Posted by Edward Capriolo <ed...@gmail.com>.

Is it possible that you are reading at READ.ONE and that READ.ONE
only warms the cache on 1 of your three nodes= 20. 2nd read warms another
60%, and by the third read all the replicas are warm? 99% ?

This would be true if digest reads were not warming caches.
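
As a back-of-the-envelope check of what this hypothesis would predict,
here is a small sketch. It assumes a replication factor of 3, that each
read is served by one replica picked uniformly at random, that only the
serving replica caches the row, and that nothing is evicted. None of
these assumptions is confirmed in the thread.

```python
from fractions import Fraction


def expected_hit_rate(pass_number, replication_factor=3):
    """Expected hit rate on a given pass (1-based) over the same rows.

    A read hits if the replica chosen now was also chosen on at least
    one earlier pass, i.e. 1 - (1 - 1/RF) ** (passes so far).
    """
    never_chosen_before = (1 - Fraction(1, replication_factor)) ** (pass_number - 1)
    return 1 - never_chosen_before


for p in (1, 2, 3):
    print(f"pass {p}: {float(expected_hit_rate(p)):.0%}")
# pass 1: 0%
# pass 2: 33%
# pass 3: 56%
```

Under these assumptions the second pass lands near the reported 30%, but
the third pass would be expected around 56% rather than 99%, so this
simple model alone would not explain the final jump.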

Edward

On Thu, Jan 13, 2011 at 4:07 PM, Saket Joshi <sj...@touchcommerce.com> wrote:
> The cache is 800,000 per node , I have 15 nodes in the cluster. I see the cache value increased after the first run, the row cache hit rate was 0 for first run. For second run of the same data , the hit rate increased to 30% but on the third it jumps to 99%
>
>
> -Saket
>
> -----Original Message-----
> From: Chris Burroughs [mailto:chris.burroughs@gmail.com]
> Sent: Thursday, January 13, 2011 1:03 PM
> To: user@cassandra.apache.org
> Cc: Saket Joshi
> Subject: Re: cassandra row cache
>
> On 01/13/2011 02:05 PM, Saket Joshi wrote:
>> Yes it does change.
>>
>
> So the confusing part for me is why a cache of size 80,000 would not be
> fill after 1,600,000 requests.  Can you observe items cached and hit
> rate while making the first 1.6 million row query?
>

RE: cassandra row cache

Posted by Saket Joshi <sj...@touchcommerce.com>.

The cache is 800,000 per node, and I have 15 nodes in the cluster. I see the cache size increased after the first run; the row cache hit rate was 0 for the first run. For the second run over the same data, the hit rate increased to 30%, but on the third it jumps to 99%.

-Saket

-----Original Message-----
From: Chris Burroughs [mailto:chris.burroughs@gmail.com] 
Sent: Thursday, January 13, 2011 1:03 PM
To: user@cassandra.apache.org
Cc: Saket Joshi
Subject: Re: cassandra row cache

On 01/13/2011 02:05 PM, Saket Joshi wrote:
> Yes it does change. 
> 

So the confusing part for me is why a cache of size 80,000 would not be
fill after 1,600,000 requests.  Can you observe items cached and hit
rate while making the first 1.6 million row query?

Re: Bloom filter

Posted by Chris Burroughs <ch...@gmail.com>.

On 01/13/2011 04:07 PM, Carlos Sanchez wrote:
> Could someone tell me where (what classes) or what library is Cassandra using for its bloom filters?

src/java/org/apache/cassandra/utils/BloomFilter.java

Bloom filter

Posted by Carlos Sanchez <ca...@msci.com>.

All,

Could someone tell me where (what classes) or what library is Cassandra using for its bloom filters?

Thanks

Carlos


Re: cassandra row cache

Posted by Chris Burroughs <ch...@gmail.com>.

On 01/13/2011 02:05 PM, Saket Joshi wrote:
> Yes it does change. 
> 

So the confusing part for me is why a cache of size 80,000 would not be
full after 1,600,000 requests.  Can you observe items cached and hit
rate while making the first 1.6 million row query?
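
A rough capacity check using the figures from this thread (1.6 million
rows, 15 nodes, 2 KB mean row size, and the two cache sizes mentioned,
80,000 and 800,000). The replication factor of 3 and the even spread of
rows across nodes are assumptions, not something stated in the thread.

```python
ROWS = 1_600_000
NODES = 15
MEAN_ROW_BYTES = 2 * 1024
REPLICATION_FACTOR = 3  # assumption, not stated in the thread


def rows_per_node(rows=ROWS, nodes=NODES, rf=REPLICATION_FACTOR):
    """Replicas each node would hold if rows were spread evenly."""
    return rows * rf // nodes


def cache_fits(capacity_keys):
    """True if one node's cache could hold all of that node's rows."""
    return capacity_keys >= rows_per_node()


per_node = rows_per_node()
print(per_node)                           # 320000
print(per_node * MEAN_ROW_BYTES / 2**20)  # 625.0 (MiB of row data per node)
for capacity in (80_000, 800_000):
    print(capacity, cache_fits(capacity))
# 80000 False
# 800000 True
```

If the assumptions hold, 80,000 keys would cover only a quarter of each
node's rows, while 800,000 keys would cover all of them.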

Re: cassandra row cache

Posted by Ryan King <ry...@twitter.com>.

I'm not sure if this is entirely true, but I *think* older versions of
Cassandra used a version of ConcurrentLinkedHashMap (which backs
the row cache) that used the Second Chance algorithm, rather than LRU,
which might explain this non-LRU-like behavior. I may be entirely
wrong about this though.
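
For readers unfamiliar with the difference, here is a sketch of the two
eviction policies in their textbook form. It is not the
ConcurrentLinkedHashMap code, and the thread does not confirm which
policy 0.6.8 actually used; the class names and the replay helper are
made up for this illustration.

```python
from collections import OrderedDict, deque


class LRUCache:
    """Evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return False
        self.data.move_to_end(key)          # mark as most recently used
        return True

    def put(self, key):
        self.data[key] = True
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # drop least recently used

    def keys(self):
        return sorted(self.data)


class SecondChanceCache:
    """FIFO queue where a referenced key gets one more trip through the queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()    # keys in FIFO order
        self.referenced = {}    # key -> reference bit

    def get(self, key):
        if key not in self.referenced:
            return False
        self.referenced[key] = True         # only sets a bit, no reordering
        return True

    def put(self, key):
        if key in self.referenced:
            self.referenced[key] = True
            return
        while len(self.queue) >= self.capacity:
            victim = self.queue.popleft()
            if self.referenced[victim]:
                self.referenced[victim] = False   # second chance
                self.queue.append(victim)
            else:
                del self.referenced[victim]       # evict
        self.queue.append(key)
        self.referenced[key] = False

    def keys(self):
        return sorted(self.referenced)


def replay(cache, trace):
    """Read-through replay: on a miss, insert the key. Returns the hit count."""
    hits = 0
    for key in trace:
        if cache.get(key):
            hits += 1
        else:
            cache.put(key)
    return hits


trace = ["a", "b", "c", "b", "a", "d", "e"]
lru, second_chance = LRUCache(3), SecondChanceCache(3)
replay(lru, trace)
replay(second_chance, trace)
print(lru.keys())            # ['a', 'd', 'e']
print(second_chance.keys())  # ['b', 'd', 'e']
```

On this trace LRU keeps "a", the more recently read of the two re-read
keys, while Second Chance only remembers that both were read and evicts
in queue order, so the two caches end up with different contents.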

-ryan

On Thu, Jan 13, 2011 at 11:05 AM, Saket Joshi <sj...@touchcommerce.com> wrote:
> Yes it does change.
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Thursday, January 13, 2011 11:01 AM
> To: user
> Subject: Re: cassandra row cache
>
> does the cache size change between 2nd and 3rd time?
>
> On Thu, Jan 13, 2011 at 10:47 AM, Saket Joshi <sj...@touchcommerce.com>
> wrote:
>> Hi,
>>
>> I am running a 15 node cluster ,version 0.6.8, Linux 64bit OS, using
> mmap
>> I/O, 6GB ram allocated. I have row cache enabled to 80000 keys (mean
> row
>> size is 2KB). I am observing a strange behaviour.. I query for 1.6
> Million
>> rows across the cluster and time taken is around 40 mins , I query the
> same
>> data again , the time now is 25 mins to fetch data (i am expecting the
> cache
>> to be warm now) , but i see row cache hit rate around 30% . Now i
> request
>> the same data 3rd time, time to fetch is under 4 mins and cache hit
> ratios
>> are 99% ... Does any one have an idea why this may be happening ?
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Saket
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

RE: cassandra row cache

Posted by Saket Joshi <sj...@touchcommerce.com>.

Yes it does change. 

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: Thursday, January 13, 2011 11:01 AM
To: user
Subject: Re: cassandra row cache

does the cache size change between 2nd and 3rd time?

On Thu, Jan 13, 2011 at 10:47 AM, Saket Joshi <sj...@touchcommerce.com>
wrote:
> Hi,
>
> I am running a 15 node cluster ,version 0.6.8, Linux 64bit OS, using
mmap
> I/O, 6GB ram allocated. I have row cache enabled to 80000 keys (mean
row
> size is 2KB). I am observing a strange behaviour.. I query for 1.6
Million
> rows across the cluster and time taken is around 40 mins , I query the
same
> data again , the time now is 25 mins to fetch data (i am expecting the
cache
> to be warm now) , but i see row cache hit rate around 30% . Now i
request
> the same data 3rd time, time to fetch is under 4 mins and cache hit
ratios
> are 99% ... Does any one have an idea why this may be happening ?
>
>
>
>
>
> Thanks,
>
> Saket



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: cassandra row cache

Posted by Jonathan Ellis <jb...@gmail.com>.

Does the cache size change between the 2nd and 3rd time?

On Thu, Jan 13, 2011 at 10:47 AM, Saket Joshi <sj...@touchcommerce.com> wrote:
> Hi,
>
> I am running a 15 node cluster ,version 0.6.8, Linux 64bit OS, using mmap
> I/O, 6GB ram allocated. I have row cache enabled to 80000 keys (mean row
> size is 2KB). I am observing a strange behaviour.. I query for 1.6 Million
> rows across the cluster and time taken is around 40 mins , I query the same
> data again , the time now is 25 mins to fetch data (i am expecting the cache
> to be warm now) , but i see row cache hit rate around 30% . Now i request
> the same data 3rd time, time to fetch is under 4 mins and cache hit ratios
> are 99% ... Does any one have an idea why this may be happening ?
>
>
>
>
>
> Thanks,
>
> Saket



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com