Posted to user@cassandra.apache.org by Hannu Kröger <hk...@gmail.com> on 2018/03/04 18:45:43 UTC

Row cache functionality - Some confusion

Hello,

I am trying to verify and understand fully the functionality of row cache in Cassandra.

I have been using mainly two different sources for information:
https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476
AND
http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options

and based on what I read, the documentation is not correct.

The documentation says:
“rows_per_partition: The amount of rows to cache per partition (“row cache”). If an integer n is specified, the first n queried rows of a partition will be cached. Other possible options are ALL, to cache all rows of a queried partition, or NONE to disable row caching.”

The problematic part is “the first n queried rows of a partition will be cached”. Shouldn’t it say that the first N rows in a partition will be cached, not the first N that are queried?
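
To make the two readings concrete, here is a toy sketch in Python. It does not use any Cassandra API; the function names and the sample clustering keys are hypothetical and exist only to show how the two interpretations of the sentence diverge when a query does not start at the beginning of the partition.

```python
def first_n_in_partition(partition_rows, n):
    """Reading A: cache the first n rows of the partition in clustering order."""
    return sorted(partition_rows)[:n]


def first_n_queried(queried_rows, n):
    """Reading B: cache the first n rows that the query happened to return."""
    return list(queried_rows)[:n]


# Hypothetical partition with six rows, identified by their clustering keys.
partition = [1, 2, 3, 4, 5, 6]

# Hypothetical query result, e.g. a slice that starts at clustering key 5.
query_result = [5, 6]

head = first_n_in_partition(partition, 2)
queried = first_n_queried(query_result, 2)

print("Reading A caches:", head)
print("Reading B caches:", queried)
```

The two readings only agree when the query itself starts at the head of the partition, which is why the wording in the documentation matters.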

If this is the case, I’m more than happy to create a ticket (and maybe even create a patch) for the doc update.

BR,
Hannu


Re: Row cache functionality - Some confusion

Posted by Rahul Singh <ra...@gmail.com>.
It’s pretty clear to me that the only thing that gets put into the caches are the top N rows.

https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L523

It may fetch more, but it doesn’t cache it. It may get more if it’s not the full partition cache, but there’s no code that inserts into the CacheService except

https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L528
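
As a rough illustration of the behavior described in this reply (more rows may be fetched than are cached), here is a minimal Python sketch. It is a toy model under stated assumptions, not Cassandra's implementation: `read_through`, the dict used as a cache, and the list used as a partition are all hypothetical stand-ins.

```python
def read_through(cache, partition_rows, partition_key, rows_to_cache, rows_wanted):
    """Return the first rows_wanted rows, but cache at most rows_to_cache of them.

    partition_rows is assumed to be sorted in clustering order already.
    """
    # Fetch enough rows to satisfy both the query and the cache.
    fetched = partition_rows[:max(rows_to_cache, rows_wanted)]
    # Only the head of the partition, up to rows_to_cache rows, is stored.
    cache[partition_key] = fetched[:rows_to_cache]
    # The caller still receives everything it asked for.
    return fetched[:rows_wanted]


cache = {}
result = read_through(cache, [10, 20, 30, 40, 50], "pk1", rows_to_cache=2, rows_wanted=4)
print("returned:", result)
print("cached:", cache["pk1"])
```

In this model the query receives four rows while the cache keeps only the first two, which matches the "it may fetch more, but it doesn't cache it" description.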



--
Rahul Singh
rahul.singh@anant.us

Anant Corporation

On Mar 12, 2018, 8:56 AM -0400, Hannu Kröger <hk...@gmail.com>, wrote:
>
> > On 12 Mar 2018, at 14:45, Rahul Singh <ra...@gmail.com> wrote:
> >
> > I may be wrong, but what I’ve read and used in the past assumes that the “first” N rows are cached and the clustering key design is how I change what N rows are put into memory. Looking at the code, it seems that’s the case.
>
> So we agree that the row cache stores only N rows from the beginning of the partition. So if only the last row in a partition is read, it probably doesn’t get cached, assuming there are more than N rows in the partition?
>
> > The language of the comment basically says that it holds in cache what satisfies the query if and only if it’s the head of the partition; if not, it fetches it and saves it. I don’t interpret this differently from what I have seen in the documentation.
>
> Hmm, I’m trying to understand this. Does it mean that it stores the results in the cache if they are the head and, if not, it will fetch the head and store that (instead of the results for the query)?
>
> Hannu

Re: Row cache functionality - Some confusion

Posted by Hannu Kröger <hk...@gmail.com>.
> On 12 Mar 2018, at 14:45, Rahul Singh <ra...@gmail.com> wrote:
> 
> I may be wrong, but what I’ve read and used in the past assumes that the “first” N rows are cached and the clustering key design is how I change what N rows are put into memory. Looking at the code, it seems that’s the case. 

So we agree that the row cache stores only N rows from the beginning of the partition. So if only the last row in a partition is read, it probably doesn’t get cached, assuming there are more than N rows in the partition?

> The language of the comment basically says that it holds in cache what satisfies the query if and only if it’s the head of the partition; if not, it fetches it and saves it. I don’t interpret this differently from what I have seen in the documentation.

Hmm, I’m trying to understand this. Does it mean that it stores the results in the cache if they are the head and, if not, it will fetch the head and store that (instead of the results for the query)?
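
The "last row" case raised above can be sketched as a simple coverage check. This is a toy Python model that assumes, as discussed in this thread, that the cache holds only the first N rows of a partition. The function name and the sample data are hypothetical, and the sketch does not answer what Cassandra does on a miss.

```python
def covered_by_cached_head(cached_head, requested_keys):
    """Return True when every requested clustering key is in the cached head."""
    cached = set(cached_head)
    return all(key in cached for key in requested_keys)


# Hypothetical partition of ten rows with a cache that holds the first three.
partition = list(range(1, 11))
cached_head = partition[:3]

head_query_hit = covered_by_cached_head(cached_head, [1, 2])
last_row_hit = covered_by_cached_head(cached_head, [10])

print("query for rows 1-2 served from cache:", head_query_hit)
print("query for row 10 served from cache:", last_row_hit)
```

Under that assumption, a read of only the last row cannot be answered from the cached head whenever the partition has more than N rows.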

Hannu

Re: Row cache functionality - Some confusion

Posted by Rahul Singh <ra...@gmail.com>.
I may be wrong, but what I’ve read and used in the past assumes that the “first” N rows are cached and the clustering key design is how I change what N rows are put into memory. Looking at the code, it seems that’s the case.

The language of the comment basically says that it holds in cache what satisfies the query if and only if it’s the head of the partition; if not, it fetches it and saves it. I don’t interpret this differently from what I have seen in the documentation.



--
Rahul Singh
rahul.singh@anant.us

Anant Corporation

On Mar 12, 2018, 7:13 AM -0400, Hannu Kröger <hk...@gmail.com>, wrote:
>
> rows_per_partition

Re: Row cache functionality - Some confusion

Posted by Hannu Kröger <hk...@gmail.com>.
Hi,

My goal is to make sure that I understand functionality correctly and that the documentation is accurate. 

The question, in other words: is the documentation or the comment in the code wrong (or inaccurate)?

Hannu

> On 12 Mar 2018, at 13:00, Rahul Singh <ra...@gmail.com> wrote:
> 
> What’s the goal? How big are your partitions, both in MB and in rows?
> 
> --
> Rahul Singh
> rahul.singh@anant.us
> 
> Anant Corporation
> 
> On Mar 12, 2018, 6:37 AM -0400, Hannu Kröger <hk...@gmail.com>, wrote:
>> Anyone?


Re: Row cache functionality - Some confusion

Posted by Rahul Singh <ra...@gmail.com>.
What’s the goal? How big are your partitions, both in MB and in rows?

--
Rahul Singh
rahul.singh@anant.us

Anant Corporation

On Mar 12, 2018, 6:37 AM -0400, Hannu Kröger <hk...@gmail.com>, wrote:
> Anyone?

Re: Row cache functionality - Some confusion

Posted by Hannu Kröger <hk...@gmail.com>.
Anyone?
