
Posted to user@cassandra.apache.org by Chidamber Kulkarni <ch...@reniac.com> on 2020/12/14 17:43:42 UTC

Cassandra row-cache API

Hello All,

Has anyone tried modifying the row-cache API to key on both the
partition key and the clustering keys, turning the row cache (which is
really a partition cache today) into a true row cache? This might help with
broader adoption of the row cache for use cases with large partitions.
I would appreciate any thoughts from the experts here.

thanks,
Chidamber

Re: Cassandra row-cache API

Posted by Chidamber Kulkarni <ch...@reniac.com>.

Thanks Jeff, the summary at the end is very insightful; it matches what we
are observing as well.

On a related note, we observe that the "first N clustering" caching doesn't
behave exactly the way it is documented. Is that related to the open
ticket CASSANDRA-8646 <https://issues.apache.org/jira/browse/CASSANDRA-8646>?




Re: Cassandra row-cache API

Posted by Jeff Jirsa <jj...@gmail.com>.

Sometime around 2.0 or 2.1, it was changed from a "partition cache" to a
"head of the partition cache", where "head of the partition" means "first N
clustering".

The reason individual rows are "hard" is the same reason most things with
Cassandra caching and consistency are "hard": a clustering / row may not
change itself, but it may be deleted by a range delete that covers it and
many other clusterings / rows. Keeping an individual row cache correct is
therefore not that different from maintaining the data around it, which ends
up looking a lot like "keep a part of the partition in memory", which is
basically what's there now.
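
To make the range-delete point concrete, here is a toy sketch in Python. It
is not Cassandra code: ToyRowCache and its methods are hypothetical names,
and clusterings are modelled as plain integers. It only illustrates that a
cache keyed by (partition key, clustering) still has to track the ordered
clusterings of each partition so that a range delete can find every cached
row it covers.

```python
from bisect import bisect_left, bisect_right


class ToyRowCache:
    """Hypothetical per-row cache keyed by (partition key, clustering)."""

    def __init__(self):
        # partition key -> sorted list of cached clustering values
        self._clusterings = {}
        # (partition key, clustering) -> cached row
        self._rows = {}

    def put(self, pk, clustering, row):
        keys = self._clusterings.setdefault(pk, [])
        i = bisect_left(keys, clustering)
        if i == len(keys) or keys[i] != clustering:
            keys.insert(i, clustering)
        self._rows[(pk, clustering)] = row

    def get(self, pk, clustering):
        return self._rows.get((pk, clustering))

    def range_delete(self, pk, start, end):
        """Invalidate every cached row with start <= clustering <= end."""
        keys = self._clusterings.get(pk, [])
        lo = bisect_left(keys, start)
        hi = bisect_right(keys, end)
        doomed = keys[lo:hi]
        for clustering in doomed:
            del self._rows[(pk, clustering)]
        del keys[lo:hi]
        return len(doomed)


cache = ToyRowCache()
for c in range(10):
    cache.put("sensor-1", c, {"value": c * 1.5})

# A single range delete touches rows that never changed individually.
evicted = cache.range_delete("sensor-1", 3, 7)
```

The per-partition sorted list is the interesting part: once the cache needs
it for correctness, it is already holding an ordered slice of the partition
rather than independent rows.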

That said:
- The implementation is not great. I haven't looked into specifics, but it
is incredibly rare to find a use case where it's a win, even on very narrow
partitions (you basically need workloads that are ALMOST immutable), partly
because:
- You're still caching the data on one replica of N, and caching the
converged result usually ends up being a bigger win and easier to
manage/invalidate. So caching the result outside of Cassandra
(memcached/redis/etc.) still usually ends up better.
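
As a rough illustration of caching the converged result outside the
database, here is a minimal read-through sketch in Python. Everything in it
is an assumption made for illustration: ResultCache is an in-process
stand-in for memcached/redis rather than a real client, and
fake_quorum_read is a stub for whatever read the application issues against
the cluster.

```python
import time


class ResultCache:
    """In-process stand-in for an external cache (hypothetical)."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= time.monotonic():
            del self._entries[key]  # expired, drop it
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (time.monotonic() + self._ttl, value)

    def delete(self, key):
        self._entries.pop(key, None)


def read_through(cache, key, load_converged):
    """Return the cached result, loading it once on a miss.

    A None result is treated as a miss and is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = load_converged(key)
        cache.set(key, value)
    return value


calls = []


def fake_quorum_read(key):
    # Stub for the read that returns the result merged across replicas.
    calls.append(key)
    return {"key": key, "value": 42}


cache = ResultCache(ttl_seconds=60)
first = read_through(cache, "sensor-1", fake_quorum_read)
second = read_through(cache, "sensor-1", fake_quorum_read)  # served from cache
cache.delete("sensor-1")  # the application invalidates on write
third = read_through(cache, "sensor-1", fake_quorum_read)   # reloaded
```

The cached value here is the result the client already received after the
replicas were reconciled, so invalidation is a single key delete driven by
the application's own writes.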
