Posted to user@cassandra.apache.org by nitin padalia <pa...@gmail.com> on 2015/01/19 09:47:19 UTC

Cassandra fetches complete partition

Hi,

Does Cassandra fetch the complete partition if I include the clustering
key in the WHERE clause?

Or, what is the difference between:
1. Select * from column_family where partition_key = 'somekey' limit 1;
2. Select * from column_family where partition_key = 'somekey' and
   clustering_key = 'some_clustering_key';

Thanks in advance!
Nitin Padalia
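
To make the two statements concrete, here is a minimal Python sketch of their result semantics. It is a toy model, not Cassandra code: the partition is assumed to be a set of rows kept sorted by clustering key, and the names Partition, select_limit and select_by_clustering_key are hypothetical. It says nothing about how much data the server reads from disk or cache; it only shows which rows each query returns.

```python
from bisect import bisect_left, insort


class Partition:
    """Toy model: the rows of one partition, kept sorted by clustering key."""

    def __init__(self):
        self._keys = []   # clustering keys in sorted order
        self._rows = {}   # clustering key -> row value

    def insert(self, clustering_key, value):
        if clustering_key not in self._rows:
            insort(self._keys, clustering_key)
        self._rows[clustering_key] = value

    def select_limit(self, limit):
        # Query 1: WHERE partition_key = ... LIMIT n
        # Returns the first n rows in clustering order.
        return [(k, self._rows[k]) for k in self._keys[:limit]]

    def select_by_clustering_key(self, clustering_key):
        # Query 2: WHERE partition_key = ... AND clustering_key = ...
        # Returns the one matching row, or nothing.
        i = bisect_left(self._keys, clustering_key)
        if i < len(self._keys) and self._keys[i] == clustering_key:
            return [(clustering_key, self._rows[clustering_key])]
        return []


partition = Partition()
for key in ["b", "c", "a"]:
    partition.insert(key, "value-" + key)

first = partition.select_limit(1)
exact = partition.select_by_clustering_key("c")
print(first, exact)
```

In this model, query 1 always returns whichever row sorts first in the partition, while query 2 returns the named row wherever it sits in the clustering order.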

Re: Cassandra fetches complete partition

Posted by Eric Stevens <mi...@gmail.com>.
It depends on your version of Cassandra.  I would suggest starting with
this, which describes the differences between 2.0 and 2.1
http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1

In particular:

> In previous releases, this cache has required storing the entire
> partition in memory, which meant that if that was larger than the cache
> size, you would never be reading it from the cache. Cassandra 2.1 has
> introduced extra CQL syntax to specify the number of rows to cache per
> partition.

*However*, row cache is actually a surprisingly dangerous feature for the
health of a cluster.  Practically speaking, it's very rarely useful.
In particular, the OS does a good job of caching disk reads in the page
cache, and Cassandra relies on this heavily for consistent and reliable
performance.  When you enable a row cache, you're putting a copy of the
data into off-heap Cassandra memory (a huge win over previous on-heap row
caches), but practically speaking this has little to no real advantage over
the OS-level cache of the same data.  It also has the downside that it can
hold onto cold data whose memory would be better used for some other
operation.

In Cassandra 2.0, the advice on row caches was "Don't use them.  No,
seriously!" (Jonathan Ellis, CTO of DataStax, at the Cassandra Summit '14
keynote).  In 2.1 they're better because of the changes mentioned in the
article above, but except for fairly narrow use cases you're usually better
off focusing on something else for performance optimizations first.

Couple this with the strong recommendation that Cassandra 2.1 isn't yet a
good candidate for important production uses (see
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/),
and you probably should not be concerning yourself with row caches yet.



Re: Cassandra fetches complete partition

Posted by nitin padalia <pa...@gmail.com>.
e.g.
CREATE TABLE usertable_cache (
  user_id uuid,
  dept_id uuid,
  location_id text,
  locationmap_id uuid,
  PRIMARY KEY ((user_id, dept_id), location_id)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='{"keys":"ALL", "rows_per_partition":"1000"}' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.000000 AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};



select * from usertable_cache WHERE user_id = 7bf16edf-b552-40f4-94ac-87b2e878d8c2
  and dept_id = de3ac44f-2078-4321-a47c-de96c615d40d and location_id = 'ABC4:1';

 user_id                              | dept_id                              | location_id   | locationmap_id
--------------------------------------+--------------------------------------+---------------+--------------------------------------
 7bf16edf-b552-40f4-94ac-87b2e878d8c2 | de3ac44f-2078-4321-a47c-de96c615d40d |        ABC4:1 | 32b97639-ea5b-427f-8c27-8a5016e2ad6e

(1 rows)


Tracing session: c40f9ba0-9fe2-11e4-9522-35de4dc20d00

 activity                                                                  | timestamp    | source       | source_elapsed
---------------------------------------------------------------------------+--------------+--------------+----------------
 execute_cql3_query                                                        | 19:25:02,875 | 10.76.214.80 |              0
 Parsing select * from usertable_cache WHERE user_id = 7bf16edf-b552-40f4-94ac-87b2e878d8c2  and dept_id = de3ac44f-2078-4321-a47c-de96c615d40d and location_id = 'ABC4:1 LIMIT 10000; | 19:25:02,875 | 10.76.214.80 |             60
 Preparing statement                                                       | 19:25:02,875 | 10.76.214.80 |            157
 Ignoring row cache as cached value could not satisfy query                | 19:25:02,879 | 10.76.214.80 |           3668
 Executing single-partition query on userobjectid_by_extn_uri_10k_cache    | 19:25:02,879 | 10.76.214.80 |           3690
 Acquiring sstable references                                              | 19:25:02,879 | 10.76.214.80 |           3700
 Merging memtable tombstones                                               | 19:25:02,879 | 10.76.214.80 |           3755
 Key cache hit for sstable 3                                               | 19:25:02,879 | 10.76.214.80 |           4264
 Seeking to partition indexed section in data file                         | 19:25:02,879 | 10.76.214.80 |           4276
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 19:25:02,879 | 10.76.214.80 |           4324
 Merging data from memtables and 1 sstables                                | 19:25:02,879 | 10.76.214.80 |           4337
 Read 1 live and 0 tombstoned cells                                        | 19:25:02,883 | 10.76.214.80 |           7596
 Request complete                                                          | 19:25:02,883 | 10.76.214.80 |           8263



select * from usertable_cache WHERE user_id = 7bf16edf-b552-40f4-94ac-87b2e878d8c2
  and dept_id = de3ac44f-2078-4321-a47c-de96c615d40d and location_id = 'ABC4:2';

 user_id                              | dept_id                              | location_id   | locationmap_id
--------------------------------------+--------------------------------------+---------------+--------------------------------------
 7bf16edf-b552-40f4-94ac-87b2e878d8c2 | de3ac44f-2078-4321-a47c-de96c615d40d |        ABC4:2 | 1ddf3188-2642-4f8b-948b-78f220987e54

(1 rows)


Tracing session: 42bfdbe0-9fe3-11e4-9522-35de4dc20d00

 activity                           | timestamp    | source       | source_elapsed
------------------------------------+--------------+--------------+----------------
 execute_cql3_query                 | 19:28:35,423 | 10.76.214.80 |              0
 Parsing select * from usertable_cache WHERE user_id = 7bf16edf-b552-40f4-94ac-87b2e878d8c2  and dept_id = de3ac44f-2078-4321-a47c-de96c615d40d and location_id = 'ABC4:2' LIMIT 10000; | 19:28:35,423 | 10.76.214.80 |             56
 Preparing statement                | 19:28:35,423 | 10.76.214.80 |            147
 Row cache hit                      | 19:28:35,425 | 10.76.214.80 |           2530
 Read 1 live and 0 tombstoned cells | 19:28:35,425 | 10.76.214.80 |           2574
 Request complete                   | 19:28:35,425 | 10.76.214.80 |           2943
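
For a quick side-by-side of the two traces above, this small Python sketch subtracts the relevant source_elapsed values. The numbers are copied from the traces, and source_elapsed is reported in microseconds. The variable names are only for illustration, and a single pair of traces is not a benchmark.

```python
# source_elapsed values (microseconds) copied from the two traces above
miss_trace = {
    "cache decision": 3668,     # "Ignoring row cache as cached value could not satisfy query"
    "request complete": 8263,
}
hit_trace = {
    "cache decision": 2530,     # "Row cache hit"
    "request complete": 2943,
}

# Difference in total request time between the miss and the hit
total_difference_us = miss_trace["request complete"] - hit_trace["request complete"]

# Time spent after the row cache was consulted, in each case
after_decision_miss_us = miss_trace["request complete"] - miss_trace["cache decision"]
after_decision_hit_us = hit_trace["request complete"] - hit_trace["cache decision"]

print(f"total difference: {total_difference_us} us")
print(f"after cache decision (miss): {after_decision_miss_us} us")
print(f"after cache decision (hit): {after_decision_hit_us} us")
```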














-- 
Nitin Padalia
9999256157

Re: Cassandra fetches complete partition

Posted by nitin padalia <pa...@gmail.com>.
My question is specifically about the row cache.  In Cassandra 2.1.2, when I
populate a column family with 1000 rows in one partition and the
rows_per_partition setting is 1000 for that column family, queries for the
first and last rows report a cache miss when I specify that row's key in
the query.  If I increase rows_per_partition to 1002, every row is a hit.
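
One possible way to reason about this observation is sketched below in Python. It is an assumption, not something confirmed in this thread and not Cassandra's actual code. The model supposes that the row cache keeps only the first rows_per_partition rows of a partition, that it treats the partition as fully cached only when it holds fewer rows than that limit, and that otherwise it answers a single-row query from cache only when the requested row lies strictly between the first and last cached rows. All names (ToyRowCache, can_serve) are hypothetical.

```python
class ToyRowCache:
    """Hypothetical model of a head-of-partition row cache (not Cassandra code)."""

    def __init__(self, clustering_keys, rows_per_partition):
        self.limit = rows_per_partition
        # Assumption: only the first `limit` rows in clustering order are cached.
        self.cached = sorted(clustering_keys)[:rows_per_partition]

    def whole_partition_cached(self):
        # Assumption: holding exactly `limit` rows cannot prove that nothing
        # follows them, so only a strictly smaller count means "complete".
        return len(self.cached) < self.limit

    def can_serve(self, clustering_key):
        if self.whole_partition_cached():
            return True
        if not self.cached:
            return False
        # Assumption: a row on the edge of the cached range is not provably
        # covered, so only rows strictly inside the range are served.
        return self.cached[0] < clustering_key < self.cached[-1]


rows = list(range(1, 1001))  # 1000 rows in one partition

cache_1000 = ToyRowCache(rows, rows_per_partition=1000)
misses_1000 = [k for k in rows if not cache_1000.can_serve(k)]

cache_1002 = ToyRowCache(rows, rows_per_partition=1002)
misses_1002 = [k for k in rows if not cache_1002.can_serve(k)]

print(misses_1000)  # only the first and last row miss
print(misses_1002)  # no misses
```

Under these assumptions the model reproduces the reported pattern. Whether this is the real cause would need to be checked against the 2.1 source or confirmed by a Cassandra developer.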