Posted to commits@cassandra.apache.org by "Stefania (JIRA)" <ji...@apache.org> on 2015/07/22 10:13:05 UTC

[jira] [Commented] (CASSANDRA-8180) Optimize disk seek using min/max column name meta data when the LIMIT clause is used

    [ https://issues.apache.org/jira/browse/CASSANDRA-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636477#comment-14636477 ] 

Stefania commented on CASSANDRA-8180:
-------------------------------------

I have attached the latest patch based on trunk, which is ready for review.

The main things to note are:

* In MergeIterator we need to discard fake lower-bound values, and after discarding one we must advance the same iterator to check whether it holds a real value equal to that bound; if it does, that value must be passed to the reducer together with the equal values from the other iterators. However, we cannot discard fake values too early: in ManyToOne.advance() we cannot simply keep advancing while we have fake values, because we need to wait for the heap to be sorted first. I implemented a [peek method|https://github.com/stef1927/cassandra/commit/9528e672ea65c7a71c0004adfc27e5f4d9ee0acb#diff-a9e2c345aa605d1b8d360b4c44ade32f] in the candidate, which is called for fake values when reducing values. This should not affect the correctness of the algorithm, but there may be a more efficient way to do it, cc [~benedict] and [~blambov].

* We need to decide whether we are happy with a wrapping iterator, [LowerBoundUnfilteredRowIterator|https://github.com/stef1927/cassandra/commit/9528e672ea65c7a71c0004adfc27e5f4d9ee0acb#diff-1cdf42ebc69336015e04f287e8450e51], in which case we may need a better name. I understand that wrapping too many iterators may hurt performance, so we may want to look into modifying AbstractSSTableIterator directly, even though this might be a bit more work. I also wrap the merged iterator so that the metrics of iterated tables are updated in the close method, which is when we know how many tables were iterated. It would be nice to avoid at least this wrapper.

* We need a reliable way to signal a fake Unfiltered object (the lower bound). I used an [empty row|https://github.com/stef1927/cassandra/commit/9528e672ea65c7a71c0004adfc27e5f4d9ee0acb#diff-3e7088b7213c9faaf80e18dadaaa6929], but perhaps we should use a new specialization of Unfiltered or some other marker.
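
To make the first point easier to discuss, here is a minimal, self-contained sketch of the general idea: a heap-based merge in which every source first contributes a fake lower bound and is only opened once that bound reaches the top of the heap. It is not the code in the patch. The names LowerBoundMergeSketch, Source, Candidate and merge are hypothetical, plain integers stand in for clustering values, and ties are handled by sorting fake values before equal real ones rather than with the peek method described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only: all names here are hypothetical, not Cassandra APIs.
class LowerBoundMergeSketch {

    /** A sorted source that can report a cheap lower bound before it is opened. */
    static final class Source {
        final int lowerBound;            // assumed to come from metadata, no disk access
        final List<Integer> values;      // ascending; stands in for the rows on disk
        private Iterator<Integer> it;    // stays null until the source is opened
        boolean opened = false;

        Source(int lowerBound, Integer... values) {
            this.lowerBound = lowerBound;
            this.values = Arrays.asList(values);
        }

        /** Opens the source on first use and returns its next real value, or null. */
        Integer nextReal() {
            if (it == null) {
                it = values.iterator();
                opened = true;
            }
            return it.hasNext() ? it.next() : null;
        }
    }

    /** Heap entry: either a fake lower bound or a real value from a source. */
    static final class Candidate implements Comparable<Candidate> {
        final Source source;
        final int value;
        final boolean fake;

        Candidate(Source source, int value, boolean fake) {
            this.source = source;
            this.value = value;
            this.fake = fake;
        }

        @Override
        public int compareTo(Candidate o) {
            int cmp = Integer.compare(value, o.value);
            // On ties, fake bounds sort first so they are resolved into real
            // values before any equal real value is emitted.
            return cmp != 0 ? cmp : Boolean.compare(o.fake, fake);
        }
    }

    /** Merges the sources in ascending order, emitting at most limit distinct values. */
    static List<Integer> merge(List<Source> sources, int limit) {
        PriorityQueue<Candidate> heap = new PriorityQueue<>();
        for (Source s : sources)
            heap.add(new Candidate(s, s.lowerBound, true));

        List<Integer> out = new ArrayList<>();
        while (out.size() < limit && !heap.isEmpty()) {
            Candidate c = heap.poll();
            // Fake values are never emitted; equal real values are reduced to one.
            if (!c.fake && (out.isEmpty() || out.get(out.size() - 1).intValue() != c.value))
                out.add(c.value);
            // Advance the same source; for a fake candidate this opens it.
            Integer next = c.source.nextReal();
            if (next != null)
                heap.add(new Candidate(c.source, next, false));
        }
        return out;
    }

    public static void main(String[] args) {
        Source a = new Source(1, 1, 4);
        Source b = new Source(2, 2, 5);
        Source c = new Source(10, 10, 11);
        System.out.println(merge(Arrays.asList(a, b, c), 2) + " c opened: " + c.opened);
    }
}
```

In this sketch a source whose lower bound never reaches the top of the heap before the limit is satisfied is never opened, which is the saving this ticket is after. The real MergeIterator has to pass equal values from several iterators to a reducer instead of dropping duplicates, so the tie handling there is necessarily more involved.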

[~slebresne] are you still happy to be the reviewer, or would you like to suggest someone else?


> Optimize disk seek using min/max column name meta data when the LIMIT clause is used
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8180
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 2.0.10
>            Reporter: DOAN DuyHai
>            Assignee: Stefania
>            Priority: Minor
>             Fix For: 3.x
>
>
> I was working on an example of a sensor data table (time series) and came across a use case where C* does not optimize reads on disk.
> {code}
> cqlsh:test> CREATE TABLE test(id int, col int, val text, PRIMARY KEY(id,col)) WITH CLUSTERING ORDER BY (col DESC);
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 10, '10');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 20, '20');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 30, '30');
> ...
> >nodetool flush test test
> {code}
> After that, I activate request tracing:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 LIMIT 1;
>  activity                                                                  | timestamp    | source    | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
>                                                         execute_cql3_query | 23:48:46,498 | 127.0.0.1 |              0
>                             Parsing SELECT * FROM test WHERE id=1 LIMIT 1; | 23:48:46,498 | 127.0.0.1 |             74
>                                                        Preparing statement | 23:48:46,499 | 127.0.0.1 |            253
>                                   Executing single-partition query on test | 23:48:46,499 | 127.0.0.1 |            930
>                                               Acquiring sstable references | 23:48:46,499 | 127.0.0.1 |            943
>                                                Merging memtable tombstones | 23:48:46,499 | 127.0.0.1 |           1032
>                                                Key cache hit for sstable 3 | 23:48:46,500 | 127.0.0.1 |           1160
>                                Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 |           1173
>                                                Key cache hit for sstable 2 | 23:48:46,500 | 127.0.0.1 |           1889
>                                Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 |           1901
>                                                Key cache hit for sstable 1 | 23:48:46,501 | 127.0.0.1 |           2373
>                                Seeking to partition beginning in data file | 23:48:46,501 | 127.0.0.1 |           2384
>  Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:48:46,501 | 127.0.0.1 |           2768
>                                 Merging data from memtables and 3 sstables | 23:48:46,501 | 127.0.0.1 |           2784
>                                         Read 2 live and 0 tombstoned cells | 23:48:46,501 | 127.0.0.1 |           2976
>                                                           Request complete | 23:48:46,501 | 127.0.0.1 |           3551
> {code}
> We can clearly see that C* hits 3 SSTables on disk instead of just one, although it has the min/max column meta data to decide which SSTable contains the most recent data.
> Funnily enough, if we add a restriction on the clustering column to the select, this time C* optimizes the read path:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1;
>  activity                                                                  | timestamp    | source    | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
>                                                         execute_cql3_query | 23:52:31,888 | 127.0.0.1 |              0
>                Parsing SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1; | 23:52:31,888 | 127.0.0.1 |             60
>                                                        Preparing statement | 23:52:31,888 | 127.0.0.1 |            277
>                                   Executing single-partition query on test | 23:52:31,889 | 127.0.0.1 |            961
>                                               Acquiring sstable references | 23:52:31,889 | 127.0.0.1 |            971
>                                                Merging memtable tombstones | 23:52:31,889 | 127.0.0.1 |           1020
>                                                Key cache hit for sstable 3 | 23:52:31,889 | 127.0.0.1 |           1108
>                                Seeking to partition beginning in data file | 23:52:31,889 | 127.0.0.1 |           1117
>  Skipped 2/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:52:31,889 | 127.0.0.1 |           1611
>                                 Merging data from memtables and 1 sstables | 23:52:31,890 | 127.0.0.1 |           1624
>                                         Read 1 live and 0 tombstoned cells | 23:52:31,890 | 127.0.0.1 |           1700
>                                                           Request complete | 23:52:31,890 | 127.0.0.1 |           2140
> {code}


