Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/07/31 22:35:15 UTC

[jira] Created: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs
-----------------------------------------------------------------------

                 Key: CASSANDRA-332
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-332
             Project: Cassandra
          Issue Type: Sub-task
            Reporter: Jonathan Ellis


use CFSerializer.deserializeEmpty and then read the columns as necessary, similar to what was done for SSTableNamesIterator
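A minimal sketch of the "read the columns as necessary" idea follows. It assumes a simplified, hypothetical row layout: a column count, then for each column a name written with `writeUTF` and an int-length-prefixed value. This is not Cassandra's actual on-disk format, and `LazyColumnReader` and `SketchColumn` are illustrative names. The point it shows is that each column is decoded straight from the `DataInput` when it is requested, rather than the row's bytes being copied into an intermediate `DataOutput` buffer and read back.

```java
import java.io.DataInput;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical column representation used only for this sketch.
final class SketchColumn {
    final String name;
    final byte[] value;

    SketchColumn(String name, byte[] value) {
        this.name = name;
        this.value = value;
    }
}

// Decodes columns directly from the underlying DataInput, one per next() call,
// instead of first echoing the row's bytes through an intermediate buffer.
final class LazyColumnReader implements Iterator<SketchColumn> {
    private final DataInput in;
    private int remaining;

    LazyColumnReader(DataInput in) throws IOException {
        this.in = in;
        this.remaining = in.readInt(); // assumed header: number of columns that follow
    }

    @Override
    public boolean hasNext() {
        return remaining > 0;
    }

    @Override
    public SketchColumn next() {
        if (remaining <= 0) {
            throw new NoSuchElementException();
        }
        try {
            String name = in.readUTF();            // assumed: name written with writeUTF
            byte[] value = new byte[in.readInt()]; // assumed: int length prefix for the value
            in.readFully(value);
            remaining--;
            return new SketchColumn(name, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each `next()` call decodes a single column, a caller that stops early never reads the rest of the row through this reader; a real slice reader would additionally use the column index to seek past blocks it does not need.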

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740336#action_12740336 ] 

Jun Rao commented on CASSANDRA-332:
-----------------------------------

Since the first column and the last column are always in the index, isn't the following enough?

if (finish < indexes[0].firstcolumn || start > indexes[-1].firstcolumn) 

> Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-332
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-332
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-332-add-test-for-multi-block-reversal.txt, 0002-don-t-serialize-unused-column-count-into-column-index.txt, 0003-always-write-indexes-for-at-least-first-and-last-colum.txt, 0004-move-comparator-out-of-IndexInfo.txt, 0005-avoid-copying-variables-to-ColumnGroupReader-that-it-c.txt, 0006-final-cleanup-of-SSTableSliceIterator-now-with-less-u.txt
>
>
> use CFSerializer.deserializeEmpty and then read the columns as necessary, similar to what was done for SSTableNamesIterator


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740887#action_12740887 ] 

Hudson commented on CASSANDRA-332:
----------------------------------

Integrated in Cassandra #161 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/161/])
    final cleanup of SSTableSliceIterator, now with less unnecessary copying of Columns around
patch by jbellis; reviewed by Jun Rao for 
avoid copying variables to ColumnGroupReader that it can get from the parent class.  rename underscored variables.
patch by jbellis; reviewed by Jun Rao for 
move comparator out of IndexInfo
patch by jbellis; reviewed by Jun Rao for 
always write at least one index (with first and last column of the range) for the columns in row.  this vastly simplifies column reading code and makes indexing bugs much more obvious (since there is only one read path each for names / slices now).
patch by jbellis; reviewed by Jun Rao for 
don't serialize unused column count into column index.  remove DataInput/Output round-tripping from ColumnGroupReader
patch by jbellis; reviewed by Jun Rao for 
add test for multi-block reversal.
patch by jbellis; reviewed by Jun Rao for 
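The "always write at least one index" change can be sketched as below. This assumes a simplified model in which the index is built from an in-memory list of sorted column names and their serialized sizes, and each entry records only the first and last name of its block (real entries would also carry file positions). `IndexBuilderSketch`, `Entry` and `blockSize` are hypothetical names, not Cassandra's API.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexBuilderSketch {
    // Hypothetical index entry: first and last column names of one block.
    static final class Entry {
        final String firstName;
        final String lastName;

        Entry(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // names must be in comparator order; sizes[i] is the serialized size of column i.
    // A block is closed once it holds at least blockSize bytes, and any remainder
    // becomes a final block, so every non-empty row gets at least one entry.
    static List<Entry> build(List<String> names, int[] sizes, int blockSize) {
        List<Entry> index = new ArrayList<>();
        int blockStart = 0;
        int blockBytes = 0;
        for (int i = 0; i < names.size(); i++) {
            blockBytes += sizes[i];
            if (blockBytes >= blockSize) {
                index.add(new Entry(names.get(blockStart), names.get(i)));
                blockStart = i + 1;
                blockBytes = 0;
            }
        }
        if (blockStart < names.size()) {
            // Columns after the last full block (the whole row, if it is small).
            index.add(new Entry(names.get(blockStart), names.get(names.size() - 1)));
        }
        return index;
    }
}
```

With three 10-byte columns and `blockSize` = 100 this yields a single entry (a, c), the "at least one index" case; five 40-byte columns yield (a, c) and (d, e). It also illustrates why shrinking column values in a test, without adding columns, can collapse the index back to a single entry.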


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739896#action_12739896 ] 

Jun Rao commented on CASSANDRA-332:
-----------------------------------

Comments:

1. IndexHelper.IndexInfo: it's not clear to me why both firstName and lastName are needed.

2. In SSTableNamesIterator, after calling IndexHelper.IndexFor(), it seems that you need to check whether the index is out of range (in particular, when index == indexList.size()).

3. In TableTest.testGetSliceFromLarge(), now that you made the value smaller, you probably need to increase the number of columns inserted to make sure multiple column index entries are created.



[jira] Issue Comment Edited: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740028#action_12740028 ] 

Jonathan Ellis edited comment on CASSANDRA-332 at 8/6/09 5:38 AM:
------------------------------------------------------------------

(thanks for the Wednesday night review :)

      was (Author: jbellis):
    (thanks for the Friday night review :)
  

[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740137#action_12740137 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

1. if you don't have both the beginning and the end of the block in the index, it doesn't work, since one side or the other is left as a question mark.  getting rid of count isn't about space savings, it's about simplifying the code -- there were a _lot_ of lines devoted to updating and passing count around, as you probably noticed.

2. ok, will fix


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740244#action_12740244 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

attached patches for 1 and 2 to CASSANDRA-351


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740599#action_12740599 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

ah, a bug:

        if (comparator.compare(indexList.get(indexList.size() - 1).firstName, column.name()) != 0)

should be

        if (comparator.compare(indexList.get(indexList.size() - 1).lastName, column.name()) != 0)

otherwise it could get indexed twice.
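To see why comparing against `firstName` double-indexes, consider the step that decides whether the row's final column still needs its own index entry. A minimal sketch, assuming index entries hold only first/last names and using `String` with natural ordering in place of column names and the comparator; `FinalEntryCheck` and its methods are hypothetical:

```java
import java.util.Comparator;
import java.util.List;

final class FinalEntryCheck {
    // Hypothetical index entry covering the columns firstName..lastName.
    static final class Entry {
        final String firstName;
        final String lastName;

        Entry(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Original check: compares the final column against firstName, so a block
    // that already ends with that column (but holds more than one column)
    // still looks unindexed, and the column gets a second entry.
    static boolean needsFinalEntryBuggy(List<Entry> index, String lastColumn, Comparator<String> cmp) {
        return index.isEmpty() || cmp.compare(index.get(index.size() - 1).firstName, lastColumn) != 0;
    }

    // Fixed check: the final column is already covered iff the last entry ends with it.
    static boolean needsFinalEntry(List<Entry> index, String lastColumn, Comparator<String> cmp) {
        return index.isEmpty() || cmp.compare(index.get(index.size() - 1).lastName, lastColumn) != 0;
    }
}
```

For an index [(a, c), (d, f)] whose last block already ends at `f`, `needsFinalEntry` returns false for `f`, while the `firstName` version returns true and would append a duplicate entry for `f`.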


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740264#action_12740264 ] 

Jun Rao commented on CASSANDRA-332:
-----------------------------------

Patches in #351 look good to me. I still don't see the actual use of both firstName and lastName in IndexInfo, though. Since this affects the storage format, I'd like us to get it right sooner rather than later.


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740307#action_12740307 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

like I said, for the range scan check you will write

if (finish < indexes[0].firstcolumn || start > indexes[-1].lastcolumn) 

so you need both firstcolumn and lastcolumn; desc scan also needs both.

also in patch 4 for CASSANDRA-351 I added

                if (comparator.compare(name, indexInfo.firstName) < 0)
                   continue;

as a similar "see if we can skip the actual scan" check.

(this won't get hit as often b/c of the bloom filter...  of course we haven't really decided if that's going to stay or not yet.  that's CASSANDRA-325 for those not following -commits.)
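The `firstName` skip check from patch 4, together with the out-of-range concern raised in review, can be sketched as a names lookup over the index. This is a minimal sketch, assuming contiguous blocks described only by first/last names and a hand-written binary search for the first block whose `lastName` is >= the requested name; `NamesLookupSketch`, `Block` and `blockFor` are hypothetical, and Cassandra's `IndexHelper` may search differently.

```java
import java.util.Comparator;
import java.util.List;

final class NamesLookupSketch {
    // Hypothetical index entry covering the contiguous columns firstName..lastName.
    static final class Block {
        final String firstName;
        final String lastName;

        Block(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Returns the position of the only block that could contain name, or -1 when
    // the index alone proves the column is absent, so no block has to be read.
    static int blockFor(List<Block> index, String name, Comparator<String> cmp) {
        // Binary search for the first block whose lastName >= name.
        int lo = 0;
        int hi = index.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(index.get(mid).lastName, name) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        if (lo == index.size()) {
            return -1; // name sorts after the last column in the row
        }
        if (cmp.compare(name, index.get(lo).firstName) < 0) {
            return -1; // name falls before this block, i.e. between blocks or before the row
        }
        return lo;
    }
}
```

For an index [(b, d), (f, h)], `blockFor` returns 0 for `c` and 1 for `h`, and -1 for `e` (between blocks), `a` (before the row) and `z` (past the end, the `index == size()` case).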


[jira] Updated: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Greene updated CASSANDRA-332:
-------------------------------------

    Component/s: Core


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739635#action_12739635 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

06
    final cleanup of SSTableSliceIterator, now with less unnecessary copying of Columns around

05
    avoid copying variables to ColumnGroupReader that it can get from the parent class.  rename underscored variables

04
    move comparator out of IndexInfo

03
    always write indexes for at least first and last columns in row.  this vastly simplifies column reading code
    and makes indexing bugs much more obvious (since there is only one read path each for names / slices now).

02
    don't serialize unused column count into column index.  remove DataInput/Output round-tripping from ColumnGroupRe

01
    CASSANDRA-332 add test for multi-block reversal


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740096#action_12740096 ] 

Jun Rao commented on CASSANDRA-332:
-----------------------------------

1. Since both the first and the last column are always in the index, can't you just check the firstcolumn? A column name takes more space than a count, so I'd rather not duplicate it in the index entry.

2. If you check the Javadoc, the insertionPoint can be collection.size() when the search key is larger than the last element. When index == 0 or index == indexList.size(), you can avoid deserializing the column index, since the searched column is guaranteed not to be found.
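Point 2 can be checked directly against the `java.util.Collections.binarySearch` contract: a miss returns `-(insertionPoint) - 1`, and the insertion point is `list.size()` when every element is less than the key. A small check, using plain `String` names in place of column names (`InsertionPointDemo` is an illustrative name; whether `IndexHelper.IndexFor()` passes this value through unchanged is not shown here):

```java
import java.util.Collections;
import java.util.List;

final class InsertionPointDemo {
    // Collections.binarySearch returns the key's index on a hit and
    // (-(insertionPoint) - 1) on a miss; this recovers the insertion point.
    static int insertionPoint(List<String> sortedNames, String key) {
        int result = Collections.binarySearch(sortedNames, key);
        return result >= 0 ? result : -(result + 1);
    }
}
```

For the sorted list [b, d, f], searching for `z` gives an insertion point of 3, which equals `size()`; searching for `a` gives 0.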


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740167#action_12740167 ] 

Jun Rao commented on CASSANDRA-332:
-----------------------------------

1. Do you want to add the TODO into this patch?


[jira] Updated: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-332:
-------------------------------------

    Issue Type: Improvement  (was: Sub-task)
        Parent:     (was: CASSANDRA-330)


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740027#action_12740027 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

1) that relates to this TODO:

        // TODO push finishColumn down here too, so we can tell when we're done and optimize away the slice when the index + start/stop shows there's nothing to scan for

the check will look something like

if (finish < indexes[0].firstcolumn || start > indexes[-1].lastcolumn)

+ the special casing of ascending of course.

2) my understanding is that the java binary search will never yield index == indexList.size().  or did you mean something else?

3) ok, will fix
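The check described in 1) can be sketched as follows. Assumptions: `indexes[-1]` is read as the last entry, entries carry only first/last names, bounds are non-empty (no "unbounded" slices), and a reversed slice passes its higher name as `start`, so the bounds are swapped before comparing. `SliceRangeCheck` and `Entry` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;

final class SliceRangeCheck {
    // Hypothetical index entry covering the columns firstName..lastName.
    static final class Entry {
        final String firstName;
        final String lastName;

        Entry(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // True when the slice cannot intersect any indexed column, so the row's
    // blocks need not be read at all. Assumes a non-empty index and non-empty bounds.
    static boolean canSkip(List<Entry> indexes, String start, String finish,
                           boolean reversed, Comparator<String> cmp) {
        // Normalize to (low, high) so one comparison works for both directions.
        String low = reversed ? finish : start;
        String high = reversed ? start : finish;
        Entry first = indexes.get(0);
        Entry last = indexes.get(indexes.size() - 1);
        // finish < indexes[0].firstcolumn || start > indexes[-1].lastcolumn
        return cmp.compare(high, first.firstName) < 0 || cmp.compare(low, last.lastName) > 0;
    }
}
```

For an index [(b, d), (f, h)], a slice between `i` and `k` is skipped in either direction, while a slice between `c` and `g` is not.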


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740347#action_12740347 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

Nope.  If the last index block contains Y and Z, for instance, then firstcolumn=Y and lastcolumn=Z.  Then if you slice with start=Z, the second term would give a false negative: Z > Y, so the scan would be skipped even though Z is in the row.
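The counterexample can be checked mechanically. A minimal sketch, assuming a row whose columns are A, B, Y, Z with the last index block holding Y and Z, and using natural `String` ordering in place of the column comparator; `FalseNegativeExample` and its methods are hypothetical:

```java
import java.util.Comparator;
import java.util.List;

final class FalseNegativeExample {
    // Ground truth for the "start is past the row" half of the check:
    // true only if no column in the row sorts at or after start.
    static boolean nothingAtOrAfter(List<String> rowColumns, String start, Comparator<String> cmp) {
        for (String column : rowColumns) {
            if (cmp.compare(column, start) >= 0) {
                return false;
            }
        }
        return true;
    }

    // Proposed test using only the first name: start > indexes[-1].firstcolumn
    static boolean skipByFirstName(String lastBlockFirstName, String start, Comparator<String> cmp) {
        return cmp.compare(start, lastBlockFirstName) > 0;
    }

    // Test using the stored last name: start > indexes[-1].lastcolumn
    static boolean skipByLastName(String lastBlockLastName, String start, Comparator<String> cmp) {
        return cmp.compare(start, lastBlockLastName) > 0;
    }
}
```

With start=Z, the first-name test skips the row (Z sorts after Y) even though Z is present, while the last-name test does not. The first-name test is only safe if the last entry is guaranteed to contain nothing but the final column, which is exactly what the two sides of this exchange disagree about.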


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740028#action_12740028 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

(thanks for the Friday night review :)


[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740594#action_12740594 ] 

Jun Rao commented on CASSANDRA-332:
-----------------------------------

This can't happen. Since the last index entry is always the last column (in a row), indexes[-1].firstcolumn will be Z. The test using just firstcolumn should be correct.

> Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-332
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-332
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-332-add-test-for-multi-block-reversal.txt, 0002-don-t-serialize-unused-column-count-into-column-index.txt, 0003-always-write-indexes-for-at-least-first-and-last-colum.txt, 0004-move-comparator-out-of-IndexInfo.txt, 0005-avoid-copying-variables-to-ColumnGroupReader-that-it-c.txt, 0006-final-cleanup-of-SSTableSliceIterator-now-with-less-u.txt
>
>
> use CFSerializer.deserializeEmpty and then read the columns as necessary, similar to what was done for SSTableNamesIterator



[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740597#action_12740597 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

> the last index entry is always the last column

no, it has to _include_ the last column, but it won't always just be the last column.
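To see why a firstcolumn-only check can wrongly skip a row, here is a minimal Java sketch. It assumes each index entry records the first and last column name of its block. The `IndexInfo` record, its field names, and the `SliceIndexCheck` class are hypothetical. Plain `String` ordering stands in for the column family's comparator, and both slice bounds are assumed to be non-empty.

```java
import java.util.Arrays;
import java.util.List;

class SliceIndexCheck {
    // Hypothetical index entry: one block of columns, firstName..lastName inclusive.
    record IndexInfo(String firstName, String lastName) {}

    // The proposed check: uses only firstName, including for the last block.
    static boolean skipUsingFirstNamesOnly(List<IndexInfo> indexes, String start, String finish) {
        IndexInfo first = indexes.get(0);
        IndexInfo last = indexes.get(indexes.size() - 1);
        return finish.compareTo(first.firstName()) < 0
            || start.compareTo(last.firstName()) > 0;
    }

    // The last block *includes* the row's last column but may start earlier,
    // so the upper bound must come from that block's lastName.
    static boolean skipUsingLastName(List<IndexInfo> indexes, String start, String finish) {
        IndexInfo first = indexes.get(0);
        IndexInfo last = indexes.get(indexes.size() - 1);
        return finish.compareTo(first.firstName()) < 0
            || start.compareTo(last.lastName()) > 0;
    }

    public static void main(String[] args) {
        // Two blocks, [a..m] and [n..z]: the last block spans n..z, not just z.
        List<IndexInfo> indexes = Arrays.asList(new IndexInfo("a", "m"), new IndexInfo("n", "z"));
        // The slice [p, q] lies entirely inside the last block.
        System.out.println(skipUsingFirstNamesOnly(indexes, "p", "q")); // true: row wrongly skipped
        System.out.println(skipUsingLastName(indexes, "p", "q"));        // false: row is read
    }
}
```

The two checks differ only when the slice starts after the last block's first column but not after its last column. That is exactly the case where the last index entry covers more than the last column alone.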

> Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-332
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-332
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-332-add-test-for-multi-block-reversal.txt, 0002-don-t-serialize-unused-column-count-into-column-index.txt, 0003-always-write-indexes-for-at-least-first-and-last-colum.txt, 0004-move-comparator-out-of-IndexInfo.txt, 0005-avoid-copying-variables-to-ColumnGroupReader-that-it-c.txt, 0006-final-cleanup-of-SSTableSliceIterator-now-with-less-u.txt
>
>
> use CFSerializer.deserializeEmpty and then read the columns as necessary, similar to what was done for SSTableNamesIterator



[jira] Updated: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-332:
-------------------------------------

    Attachment: 0006-final-cleanup-of-SSTableSliceIterator-now-with-less-u.txt
                0005-avoid-copying-variables-to-ColumnGroupReader-that-it-c.txt
                0004-move-comparator-out-of-IndexInfo.txt
                0003-always-write-indexes-for-at-least-first-and-last-colum.txt
                0002-don-t-serialize-unused-column-count-into-column-index.txt
                0001-CASSANDRA-332-add-test-for-multi-block-reversal.txt

> Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-332
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-332
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Jonathan Ellis
>         Attachments: 0001-CASSANDRA-332-add-test-for-multi-block-reversal.txt, 0002-don-t-serialize-unused-column-count-into-column-index.txt, 0003-always-write-indexes-for-at-least-first-and-last-colum.txt, 0004-move-comparator-out-of-IndexInfo.txt, 0005-avoid-copying-variables-to-ColumnGroupReader-that-it-c.txt, 0006-final-cleanup-of-SSTableSliceIterator-now-with-less-u.txt
>
>
> use CFSerializer.deserializeEmpty and then read the columns as necessary, similar to what was done for SSTableNamesIterator



[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740214#action_12740214 ] 

Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------

It's not on my priority list for 0.4, hence the TODO.  I let the scope creep far enough on this ticket already. :)  It can be done any time with the existing (post-332) disk format, so I think it's reasonable to leave it for another day.  Created CASSANDRA-350 to track it.

> Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-332
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-332
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-332-add-test-for-multi-block-reversal.txt, 0002-don-t-serialize-unused-column-count-into-column-index.txt, 0003-always-write-indexes-for-at-least-first-and-last-colum.txt, 0004-move-comparator-out-of-IndexInfo.txt, 0005-avoid-copying-variables-to-ColumnGroupReader-that-it-c.txt, 0006-final-cleanup-of-SSTableSliceIterator-now-with-less-u.txt
>
>
> use CFSerializer.deserializeEmpty and then read the columns as necessary, similar to what was done for SSTableNamesIterator
