Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/07/31 22:35:15 UTC
[jira] Created: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs
-----------------------------------------------------------------------
Key: CASSANDRA-332
URL: https://issues.apache.org/jira/browse/CASSANDRA-332
Project: Cassandra
Issue Type: Sub-task
Reporter: Jonathan Ellis
use CFSerializer.deserializeEmpty and then read the columns as necessary, similar to what was done for SSTableNamesIterator
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740336#action_12740336 ]
Jun Rao commented on CASSANDRA-332:
-----------------------------------
Since the first column and the last column are always in the index, isn't the following enough?
if (finish < indexes[0].firstcolumn || start > indexes[-1].firstcolumn)
> Clean up SSTableSliceIterator to not echo data around DataOutput/Inputs
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-332
> URL: https://issues.apache.org/jira/browse/CASSANDRA-332
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Fix For: 0.4
>
> Attachments: 0001-CASSANDRA-332-add-test-for-multi-block-reversal.txt, 0002-don-t-serialize-unused-column-count-into-column-index.txt, 0003-always-write-indexes-for-at-least-first-and-last-colum.txt, 0004-move-comparator-out-of-IndexInfo.txt, 0005-avoid-copying-variables-to-ColumnGroupReader-that-it-c.txt, 0006-final-cleanup-of-SSTableSliceIterator-now-with-less-u.txt
>
>
> use CFSerializer.deserializeEmpty and then read the columns as necessary, similar to what was done for SSTableNamesIterator
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740887#action_12740887 ]
Hudson commented on CASSANDRA-332:
----------------------------------
Integrated in Cassandra #161 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/161/])
final cleanup of SSTableSliceIterator, now with less unnecessary copying of Columns around
patch by jbellis; reviewed by Jun Rao for
avoid copying variables to ColumnGroupReader that it can get from the parent class. rename underscored variables.
patch by jbellis; reviewed by Jun Rao for
move comparator out of IndexInfo
patch by jbellis; reviewed by Jun Rao for
always write at least one index (with first and last column of the range) for the columns in row. this vastly simplifies column reading code and makes indexing bugs much more obvious (since there is only one read path each for names / slices now).
patch by jbellis; reviewed by Jun Rao for
don't serialize unused column count into column index. remove DataInput/Output round-tripping from ColumnGroupReader
patch by jbellis; reviewed by Jun Rao for
add test for multi-block reversal.
patch by jbellis; reviewed by Jun Rao for
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739896#action_12739896 ]
Jun Rao commented on CASSANDRA-332:
-----------------------------------
Comments:
1. IndexHelper.IndexInfo: it's not clear to me why both firstName and lastName are needed.
2. In SSTableNamesIterator, after calling IndexHelper.IndexFor(), it seems that you need to check if the index is out of range (in particular, when index == indexList.size())
3. In TableTest.testGetSliceFromLarge(), now that you made the value smaller, you probably need to increase the number of columns inserted to make sure multiple column index entries are created.
[jira] Issue Comment Edited: (CASSANDRA-332) Clean up
SSTableSliceIterator to not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740028#action_12740028 ]
Jonathan Ellis edited comment on CASSANDRA-332 at 8/6/09 5:38 AM:
------------------------------------------------------------------
(thanks for the Wednesday night review :)
was (Author: jbellis):
(thanks for the Friday night review :)
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740137#action_12740137 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
1. if you don't have both the beginning and end of the block in the index it doesn't work, since you have a question mark at one side or the other. getting rid of count isn't about space savings, it's about simplifying the code -- there were a _lot_ of lines devoted to updating and passing count around, as you probably noticed.
2. ok, will fix
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740244#action_12740244 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
attached patches for 1 and 2 to CASSANDRA-351
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740599#action_12740599 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
ah, a bug:
if (comparator.compare(indexList.get(indexList.size() - 1).firstName, column.name()) != 0)
should be
if (comparator.compare(indexList.get(indexList.size() - 1).lastName, column.name()) != 0)
otherwise it could get indexed twice.
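To illustrate the fix above, here is a minimal sketch of the comparison in isolation. It assumes the `if` guards the writing of one more index entry for the final column; the class and method names (TrailingIndexCheck, needsTrailingEntry) are hypothetical and are not taken from the patch.

```java
import java.util.Comparator;

// Hypothetical illustration only; not the code from the attached patches.
final class TrailingIndexCheck {
    // Returns true when the boundary name taken from the last index entry
    // differs from the final column name, i.e. another entry would be written.
    static boolean needsTrailingEntry(String entryBoundaryName, String lastColumnName,
                                      Comparator<String> comparator) {
        return comparator.compare(entryBoundaryName, lastColumnName) != 0;
    }

    public static void main(String[] args) {
        Comparator<String> comparator = Comparator.naturalOrder();
        // Assume the last index entry already covers columns C..D and D is the final column.
        String firstName = "C";
        String lastName = "D";
        String finalColumn = "D";

        // Comparing against firstName asks for another entry even though D is covered.
        System.out.println(needsTrailingEntry(firstName, finalColumn, comparator)); // true
        // Comparing against lastName sees that D is already indexed.
        System.out.println(needsTrailingEntry(lastName, finalColumn, comparator)); // false
    }
}
```

With firstName the check passes whenever the last block holds more than one column, which is how the final column could end up in two index entries.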
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740264#action_12740264 ]
Jun Rao commented on CASSANDRA-332:
-----------------------------------
Patches in #351 look good to me. I still don't see the actual use of both firstname and lastname in IndexInfo, though. Since this affects the storage format, I'd like to see that we get it right sooner rather than later.
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740307#action_12740307 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
like I said, for the range scan check you will write
if (finish < indexes[0].firstcolumn || start > indexes[-1].lastcolumn)
so you need both firstcolumn and lastcolumn; desc scan also needs both.
also in patch 4 for CASSANDRA-351 I added
if (comparator.compare(name, indexInfo.firstName) < 0)
continue;
as a similar "see if we can skip the actual scan" check.
(this won't get hit as often b/c of the bloom filter... of course we haven't really decided if that's going to stay or not yet. that's CASSANDRA-325 for those not following -commits.)
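The range scan check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: IndexEntry and SliceSkipCheck are hypothetical stand-ins (not IndexHelper.IndexInfo itself), column names are plain strings, the index is non-empty and sorted, and the slice is ascending with start <= finish.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an index entry: the first and last column
// names covered by one block of a row's columns.
final class IndexEntry {
    final String firstName;
    final String lastName;

    IndexEntry(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

final class SliceSkipCheck {
    // Returns true when the slice [start, finish] lies entirely before the
    // first indexed column or entirely after the last one, so the scan
    // can be skipped without reading any columns.
    static boolean canSkip(List<IndexEntry> index, String start, String finish,
                           Comparator<String> comparator) {
        IndexEntry first = index.get(0);
        IndexEntry last = index.get(index.size() - 1);
        return comparator.compare(finish, first.firstName) < 0
            || comparator.compare(start, last.lastName) > 0;
    }

    public static void main(String[] args) {
        List<IndexEntry> index = Arrays.asList(new IndexEntry("B", "M"),
                                               new IndexEntry("N", "Y"));
        Comparator<String> comparator = Comparator.naturalOrder();
        System.out.println(canSkip(index, "A", "A", comparator)); // true: ends before B
        System.out.println(canSkip(index, "Z", "Z", comparator)); // true: starts after Y
        System.out.println(canSkip(index, "Y", "Z", comparator)); // false: Y is indexed
    }
}
```

The first term needs firstName of the first entry and the second term needs lastName of the last entry, which is why the sketch carries both fields.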
[jira] Updated: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Greene updated CASSANDRA-332:
-------------------------------------
Component/s: Core
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739635#action_12739635 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
06
final cleanup of SSTableSliceIterator, now with less unnecessary copying of Columns around
05
avoid copying variables to ColumnGroupReader that it can get from the parent class. rename underscored variables
04
move comparator out of IndexInfo
03
always write indexes for at least first and last columns in row. this vastly simplifies column reading code
and makes indexing bugs much more obvious (since there is only one read path each for names / slices now).
02
don't serialize unused column count into column index. remove DataInput/Output round-tripping from ColumnGroupRe
01
CASSANDRA-332 add test for multi-block reversal
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740096#action_12740096 ]
Jun Rao commented on CASSANDRA-332:
-----------------------------------
1. Since both the first and the last column are always in the index, can't you just check the firstcolumn? Column name takes more space than count. I'd rather not duplicate it in the index entry.
2. If you check the Javadoc, the insertionPoint can be collection.size() when the search key is larger than the last element. When index == 0 or index == indexList.size(), you can avoid deserializing the column index since the searched column is guaranteed not to be found.
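The insertion-point behaviour in point 2 can be checked with a small standalone program. java.util.Collections.binarySearch returns (-(insertion point) - 1) when the key is absent; the helper name insertionPoint and the sample column names below are made up for illustration, and how IndexHelper maps this result to a block index is not shown here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class InsertionPointDemo {
    // Converts the return value of Collections.binarySearch into an
    // insertion point: the match index if found, otherwise the position
    // at which the key would be inserted to keep the list sorted.
    static int insertionPoint(List<String> sorted, String key) {
        int result = Collections.binarySearch(sorted, key);
        return result >= 0 ? result : -(result + 1);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("B", "D", "F");
        System.out.println(insertionPoint(names, "A")); // 0: smaller than every element
        System.out.println(insertionPoint(names, "D")); // 1: exact match
        System.out.println(insertionPoint(names, "Z")); // 3: equals names.size()
    }
}
```

So a key larger than every element does yield an insertion point equal to the list size, and code that uses the result as a list index needs a bounds check.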
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740167#action_12740167 ]
Jun Rao commented on CASSANDRA-332:
-----------------------------------
1. Do you want to add the TODO into this patch?
[jira] Updated: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-332:
-------------------------------------
Issue Type: Improvement (was: Sub-task)
Parent: (was: CASSANDRA-330)
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740027#action_12740027 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
1) that relates to this // TODO push finishColumn down here too, so we can tell when we're done and optimize away the slice when the index + start/stop shows there's nothing to scan for
the check will look something like
if (finish < indexes[0].firstcolumn || start > indexes[-1].lastcolumn)
+ the special casing of ascending of course.
2) my understanding is that the java binary search will never yield index == indexList.size(). or did you mean something else?
3) ok, will fix
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740347#action_12740347 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
Nope. If the last index block contains Y and Z, for instance, then firstcolumn=Y and lastcolumn=Z. then if you slice with start=Z that would give a false negative from the second term.
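The Y/Z example can be worked through in a few lines. This sketch assumes string column names compared in natural order; LastBlockCheck and its two methods are hypothetical names used only to put the two variants of the second term side by side.

```java
final class LastBlockCheck {
    // Second term using only the first column of the last index block.
    static boolean startPastFirstColumn(String lastBlockFirst, String start) {
        return start.compareTo(lastBlockFirst) > 0;
    }

    // Second term using the last column of the last index block.
    static boolean startPastLastColumn(String lastBlockLast, String start) {
        return start.compareTo(lastBlockLast) > 0;
    }

    public static void main(String[] args) {
        // The last index block contains Y and Z; the slice starts at Z.
        String firstcolumn = "Y";
        String lastcolumn = "Z";
        String start = "Z";

        // Z > Y, so the firstcolumn-only test claims there is nothing to scan,
        // although column Z exists in the block.
        System.out.println(startPastFirstColumn(firstcolumn, start)); // true
        // Z > Z is false, so the lastcolumn test correctly lets the scan proceed.
        System.out.println(startPastLastColumn(lastcolumn, start)); // false
    }
}
```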
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740028#action_12740028 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
(thanks for the Friday night review :)
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740594#action_12740594 ]
Jun Rao commented on CASSANDRA-332:
-----------------------------------
This can't happen. Since the last index entry is always the last column (in a row), indexes[-1].firstcolumn will be Z. The test using just firstcolumn should be correct.
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740597#action_12740597 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
> the last index entry is always the last column
no, it has to _include_ the last column, but it won't always just be the last column.
[jira] Updated: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-332:
-------------------------------------
Attachment: 0006-final-cleanup-of-SSTableSliceIterator-now-with-less-u.txt
0005-avoid-copying-variables-to-ColumnGroupReader-that-it-c.txt
0004-move-comparator-out-of-IndexInfo.txt
0003-always-write-indexes-for-at-least-first-and-last-colum.txt
0002-don-t-serialize-unused-column-count-into-column-index.txt
0001-CASSANDRA-332-add-test-for-multi-block-reversal.txt
[jira] Commented: (CASSANDRA-332) Clean up SSTableSliceIterator to
not echo data around DataOutput/Inputs
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740214#action_12740214 ]
Jonathan Ellis commented on CASSANDRA-332:
------------------------------------------
It's not on my priority list for 0.4, hence the TODO. I let the scope creep far enough on this ticket already. :) It can be done any time with the existing (post-332) disk format, so I think it's reasonable to leave it for another day. Created CASSANDRA-350 to track it.