
Posted to commits@cassandra.apache.org by "Brandon Williams (Created) (JIRA)" <ji...@apache.org> on 2012/02/09 17:15:59 UTC

[jira] [Created] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

CFIF WideRowIterator only returns batch size columns
----------------------------------------------------

                 Key: CASSANDRA-3883
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3883
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
    Affects Versions: 1.1.0
            Reporter: Brandon Williams
             Fix For: 1.1.0


Most evident with the word count, where there are 1250 'word1' items in two rows (1000 in one, 250 in another) and it counts 198 with the batch size set to 99.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206888#comment-13206888 ] 

Brandon Williams commented on CASSANDRA-3883:
---------------------------------------------

My original description here is incorrect; I can't repro the 198 count (not sure what happened there), but now the wide row test counts 1033 'word1' items.  As far as I can tell, WordCountSetup actually inserts a total of 2002 'word1' matches: one in each of text1 and text2, and a thousand in each of text3 and text4.  I'm not sure what is causing the count discrepancy, but in any case 1033 is far above the batch size of 99, and the 4th word count test using a secondary index is counting 197 items, so I think something may be fundamentally wrong with word count.

That said, I've been adding wide row support to Pig and testing with that, and not being able to completely paginate wide rows is a definite problem.
                


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207843#comment-13207843 ] 

Brandon Williams commented on CASSANDRA-3883:
---------------------------------------------

There's no sane way to do this with get_paged_slice as it currently is.  We can do the extra rpc to determine if we're at the end of a row, but then we can end up in an ugly situation where there are only one or two more columns outside of the batch size.  When we slice those on the next iteration, they are the only columns we can return, because our slice predicate is invalid for anything else, even if we happen to get a full batch back.
                


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224486#comment-13224486 ] 

Sylvain Lebresne commented on CASSANDRA-3883:
---------------------------------------------

How would get_paged_slice need to change to make that sane, and is there any short-term solution to get this fixed for 1.1.0?
                


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251703#comment-13251703 ] 

Brandon Williams commented on CASSANDRA-3883:
---------------------------------------------

LGTM, +1
                


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249013#comment-13249013 ] 

Jonathan Ellis commented on CASSANDRA-3883:
-------------------------------------------

bq. Optimally, we'd have a way to express "I'm at this column offset in this row, give me the next X number of columns, even if it requires going to the next row." But I'm not sure how to do that sanely, either

What if we allowed mixing (start key, end token) in KeyRange?  Wouldn't that fix it?

- 1st get_paged_slice call: ((start token, end token), empty start column) from slice
- subsequent get_paged_slice calls: ((last row key, end token), last column name)
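
The two steps above can be sketched as a paging loop over a toy model. This is not Cassandra code: the sorted in-memory map, the pagedSlice helper, and the class name are hypothetical stand-ins, and row keys are assumed to sort in token order. The sketch also assumes the start column is inclusive, so a resumed page begins with the column the previous page ended on and the loop skips it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sorted in-memory map stands in for the token range, and
// row keys are assumed to sort in token order. None of this is Cassandra API.
final class KeyAndColumnPaging {
    /**
     * Hypothetical paged slice. Starts at startKey (inclusive; null means the
     * start of the range) and, within that first row only, at startColumn
     * (inclusive; null means the first column). Fills the page across rows up
     * to batchSize entries. Each entry is {rowKey, columnName}. Column names
     * within a row are assumed to be sorted and unique.
     */
    static List<String[]> pagedSlice(TreeMap<String, List<String>> rows,
                                     String startKey, String startColumn, int batchSize) {
        List<String[]> page = new ArrayList<>();
        Map<String, List<String>> range = (startKey == null) ? rows : rows.tailMap(startKey, true);
        for (Map.Entry<String, List<String>> row : range.entrySet()) {
            boolean firstRow = row.getKey().equals(startKey);
            for (String column : row.getValue()) {
                // Only the row we resume in is restricted by the start column.
                if (firstRow && startColumn != null && column.compareTo(startColumn) < 0) continue;
                if (page.size() == batchSize) return page;
                page.add(new String[] {row.getKey(), column});
            }
        }
        return page;
    }

    /** First call: whole range, empty start column. Later calls: (last row key, last column name). */
    static int countAllColumns(TreeMap<String, List<String>> rows, int batchSize) {
        if (batchSize < 2) {
            // A resumed page spends one slot on the repeated column.
            throw new IllegalArgumentException("batchSize must be at least 2");
        }
        int seen = 0;
        String lastKey = null;
        String lastColumn = null;
        while (true) {
            List<String[]> page = pagedSlice(rows, lastKey, lastColumn, batchSize);
            // On a resumed call the first entry is the column we stopped on; skip it.
            int from = (lastKey == null) ? 0 : 1;
            if (page.size() <= from) return seen;
            seen += page.size() - from;
            String[] last = page.get(page.size() - 1);
            lastKey = last[0];
            lastColumn = last[1];
        }
    }
}
```

In this model a page may end in the middle of a row and the next call continues from that exact column, crossing into the following row if needed, so no extra call is required to detect a wide row.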

                


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207055#comment-13207055 ] 

Jonathan Ellis commented on CASSANDRA-3883:
-------------------------------------------

bq. Unfortunately if we don't start on one, I'm not sure if there's a way to detect that we're in a wide row without making an extra rpc against the last row seen every time.

If we can easily address this w/ some extra logic in get_paged_slice then great, otherwise doing one extra rpc call out of (split size * rows per split) doesn't seem like a big deal to me.
                


[jira] [Updated] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3883:
--------------------------------------

    Attachment: 3883-v3.txt

v3 to avoid double-counting the startColumn.  also cleans up lastRow cruftiness a bit.
                


[jira] [Updated] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3883:
--------------------------------------

    Reviewer: tjake
    Assignee: Brandon Williams
    


[jira] [Issue Comment Edited] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207055#comment-13207055 ] 

Jonathan Ellis edited comment on CASSANDRA-3883 at 2/13/12 6:41 PM:
--------------------------------------------------------------------

bq. Unfortunately if we don't start on one, I'm not sure if there's a way to detect that we're in a wide row without making an extra rpc against the last row seen every time.

If we can easily address this w/ some extra logic in get_paged_slice then great, otherwise doing one extra rpc call out of (split size * pages per row in split) doesn't seem like a big deal to me.
                
      was (Author: jbellis):
    bq. Unfortunately if we don't start on one, I'm not sure if there's a way to detect that we're in a wide row without making an extra rpc against the last row seen every time.

If we can easily address this w/ some extra logic in get_paged_slice then great, otherwise doing one extra rpc call out of (split size * rows per split) doesn't seem like a big deal to me.
                  


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251625#comment-13251625 ] 

Jonathan Ellis commented on CASSANDRA-3883:
-------------------------------------------

https://github.com/jbellis/cassandra/branches/3883-6 is up, with CASSANDRA-4136 incorporated.  the results look good for the word_count test, as posted on 4136.
                


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249019#comment-13249019 ] 

Brandon Williams commented on CASSANDRA-3883:
---------------------------------------------

That sounds like it could work.
                


[jira] [Updated] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3883:
--------------------------------------

    Attachment: 3883-v2.txt

v2 attached w/ that approach
                


[jira] [Updated] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-3883:
----------------------------------------

    Assignee:     (was: Brandon Williams)
    


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251071#comment-13251071 ] 

Jonathan Ellis commented on CASSANDRA-3883:
-------------------------------------------

Latest is at https://github.com/jbellis/cassandra/branches/3383-5. I think it's working now, except for being blocked by CASSANDRA-4136.
                


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224622#comment-13224622 ] 

Brandon Williams commented on CASSANDRA-3883:
---------------------------------------------

Optimally, we'd have a way to express "I'm at this column offset in this row, give me the next X number of columns, even if it requires going to the next row."  But I'm not sure how to do that sanely, either.  I know Jake is using a special CFIF for Hive to handle wide rows that basically just grabs one row at a time and paginates it, which is fine if all the rows are wide, but will take a performance hit if they are not.  Still, that might be the best thing to do since using get_paged_slice is currently so hairy.
                


[jira] [Updated] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3883:
--------------------------------------

    Reviewer: brandon.williams  (was: tjake)
    Assignee: Jonathan Ellis
    


[jira] [Updated] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-3883:
----------------------------------------

    Attachment: 3883-v1.txt

v1 isn't perfect but it's a start; if the batch starts on a wide row, we reuse the token and iterate until we're done.  Unfortunately if we don't start on one, I'm not sure if there's a way to detect that we're in a wide row without making an extra rpc against the last row seen every time.
                


[jira] [Resolved] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-3883.
---------------------------------------

    Resolution: Fixed

committed
                


[jira] [Issue Comment Edited] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Jonathan Ellis (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251625#comment-13251625 ] 

Jonathan Ellis edited comment on CASSANDRA-3883 at 4/11/12 2:29 PM:
--------------------------------------------------------------------

https://github.com/jbellis/cassandra/branches/3883-6 is up, with CASSANDRA-4136 incorporated.  the results look good for the word_count test, as posted on 4136.

(the other minor change with -6 is adding conf/ to the classpath for log4j.)
                
      was (Author: jbellis):
    https://github.com/jbellis/cassandra/branches/3883-6 is up, with CASSANDRA-4136 incorporated.  the results look good for the word_count test, as posted on 4136.
                  


[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206502#comment-13206502 ] 

Brandon Williams commented on CASSANDRA-3883:
---------------------------------------------

At least one problem here is that after retrieving the first page of a row, we set the startToken to the token of the last row:
{noformat}
startToken = partitioner.getTokenFactory().toString(partitioner.getToken(lastRow.key));
{noformat}

which is exclusive and thus advances to the next row, without completely paging through the first.
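
The effect can be illustrated with a toy model. The sketch below is not Cassandra code: the integer "tokens", the in-memory map of row widths, and the pagedSlice helper are hypothetical stand-ins for the partitioner, the column family, and get_paged_slice. It assumes a page is filled across rows up to the batch size and that the start token is exclusive, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: integer tokens and a map of token -> column count stand in
// for the partitioner and the column family. None of this is Cassandra API.
final class ExclusiveTokenPaging {
    /**
     * Hypothetical paged slice: visits rows whose token is strictly greater
     * than startToken and returns up to batchSize columns. Each returned
     * element is the token of the row that column came from.
     */
    static List<Integer> pagedSlice(TreeMap<Integer, Integer> rowWidths, int startToken, int batchSize) {
        List<Integer> page = new ArrayList<>();
        for (Map.Entry<Integer, Integer> row : rowWidths.tailMap(startToken, false).entrySet()) {
            for (int i = 0; i < row.getValue() && page.size() < batchSize; i++) {
                page.add(row.getKey());
            }
            if (page.size() == batchSize) break;
        }
        return page;
    }

    /** Pages by restarting after the token of the last row seen, as in the line quoted above. */
    static int countWithExclusiveRestart(TreeMap<Integer, Integer> rowWidths, int batchSize) {
        int seen = 0;
        int startToken = Integer.MIN_VALUE; // assumes every real token is greater than this
        while (true) {
            List<Integer> page = pagedSlice(rowWidths, startToken, batchSize);
            if (page.isEmpty()) return seen;
            seen += page.size();
            // Exclusive start: whatever is left of this row is never requested.
            startToken = page.get(page.size() - 1);
        }
    }
}
```

With two rows of 1000 and 250 columns and a batch size of 99, this model sees 99 columns from each row and then stops, for a total of 198. That matches the figure in the issue description, though the model is too simple to say whether the same mechanism produced it.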
                
