Posted to commits@cassandra.apache.org by "Ben Frank (JIRA)" <ji...@apache.org> on 2012/08/31 01:04:08 UTC

[jira] [Created] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

Ben Frank created CASSANDRA-4593:
------------------------------------

             Summary: Reading the ByteBuffer key from a map job causes an infinite fetch loop
                 Key: CASSANDRA-4593
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4593
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
    Affects Versions: 1.1.2
            Reporter: Ben Frank
            Priority: Critical


Reading the ByteBuffer key from a map job with relative get methods empties the buffer (advances its position to its limit). One of these key buffers is later used in ColumnFamilyRecordReader to determine the last token received, which then serves as the start point for fetching more rows. With the buffer now empty, the token defaults to the start of the range, so the end of the data is never reached.
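A minimal Java sketch of the failure mode. It assumes a hypothetical 8-byte long key read in the mapper with the relative getLong() mentioned later in this thread; ConsumedKeyDemo and mapperReadsKey() are illustrative names, not Cassandra code:

```java
import java.nio.ByteBuffer;

// Sketch only: shows how a relative read in user map() code leaves a shared
// key buffer with no remaining bytes. The 8-byte long key is hypothetical.
final class ConsumedKeyDemo {

    // Simulates a mapper reading the key with a relative get.
    static long mapperReadsKey(ByteBuffer key) {
        return key.getLong(); // relative read: advances position by 8
    }

    public static void main(String[] args) {
        ByteBuffer key = ByteBuffer.allocate(8);
        key.putLong(42L);
        key.flip(); // position = 0, limit = 8: ready for reading

        System.out.println("before map: remaining=" + key.remaining()); // 8
        long value = mapperReadsKey(key);
        System.out.println("read value=" + value);                      // 42
        System.out.println("after map: remaining=" + key.remaining());  // 0
        // Any later code that treats the remaining bytes as "the key"
        // (for example, to work out the last token fetched) now sees an empty key.
    }
}
```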

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

Posted by "Ben Frank (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445432#comment-13445432 ] 

Ben Frank commented on CASSANDRA-4593:
--------------------------------------

Fair enough. I was just doing a byteBuffer.getLong(), which I know advances the position, but I didn't really consider it destructive. I can't believe I'll be the only person caught out by this. Is there some documentation I've missed, or a relevant wiki page I should update with this information?
                

[jira] [Comment Edited] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

Posted by "Ben Frank (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445404#comment-13445404 ] 

Ben Frank edited comment on CASSANDRA-4593 at 8/31/12 10:12 AM:
----------------------------------------------------------------

Patch against the cassandra-1.1 branch attached.
This does a mark on the buffer, saves off the token value, and then resets the buffer back to the mark. Downstream users then can't affect the operation of the iterator.
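A minimal sketch of the mark/reset idea described above. It is not the attached patch: SavedTokenSketch, emit() and nextStartKey() are hypothetical names, and to stay self-contained it saves a copy of the raw key bytes rather than a partitioner token:

```java
import java.nio.ByteBuffer;

// Hypothetical reader-side sketch: copy what the next fetch needs before the
// key is handed to user code, then restore the buffer's position.
final class SavedTokenSketch {
    private byte[] lastKeyBytes = new byte[0]; // saved copy used for the next fetch

    // Called for each row before the key is handed to the mapper.
    ByteBuffer emit(ByteBuffer key) {
        key.mark();                               // remember the current position
        lastKeyBytes = new byte[key.remaining()];
        key.get(lastKeyBytes);                    // relative bulk read advances position
        key.reset();                              // restore position for the mapper
        return key;
    }

    // Start point for the next batch, independent of what the mapper did.
    ByteBuffer nextStartKey() {
        return ByteBuffer.wrap(lastKeyBytes);
    }

    public static void main(String[] args) {
        SavedTokenSketch reader = new SavedTokenSketch();
        ByteBuffer key = ByteBuffer.allocate(8);
        key.putLong(7L);
        key.flip();

        ByteBuffer handedOut = reader.emit(key);
        handedOut.getLong(); // the mapper's relative read consumes the buffer
        System.out.println("mapper's buffer remaining: " + handedOut.remaining());            // 0
        System.out.println("saved start key remaining: " + reader.nextStartKey().remaining()); // 8
    }
}
```

In this sketch reset() only undoes the reader's own read so the mapper still receives the full key; the saved copy is what keeps the next fetch independent of anything the mapper does afterwards. Note that mark() overwrites any mark the buffer already had.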


                
      was (Author: airlust):
    patch against the cassandra-1.1 branch
                  

[jira] [Updated] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

Posted by "Ben Frank (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Frank updated CASSANDRA-4593:
---------------------------------

    Priority: Major  (was: Critical)
    

[jira] [Commented] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445627#comment-13445627 ] 

Jeremy Hanna commented on CASSANDRA-4593:
-----------------------------------------

It may be worth adding this to the MapReduce or Troubleshooting section of http://wiki.apache.org/cassandra/HadoopSupport. We were bitten by something like this at a previous job, and it was hard to track down.
                

[jira] [Commented] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445438#comment-13445438 ] 

Jonathan Ellis commented on CASSANDRA-4593:
-------------------------------------------

Not really. We mostly intend CFRR (ColumnFamilyRecordReader) to be used by Pig and Hive, not directly.
                

[jira] [Commented] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

Posted by "Ben Frank (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445708#comment-13445708 ] 

Ben Frank commented on CASSANDRA-4593:
--------------------------------------

I'll add it as a gotcha to that page. Thanks for pointing it out.

Jonathan, do you really think this isn't worth being defensive about in the RecordReader? It seems to present an unclear API that is also hard to document, since there isn't really a natural place to put it in the javadoc. There doesn't seem to be any real downside to doing this or something like it. Looking at the code history, it appears this has been toyed with but not fully implemented.
                

[jira] [Resolved] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-4593.
---------------------------------------

    Resolution: Invalid

You're not free to destructively mutate the key buffer. Use absolute (positional) reads, or duplicate() it.
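A small Java sketch of the two suggested alternatives on the mapper side, again assuming a hypothetical 8-byte long key; NonDestructiveKeyReads and its helper names are illustrative:

```java
import java.nio.ByteBuffer;

// Sketch: two ways for map() code to read the key without moving the
// shared buffer's position.
final class NonDestructiveKeyReads {

    // Absolute read: getLong(int index) does not change the position.
    static long readWithAbsoluteGet(ByteBuffer key) {
        return key.getLong(key.position());
    }

    // duplicate() shares the content but has its own position, limit and
    // mark, so a relative read on the copy leaves the original untouched.
    static long readWithDuplicate(ByteBuffer key) {
        return key.duplicate().getLong();
    }

    public static void main(String[] args) {
        ByteBuffer key = ByteBuffer.allocate(8);
        key.putLong(123L);
        key.flip();

        System.out.println(readWithAbsoluteGet(key)); // 123
        System.out.println(readWithDuplicate(key));   // 123
        System.out.println(key.remaining());          // 8: key still intact
    }
}
```

One caveat worth checking: a buffer returned by duplicate() starts with big-endian byte order, so if the key is read with a non-default order, set it again on the copy.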
                

[jira] [Updated] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

Posted by "Ben Frank (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Frank updated CASSANDRA-4593:
---------------------------------

    Attachment: cassandra-1.1-4593.txt

patch against the cassandra-1.1 branch
                