Posted to commits@cassandra.apache.org by "Brandon Williams (JIRA)" <ji...@apache.org> on 2012/10/11 21:03:03 UTC

[jira] [Resolved] (CASSANDRA-4789) CassandraStorage.getNextWide produces corrupt data

     [ https://issues.apache.org/jira/browse/CASSANDRA-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-4789.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1.6
         Reviewer: brandon.williams

Committed the relevant portion of this with formatting fixes.  I'll leave the PIG_WIDEROW_INPUT fix to CASSANDRA-4749, though it's needed to test this patch.  Thanks, Will!  I know this function is especially tricky.
                
> CassandraStorage.getNextWide produces corrupt data
> --------------------------------------------------
>
>                 Key: CASSANDRA-4789
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4789
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 1.1.5
>            Reporter: Will Oberman
>            Assignee: Brandon Williams
>             Fix For: 1.1.6
>
>         Attachments: patch.txt, patch.txt
>
>
> This took me a while to track down.  I'm seeing the problem when the "key changes" case happens.  The intended behavior (as far as I can tell) when the key changes is that the method returns the current tuple and picks up where it left off on the next call to getNextWide().  The problem I'm seeing is that sometimes the current key advances between method calls and sometimes it does not.  "Not" is the correct behavior, since the code saves the value into an instance variable; when the key does advance, there is a key/value mismatch (the result being that the values for two different keys are glued together).  I think the problem might be related to keys that only have a single column?  I'm still trying to track that down to help solve this case...
> Maybe this will be clearer from me pasting a bunch of logging I added to the class.  The log messages are fairly self-documenting (I hope):
> ...lots of previous logging...
> enter getNextWide
> hasNext = true
> set key = dVNhbXAxMzQ3ODM1OA%3D%3D
> lastRow != null
> added 1 items to bag from lastRow
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> key changed, new key = 669392df09572d0045b964bc65f86a2c
> exit getNextWide
> enter getNextWide
> hasNext = true
> //!!!THIS IS THE PROBLEM HERE I THINK!!!
> //!!!Usually the key here == key before "exit getNextWide"!!!
> set key = 5f900ee4bb1850f8cf387cc3d5fc23ca
> //!!! lastRow is data for 669392df09572d0045b964bc65f86a2c !!! 
> //!!! but it's being added to key 5f900ee4bb1850f8cf387cc3d5fc23ca !!!
> lastRow != null
> added 1 items to bag from lastRow
> //!!! Here are the real values for 5f900ee4bb1850f8cf387cc3d5fc23ca !!!
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> key changed, new key = 50438549-cdb6-8c44-f93a-d18d7daeffd8
> exit getNextWide
> enter getNextWide
> hasNext = true
> set key = 50438549-cdb6-8c44-f93a-d18d7daeffd8
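
The carry-over behavior described in the report can be sketched roughly as follows. This is a minimal illustration in Java, not the actual CassandraStorage code and not the committed patch: the class WideRowGrouper, the methods nextWide() and entry(), and the fields pendingKey and pendingColumn are all hypothetical names, and the reader is assumed to be a plain iterator of (row key, column) pairs. The point it tries to show is that the column that triggered the "key changed" exit is stored together with its own key, so the next call cannot attach it to whatever key the reader reports afterwards.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a wide-row reader; not the Cassandra class. */
public class WideRowGrouper {
    // Assumed input shape: one (row key, column) pair per element.
    private final Iterator<Map.Entry<String, String>> reader;
    private String pendingKey;    // key belonging to the carried-over column
    private String pendingColumn; // column read just before the previous call returned

    public WideRowGrouper(Iterator<Map.Entry<String, String>> reader) {
        this.reader = reader;
    }

    /** Convenience factory for a (row key, column) pair. */
    public static Map.Entry<String, String> entry(String key, String column) {
        return new SimpleEntry<String, String>(key, column);
    }

    /** Returns [key, col1, col2, ...] for the next row key, or null when exhausted. */
    public List<String> nextWide() {
        String key = null;
        List<String> tuple = new ArrayList<String>();

        // Resume from the carried-over column using the key saved WITH it,
        // rather than taking the key from the next element the reader yields.
        if (pendingColumn != null) {
            key = pendingKey;
            tuple.add(key);
            tuple.add(pendingColumn);
            pendingKey = null;
            pendingColumn = null;
        }

        while (reader.hasNext()) {
            Map.Entry<String, String> e = reader.next();
            if (key == null) {
                // First element of this call and nothing was carried over.
                key = e.getKey();
                tuple.add(key);
            } else if (!key.equals(e.getKey())) {
                // Key changed: stash key and column together, return what we have.
                pendingKey = e.getKey();
                pendingColumn = e.getValue();
                return tuple;
            }
            tuple.add(e.getValue());
        }
        return tuple.isEmpty() ? null : tuple;
    }
}
```

With input pairs (A,a1), (A,a2), (B,b1), (C,c1), successive calls to nextWide() in this sketch yield one tuple per key, including the single-column key B, which is the case the report suspects of triggering the mismatch.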
