Posted to commits@cassandra.apache.org by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org> on 2012/02/01 02:54:56 UTC

[jira] [Commented] (CASSANDRA-3820) Columns missing after upgrade from 0.8.5 to 1.0.7.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197496#comment-13197496 ] 

Peter Schuller commented on CASSANDRA-3820:
-------------------------------------------

Check whether the .bf files contain all zeroes beyond roughly 256 MB (2^31 bits). If you have lots of rows, your bloom filter (BF) will be that large.

We encountered a bug internally whereby every bloom filter larger than 2^31 bits had its full size on disk, but everything after the first 2^31 bits was zero.
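For reference, 2^31 bits is 2^28 bytes, i.e. 268,435,456 bytes or 256 MiB, so only .bf files larger than that can reach the affected range. A minimal Python sketch of the arithmetic, assuming the file size is dominated by the raw bit array (any header overhead is ignored); the names below are hypothetical:

```python
# Sketch: where the 2^31-bit boundary falls, expressed in bytes.
# Assumption: a .bf file is dominated by the raw bit array, so a file
# larger than THRESHOLD_BYTES holds a filter past 2^31 bits.
# Serialization headers are ignored (an assumption, not verified).

THRESHOLD_BITS = 2 ** 31               # boundary reported in the comment
THRESHOLD_BYTES = THRESHOLD_BITS // 8  # 8 bits per byte -> 2^28 bytes


def mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (2^20 bytes)."""
    return n_bytes / 2 ** 20


def may_exceed_boundary(file_size_bytes: int) -> bool:
    """True if a .bf file of this size could hold more than 2^31 bits."""
    return file_size_bytes > THRESHOLD_BYTES


if __name__ == "__main__":
    print(THRESHOLD_BYTES)       # 268435456
    print(mib(THRESHOLD_BYTES))  # 256.0
```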

Unfortunately I don't know whether this is specific to patches made on our branch, and I have been too busy to follow up on whether it affects the upstream version.

But checking is easy: just run "tail -c 1000 FILE | hexdump", where FILE is a .bf file. If the output is all zeroes, you are hitting this bug. Make sure FILE is a large .bf file (the largest one is the easiest pick).
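The same check can be scripted. Below is a minimal Python sketch, not an existing Cassandra tool: it picks the largest file ending in ".bf" under a directory and reports whether its last 1000 bytes are all zero. The ".bf" suffix follows the wording above and is a parameter because the filter file name may differ by version or branch; largest_filter_file and tail_is_all_zero are hypothetical names.

```python
# Sketch of the "tail -c 1000 FILE | hexdump" check: find the largest
# filter file under a data directory and test whether its tail is all zero.
# Assumption: filter files end in ".bf" (as in the comment above); adjust
# the suffix if yours are named differently. Function names are hypothetical.
import os
import sys
from typing import Optional

TAIL_BYTES = 1000  # same window as "tail -c 1000"


def largest_filter_file(data_dir: str, suffix: str = ".bf") -> Optional[str]:
    """Return the path of the largest file ending in `suffix`, or None."""
    best_path, best_size = None, -1
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(root, name)
                size = os.path.getsize(path)
                if size > best_size:
                    best_path, best_size = path, size
    return best_path


def tail_is_all_zero(path: str, n: int = TAIL_BYTES) -> bool:
    """True if the last n bytes (or the whole file, if shorter) are all zero."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - n))
        tail = f.read()
    # An empty file has no bytes to inspect, so don't report it as the bug.
    return len(tail) > 0 and not any(tail)


if __name__ == "__main__":
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    path = largest_filter_file(data_dir)
    if path is None:
        print("no .bf files found under", data_dir)
    elif tail_is_all_zero(path):
        print(path + ": last bytes are all zero (matches the bug)")
    else:
        print(path + ": non-zero bytes found in the tail")
```

Pass the data directory as the first argument; with no argument it scans the current directory. Unlike hexdump, it gives a yes/no answer, which is handier when checking many nodes.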


                
> Columns missing after upgrade from 0.8.5 to 1.0.7.
> --------------------------------------------------
>
>                 Key: CASSANDRA-3820
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3820
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.0.7
>            Reporter: Jason Harvey
>
> After an upgrade, one of our CFs had a lot of rows with missing columns. I've been able to reproduce it in test conditions. Working on getting the tables to DataStax (data is private).
> 0.8 results:
> {code}
> [default@reddit] get CommentVote[36353467625f63333837336f32];
> => (column=date, value=313332333932323930392e3531, timestamp=1323922909506508)
> => (column=ip, value=REDACTED, timestamp=1327048432717348, ttl=2592000)
> => (column=name, value=31, timestamp=1327048433000740)
> => (column=REDACTED, value=30, timestamp=1323922909506432)
> => (column=thing1_id, value=REDACTED, timestamp=1323922909506475)
> => (column=thing2_id, value=REDACTED, timestamp=1323922909506486)
> => (column=REDACTED, value=31, timestamp=1323922909506518)
> => (column=REDACTED, value=30, timestamp=1323922909506497)
> {code}
> 1.0 results:
> {code}
> [default@reddit] get CommentVote[36353467625f63333837336f32];
> => (column=ip, value=REDACTED, timestamp=1327048432717348, ttl=2592000)
> => (column=name, value=31, timestamp=1327048433000740)
> {code}
> A few notes:
> * The rows with missing data were fully restored after scrubbing the sstables.
> * The row which I reproduced on happened to be split across multiple sstables.
> * When I copied the first sstable I found the row on, I was able to 'list' rows from the sstable, but any and all 'get' calls failed.
> * These sstables were natively created on 0.8.5; they did not come from any previous upgrade.
