Posted to commits@cassandra.apache.org by "Yasuharu Goto (JIRA)" <ji...@apache.org> on 2017/01/19 01:55:26 UTC

[jira] [Commented] (CASSANDRA-13125) Duplicate rows after upgrading from 2.1.16 to 3.0.10/3.9

    [ https://issues.apache.org/jira/browse/CASSANDRA-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829145#comment-15829145 ] 

Yasuharu Goto commented on CASSANDRA-13125:
-------------------------------------------

h2. Investigations...

After some debugging, I found an interesting difference between the RangeTombstoneLists serialized by 2.1.16 and by 3.0.10.

- I ran 3 Cassandra nodes with some debug prints added:
-- 127.0.0.1 (C* 3.0.10)
-- 127.0.0.2 (C* 2.1.16)
-- 127.0.0.3 (C* 2.1.16)
- A keyspace and a table were already created:
-- CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
-- CREATE TABLE test.test ( a int PRIMARY KEY, b int, c set<int>, d set<int>, e int )
- I ran the same INSERT query (whose mutation is sent to 127.0.0.2) from 127.0.0.1 (C* 3.0) and from 127.0.0.3 (C* 2.1), and compared the results.

Insert a row from 127.0.0.1 and scan the table. (The inserted row, a=14, is broken.)
{code:sql}
cqlsh> insert into test.test(a,b,c,d,e) values(14,1,{2,3},{4,5},6);
cqlsh> select * from test.test;

 a  | b    | c      | d      | e
----+------+--------+--------+------
 14 |    1 |   null |   null | null
 14 | null | {2, 3} | {4, 5} |    6

(2 rows)
{code}

Then I insert a row from 127.0.0.3 and scan. (Neither a=5 nor a=14 is broken.)
{code:sql}
cqlsh> insert into test.test(a,b,c,d,e)values(5,1,{2,3},{4,5},6);
cqlsh> select * from test.test;

 a  | b | c      | d      | e
----+---+--------+--------+---
  5 | 1 | {2, 3} | {4, 5} | 6
 14 | 1 | {2, 3} | {4, 5} | 6
{code}

Then, back on 127.0.0.1, I scan the table again: a=14 is broken but a=5 is not.
{code:sql}
cqlsh> select * from test.test;

 a  | b    | c      | d      | e
----+------+--------+--------+------
  5 |    1 | {2, 3} | {4, 5} |    6
 14 |    1 |   null |   null | null
 14 | null | {2, 3} | {4, 5} |    6
{code}

Therefore, it looks like C*3 cannot properly scan rows that are stored on a C*2 node but were inserted through a C*3 node.

Next, I observed the incoming MUTATIONs on 127.0.0.2, shown below. C*3.0 sent range tombstones like {{[c-c],[c-d]}}, whereas C*2.1 sent {{[c:_-c],[d:_-d]}}.

{noformat}
> insert into test.test(a,b,c,d,e) values(14,1,{2,3},{4,5},6); from 127.0.0.1

DeletionInfo:{deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[c-c:!, deletedAt=1484710273390930, localDeletion=1484710273][c-d:!, deletedAt=1484710273390930, localDeletion=1484710273]}
from:/127.0.0.1, payload:Mutation(keyspace='test', key='0000000e', modifications=[ColumnFamily(test -{deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[c-c:!, deletedAt=1484710273390930, localDeletion=1484710273][c-d:!, deletedAt=1484710273390930, localDeletion=1484710273]}- [:false:0@1484710273390931,b:false:4@1484710273390931,c:00000002:false:0@1484710273390931,c:00000003:false:0@1484710273390931,d:00000004:false:0@1484710273390931,d:00000005:false:0@1484710273390931,e:false:4@1484710273390931,])]), verb:MUTATION, version:8

> insert into test.test(a,b,c,d,e) values(14,1,{2,3},{4,5},6); from 127.0.0.3
DeletionInfo:{deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[c:_-c:!, deletedAt=1484710277987556, localDeletion=1484710277][d:_-d:!, deletedAt=1484710277987556, localDeletion=1484710277]}
from:/127.0.0.3, payload:Mutation(keyspace='test', key='0000000e', modifications=[ColumnFamily(test -{deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[c:_-c:!, deletedAt=1484710277987556, localDeletion=1484710277][d:_-d:!, deletedAt=1484710277987556, localDeletion=1484710277]}- [:false:0@1484710277987557,b:false:4@1484710277987557,c:00000002:false:0@1484710277987557,c:00000003:false:0@1484710277987557,d:00000004:false:0@1484710277987557,d:00000005:false:0@1484710277987557,e:false:4@1484710277987557,])]), verb:MUTATION, version:8
{noformat}
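To make the difference concrete, here is a simplified Java model (not Cassandra code) of the a=14 row's legacy cell names from the dumps above and of which cells each range tombstone spans. It assumes cells are ordered by column name and then by collection element, and it treats a bound with no element ({{c}}) and a {{c:_}} bound alike as the start of column c, and {{c:!}} as its end; the {{Cell}} and {{Bound}} records are hypothetical (Java 16 or later).

```java
import java.util.ArrayList;
import java.util.List;

class LegacyRangeCoverage {
    // Hypothetical legacy cell name: a column name plus an optional collection element.
    record Cell(String column, String element) {
        @Override
        public String toString() {
            return element == null ? column : column + ":" + element;
        }
    }

    // Hypothetical range tombstone bound: the start of a column ("c" or "c:_") or its end ("c:!").
    record Bound(String column, boolean isEnd) {}

    // Orders a cell against a bound: a column's start bound sorts before all of its cells,
    // its end bound after all of them; different columns compare by name.
    static int compare(Cell cell, Bound bound) {
        int byColumn = cell.column().compareTo(bound.column());
        if (byColumn != 0)
            return byColumn;
        return bound.isEnd() ? -1 : 1;
    }

    // Cells strictly between the two bounds, i.e. the cells the range tombstone spans.
    static List<Cell> covered(List<Cell> cells, Bound start, Bound stop) {
        List<Cell> result = new ArrayList<>();
        for (Cell cell : cells)
            if (compare(cell, start) > 0 && compare(cell, stop) < 0)
                result.add(cell);
        return result;
    }

    // Regular cells of the a=14 row as listed in the mutation dumps (row marker omitted).
    static List<Cell> rowCells() {
        return List.of(
            new Cell("b", null),
            new Cell("c", "00000002"), new Cell("c", "00000003"),
            new Cell("d", "00000004"), new Cell("d", "00000005"),
            new Cell("e", null));
    }

    public static void main(String[] args) {
        Bound cStart = new Bound("c", false), cEnd = new Bound("c", true);
        Bound dStart = new Bound("d", false), dEnd = new Bound("d", true);
        // Second range sent by 3.0.10, [c-d:!]: spans both collections.
        System.out.println(covered(rowCells(), cStart, dEnd)); // [c:00000002, c:00000003, d:00000004, d:00000005]
        // Ranges sent by 2.1.16, [c:_-c:!] and [d:_-d:!]: one per collection.
        System.out.println(covered(rowCells(), cStart, cEnd)); // [c:00000002, c:00000003]
        System.out.println(covered(rowCells(), dStart, dEnd)); // [d:00000004, d:00000005]
    }
}
```

In this model the union of the two ranges is the same in both cases (all elements of c and d), which is consistent with the 2.1 nodes reading the row back correctly; only the shape of the second range differs.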

h2. Workaround Plan-A

However, LegacyRangeTombstone removes the {{collectionName}} from a RangeTombstone whose start and stop collection names differ, such as {{[c-d]}}:
https://github.com/apache/cassandra/blob/cassandra-3.0.10/src/java/org/apache/cassandra/db/LegacyLayout.java#L1592-L1599
It seems that dropping the collectionName here corrupts the unmarshalling of the legacy tombstone. After I commented out this else-if block, I could scan the table correctly.


{code:java}
            if ((start.collectionName == null) != (stop.collectionName == null))
            {
                if (start.collectionName == null)
                    stop = new LegacyBound(stop.bound, stop.isStatic, null);
                else
                    start = new LegacyBound(start.bound, start.isStatic, null);
            }
            /*else if (!Objects.equals(start.collectionName, stop.collectionName))
            {
                // We're in the similar but slightly more complex case where on top of the big tombstone
                // A, we have 2 (or more) collection tombstones B and C within A. So we also end up with
                // a tombstone that goes between the end of B and the start of C.
                start = new LegacyBound(start.bound, start.isStatic, null);
                stop = new LegacyBound(stop.bound, stop.isStatic, null);
            }
            */
{code}

{noformat}
cqlsh> select * from test.test;

 a  | b | c      | d      | e
----+---+--------+--------+---
  5 | 1 | {2, 3} | {4, 5} | 6
 14 | 1 | {2, 3} | {4, 5} | 6
{noformat}
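To isolate what the else-if branch does to the ranges that 3.0.10 sends, here is a minimal Java model of the two branches quoted above. It is a sketch, not Cassandra code: {{Bound}} and {{Range}} are hypothetical stand-ins for {{LegacyBound}}, with the clustering bound reduced to a string and the collection identified by its column name, and the {{keepElseIf}} flag plays the role of commenting the branch out as in Plan-A (records need Java 16 or later).

```java
import java.util.Objects;

class LegacyBoundNormalization {
    // Hypothetical simplification of LegacyBound: collectionName == null means "no collection".
    record Bound(String bound, boolean isStatic, String collectionName) {}

    record Range(Bound start, Bound stop) {}

    // Mirrors the if / else-if quoted above; keepElseIf == false corresponds to Plan-A.
    static Range normalize(Bound start, Bound stop, boolean keepElseIf) {
        if ((start.collectionName() == null) != (stop.collectionName() == null)) {
            // Only one side names a collection: drop that name so neither side refers to one.
            if (start.collectionName() == null)
                stop = new Bound(stop.bound(), stop.isStatic(), null);
            else
                start = new Bound(start.bound(), start.isStatic(), null);
        } else if (keepElseIf && !Objects.equals(start.collectionName(), stop.collectionName())) {
            // Both sides name different collections (e.g. [c-d]). The original code assumes this is
            // the gap between two collection tombstones inside a larger one and drops both names.
            start = new Bound(start.bound(), start.isStatic(), null);
            stop = new Bound(stop.bound(), stop.isStatic(), null);
        }
        return new Range(start, stop);
    }

    public static void main(String[] args) {
        // The test table has no clustering columns, so the clustering bound is empty here.
        Bound c = new Bound("", false, "c");
        Bound d = new Bound("", false, "d");

        // [c-d] as sent by 3.0.10, with the else-if branch: both collection names are lost.
        System.out.println(normalize(c, d, true));
        // The same range with the branch commented out (Plan-A): both names are kept.
        System.out.println(normalize(c, d, false));
        // [c-c] is unchanged either way.
        System.out.println(normalize(c, c, true));
    }
}
```

With the branch in place, the {{[c-d]}} range comes out with no collection name on either side, so it no longer describes two collection deletions; presumably this is why node2's sstabledump in the description below shows a separate row carrying a row-level {{deletion_info}} with the collection tombstone's timestamp.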

h2. Workaround Plan-B
Instead of modifying the LegacyLayout unmarshalling code, commenting out the following line also fixed the problem. It changes the range tombstones serialized by LegacyLayout from {{[c-c][c-d]}} to {{[c-c][d-d]}}.

https://github.com/apache/cassandra/blob/cassandra-3.0.10/src/java/org/apache/cassandra/db/LegacyLayout.java#L2099
{code:java}
//                     start = ends[i];
{code}


{noformat}

DeletionInfo:{deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[c-c:!, deletedAt=1484715120458008, localDeletion=1484715120][d-d:!, deletedAt=1484715120458008, localDeletion=1484715120]}
from:/127.0.0.1, payload:Mutation(keyspace='test', key='0000000e', modifications=[ColumnFamily(test -{deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[c-c:!, deletedAt=1484715120458008, localDeletion=1484715120][d-d:!, deletedAt=1484715120458008, localDeletion=1484715120]}- [:false:0@1484715120458009,b:false:4@1484715120458009,c:00000002:false:0@1484715120458009,c:00000003:false:0@1484715120458009,d:00000004:false:0@1484715120458009,d:00000005:false:0@1484715120458009,e:false:4@1484715120458009,])]), verb:MUTATION, version:8
{noformat}
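One way to check which shape a build sends is to scan the DeletionInfo debug output for ranges whose start and stop refer to different columns, since those are the ranges that the else-if branch in Plan-A strips. The helper below is hypothetical, not part of Cassandra: it parses the debug format printed above with a regex and assumes column names contain no {{-}}, {{:}}, {{,}} or brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LegacyRangeCheck {
    // Matches "[start-stop, deletedAt=" inside a string such as
    // "ranges=[c-c:!, deletedAt=..., localDeletion=...][d-d:!, deletedAt=..., localDeletion=...]".
    private static final Pattern RANGE =
        Pattern.compile("\\[([^\\[\\],-]+)-([^\\[\\],]+), deletedAt=");

    // The column a bound refers to: the text before the first ':' ("c:!" -> "c", "c:_" -> "c").
    static String column(String bound) {
        int i = bound.indexOf(':');
        return i < 0 ? bound : bound.substring(0, i);
    }

    // Returns every range whose start and stop name different columns, e.g. "c-d:!".
    static List<String> crossColumnRanges(String deletionInfo) {
        List<String> crossing = new ArrayList<>();
        Matcher m = RANGE.matcher(deletionInfo);
        while (m.find()) {
            if (!column(m.group(1)).equals(column(m.group(2))))
                crossing.add(m.group(1) + "-" + m.group(2));
        }
        return crossing;
    }

    public static void main(String[] args) {
        String before = "ranges=[c-c:!, deletedAt=1484710273390930, localDeletion=1484710273]"
                      + "[c-d:!, deletedAt=1484710273390930, localDeletion=1484710273]";
        String planB = "ranges=[c-c:!, deletedAt=1484715120458008, localDeletion=1484715120]"
                     + "[d-d:!, deletedAt=1484715120458008, localDeletion=1484715120]";
        System.out.println(crossColumnRanges(before)); // [c-d:!]
        System.out.println(crossColumnRanges(planB));  // []
    }
}
```

The ranges sent by 2.1.16 ({{[c:_-c:!][d:_-d:!]}}) also produce an empty list, so after Plan-B the 3.0 output has the same per-collection shape as 2.1, even though its start bounds are written without the {{:_}} suffix.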

I'm not sure whether my solution causes any unexpected side effects, but I have attached my patches for reference.
Could anybody please review them?


> Duplicate rows after upgrading from 2.1.16 to 3.0.10/3.9
> --------------------------------------------------------
>
>                 Key: CASSANDRA-13125
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13125
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Zhongxiang Zheng
>
> I found that rows are split and duplicated after upgrading the cluster from 2.1.x to 3.0.x.
> I found a way to reproduce the problem, as shown below.
> {code}
> $ ccm create test -v 2.1.16 -n 3 -s                                                                               
> Current cluster is now: test
> $ ccm node1 cqlsh  -e "CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor':3}"
> $ ccm node1 cqlsh -e "CREATE TABLE test.test (id text PRIMARY KEY, value1 set<text>, value2 set<text>);"
> # Upgrade node1
> $ for i in 1; do ccm node${i} stop; ccm node${i} setdir -v3.0.10; ccm node${i} start;ccm node${i} nodetool upgradesstables; done
> # Insert a row through node1(3.0.10)
> $ ccm node1 cqlsh -e "INSERT INTO test.test (id, value1, value2) values ('aaa', {'aaa', 'bbb'}, {'ccc', 'ddd'});"                       
> # Insert a row through node2(2.1.16)
> $ ccm node2 cqlsh -e "INSERT INTO test.test (id, value1, value2) values ('bbb', {'aaa', 'bbb'}, {'ccc', 'ddd'});" 
> # The row inserted from node1 is split
> $ ccm node1 cqlsh -e "SELECT * FROM test.test ;"
>  id  | value1         | value2
> -----+----------------+----------------
>  aaa |           null |           null
>  aaa | {'aaa', 'bbb'} | {'ccc', 'ddd'}
>  bbb | {'aaa', 'bbb'} | {'ccc', 'ddd'}
> $ for i in 1 2; do ccm node${i} nodetool flush; done
> # Results of sstable2json on node2. The row inserted from node1 (3.0.10) differs from the row inserted from node2 (2.1.16).
> $ ccm node2 json -k test -c test
> running
> ['/home/zzheng/.ccm/test/node2/data0/test/test-5406ee80dbdb11e6a175f57c4c7c85f3/test-test-ka-1-Data.db']
> -- test-test-ka-1-Data.db -----
> [
> {"key": "aaa",
>  "cells": [["","",1484564624769577],
>            ["value1","value2:!",1484564624769576,"t",1484564624],
>            ["value1:616161","",1484564624769577],
>            ["value1:626262","",1484564624769577],
>            ["value2:636363","",1484564624769577],
>            ["value2:646464","",1484564624769577]]},
> {"key": "bbb",
>  "cells": [["","",1484564634508029],
>            ["value1:_","value1:!",1484564634508028,"t",1484564634],
>            ["value1:616161","",1484564634508029],
>            ["value1:626262","",1484564634508029],
>            ["value2:_","value2:!",1484564634508028,"t",1484564634],
>            ["value2:636363","",1484564634508029],
>            ["value2:646464","",1484564634508029]]}
> ]
> # Upgrade node2,3
> $ for i in `seq 2 3`; do ccm node${i} stop; ccm node${i} setdir -v3.0.10; ccm node${i} start;ccm node${i} nodetool upgradesstables; done
> # After upgrading node2 and node3, the row inserted from node1 is also split on node2 and node3
> $ ccm node2 cqlsh -e "SELECT * FROM test.test ;"                                                                                        
>  id  | value1         | value2
> -----+----------------+----------------
>  aaa |           null |           null
>  aaa | {'aaa', 'bbb'} | {'ccc', 'ddd'}
>  bbb | {'aaa', 'bbb'} | {'ccc', 'ddd'}
> (3 rows)
> # Results of sstabledump
> # node1
> [
>   {
>     "partition" : {
>       "key" : [ "aaa" ],
>       "position" : 0
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 17,
>         "liveness_info" : { "tstamp" : "2017-01-16T11:03:44.769577Z" },
>         "cells" : [
>           { "name" : "value1", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:44.769576Z", "local_delete_time" : "2017-01-16T11:03:44Z" } },
>           { "name" : "value1", "path" : [ "aaa" ], "value" : "" },
>           { "name" : "value1", "path" : [ "bbb" ], "value" : "" },
>           { "name" : "value2", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:44.769576Z", "local_delete_time" : "2017-01-16T11:03:44Z" } },
>           { "name" : "value2", "path" : [ "ccc" ], "value" : "" },
>           { "name" : "value2", "path" : [ "ddd" ], "value" : "" }
>         ]
>       }
>     ]
>   },
>   {
>     "partition" : {
>       "key" : [ "bbb" ],
>       "position" : 48
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 65,
>         "liveness_info" : { "tstamp" : "2017-01-16T11:03:54.508029Z" },
>         "cells" : [
>           { "name" : "value1", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } },
>           { "name" : "value1", "path" : [ "aaa" ], "value" : "" },
>           { "name" : "value1", "path" : [ "bbb" ], "value" : "" },
>           { "name" : "value2", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } },
>           { "name" : "value2", "path" : [ "ccc" ], "value" : "" },
>           { "name" : "value2", "path" : [ "ddd" ], "value" : "" }
>         ]
>       }
>     ]
>   }
> ]                                                                                                                                                    
> # node2
> [
>   {
>     "partition" : {
>       "key" : [ "aaa" ],
>       "position" : 0
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 17,
>         "liveness_info" : { "tstamp" : "2017-01-16T11:03:44.769577Z" },
>         "cells" : [ ]
>       },
>       {
>         "type" : "row",
>         "position" : 22,
>         "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:44.769576Z", "local_delete_time" : "2017-01-16T11:03:44Z" },
>         "cells" : [
>           { "name" : "value1", "path" : [ "aaa" ], "value" : "", "tstamp" : "2017-01-16T11:03:44.769577Z" },
>           { "name" : "value1", "path" : [ "bbb" ], "value" : "", "tstamp" : "2017-01-16T11:03:44.769577Z" },
>           { "name" : "value2", "path" : [ "ccc" ], "value" : "", "tstamp" : "2017-01-16T11:03:44.769577Z" },
>           { "name" : "value2", "path" : [ "ddd" ], "value" : "", "tstamp" : "2017-01-16T11:03:44.769577Z" }
>         ]
>       }
>     ]
>   },
>   {
>     "partition" : {
>       "key" : [ "bbb" ],
>       "position" : 57
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 74,
>         "liveness_info" : { "tstamp" : "2017-01-16T11:03:54.508029Z" },
>         "cells" : [
>           { "name" : "value1", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } },
>           { "name" : "value1", "path" : [ "aaa" ], "value" : "" },
>           { "name" : "value1", "path" : [ "bbb" ], "value" : "" },
>           { "name" : "value2", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } },
>           { "name" : "value2", "path" : [ "ccc" ], "value" : "" },
>           { "name" : "value2", "path" : [ "ddd" ], "value" : "" }
>         ]
>       }
>     ]
>   }
> ]
> {code}
> Another example of row splitting is as follows.
> {code}
> $ ccm create test2 -v 2.1.16 -n 3 -s                                                                                                    
> Current cluster is now: test2
> $ ccm node1 cqlsh  -e "CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor':3}"                      
> $ ccm node1 cqlsh -e "CREATE TABLE test.text_set_set (id text PRIMARY KEY, value1 text, value2 set<text>, value3 set<text>);"           
> $ for i in `seq 1`; do ccm node${i} stop; ccm node${i} setdir -v3.0.10; ccm node${i} start;ccm node${i} nodetool upgradesstables; done  
> $ ccm node1 cqlsh -e "INSERT INTO test.text_set_set (id, value1, value2, value3) values ('aaa', 'aaa', {'aaa', 'bbb'}, {'ccc', 'ddd'});"
> $ ccm node1 cqlsh -e "SELECT * FROM test.text_set_set;"                                                                                 
>  id  | value1 | value2         | value3
> -----+--------+----------------+----------------
>  aaa |    aaa |           null |           null
>  aaa |   null | {'aaa', 'bbb'} | {'ccc', 'ddd'}
> (2 rows)
> {code}
> As far as I have investigated, the problem occurs under the following conditions:
> * The table schema contains multiple collections.
> * A row whose collection columns are not null is inserted through a 3.x node while both 2.1 and 3.x nodes exist in the cluster.
> * Rows in the sstables of a node that was running 2.1 when the row was inserted are split after that node is upgraded to 3.x.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)