Posted to commits@cassandra.apache.org by "Mateusz Korniak (JIRA)" <ji...@apache.org> on 2011/01/15 22:48:46 UTC

[jira] Created: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Bootstrap breaks data stored (missing rows, extra rows, column values modified)
-------------------------------------------------------------------------------

                 Key: CASSANDRA-1992
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1992
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.0
         Environment: Linux 2.6.36-1 #1 SMP Tue Nov 9 09:56:02 CET 2010 x86_64 Intel(R)_Core(TM)2_Quad_CPU____Q8300__@_2.50GHz PLD Linux
glibc-2.12-4.i686
java-sun-1.6.0.22-1.i686

            Reporter: Mateusz Korniak


Scenario:
Two fresh Cassandra installs (empty /data, /commitlog and /saved_caches dirs).
Start the first one.
Run the data-inserting program [1], then run it again in verify mode - all data intact.
Bootstrap the 2nd node.
Run verification again, now it fails.
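
The insert/verify cycle in the scenario above can be sketched as follows. This is not the linked test.py, only a minimal illustration of the idea: the data is derived deterministically from the CF name and row index, so verify mode can regenerate what it expects. A plain dict stands in for the cluster; the function names, the column count and the "row=N" key format are assumptions made for the sketch.

```python
import hashlib


def expected_columns(cf_name, row_idx, n_cols=3):
    """Deterministically derive a row's columns from its CF name and index."""
    cols = {}
    for c in range(n_cols):
        seed = "%s/%d/%d" % (cf_name, row_idx, c)
        cols["col%d" % c] = hashlib.md5(seed.encode("utf-8")).hexdigest()
    return cols


def insert(store, cf_names, n_rows):
    """Write the generated rows; `store` maps CF name -> {row key -> columns}."""
    for cf in cf_names:
        for i in range(n_rows):
            store.setdefault(cf, {})["row=%d" % i] = expected_columns(cf, i)


def verify(store, cf_names, n_rows):
    """Return a (cf, row_key, problem) tuple for every mismatch found."""
    problems = []
    for cf in cf_names:
        rows = store.get(cf, {})
        wanted = {"row=%d" % i: expected_columns(cf, i) for i in range(n_rows)}
        for key, cols in wanted.items():
            if key not in rows:
                problems.append((cf, key, "missing row"))
            elif rows[key] != cols:
                problems.append((cf, key, "column values modified"))
        for key in rows:
            if key not in wanted:
                problems.append((cf, key, "extra row"))
    return problems
```

Against a real cluster the dict would be replaced by client reads and writes; the three problem labels mirror the three symptoms named in the issue title.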

The issue is very strange to me: Cassandra has worked perfectly for days as long as the cluster nodes stay the same, but any bootstrap (1 -> 2 nodes, 2 -> 3 nodes, 2 -> 3 nodes with RF=2) breaks data.

I am running Cassandra with a 1 GB heap and a 32-bit userland on 64-bit kernels; I am not sure what else could matter here.
Any hints?
Thanks in advance, regards.

[1] Simple program that generates data and later verifies it:
http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/test.py

[2] Logs from 1st node:
http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.4.log

[3] Logs from 2nd (bootstrapping) node:
http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.8.log



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983861#action_12983861 ] 

Hudson commented on CASSANDRA-1992:
-----------------------------------

Integrated in Cassandra-0.7 #177 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/177/])
    fix streaming of multiple CFs during bootstrap
patch by brandonwilliams; reviewed by jbellis for CASSANDRA-1992


> Bootstrap breaks data stored (missing rows, extra rows, column values modified)
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1992
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Linux 2.6.36-1 #1 SMP Tue Nov 9 09:56:02 CET 2010 x86_64 Intel(R)_Core(TM)2_Quad_CPU____Q8300__@_2.50GHz PLD Linux
> glibc-2.12-4.i686
> java-sun-1.6.0.22-1.i686
>            Reporter: Mateusz Korniak
>            Assignee: Brandon Williams
>             Fix For: 0.7.1
>
>         Attachments: 1992.txt
>
>   Original Estimate: 8h
>          Time Spent: 1h
>  Remaining Estimate: 0h


[jira] Updated: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1992:
--------------------------------------

         Fix Version/s: 0.7.1
    Remaining Estimate: 8h
     Original Estimate: 8h



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Jeffrey Damick (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989202#comment-12989202 ] 

Jeffrey Damick commented on CASSANDRA-1992:
-------------------------------------------

Is there any way to repair the problem without deleting all of my data? (Shutting down and bringing back up did not solve the problem.)


[jira] Issue Comment Edited: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Ivo Ladage-van Doorn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982561#action_12982561 ] 

Ivo Ladage-van Doorn edited comment on CASSANDRA-1992 at 1/17/11 4:34 AM:
--------------------------------------------------------------------------

I have the exact same problem with an existing installation and was preparing to create an issue for it, but found this issue just before creating it. I'll describe the issue I have, maybe that provides some relevant information.

I ran into this issue with Cassandra 0.7 while trying to add just one node to an existing one-node cluster. The existing node already contains some data when the second node is added to the cluster. This is what I did:

Setup
I have two nodes, both running on Linux: a server called 'veers' on 172.16.2.203 and one called 'r2d2' on 172.16.2.206. I use Cassandra 0.7 and only change the following settings in cassandra.yaml and log4j-server.properties (I use the default values for all other entries):

In cassandra.yaml:

initial_token: 0
data_file_directories:
    - /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.203
rpc_address: 172.16.2.203

In log4j-server.properties:

log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log


Now I start the first node and connect to it using cassandra-cli. I add the following keyspace, column families and rows:

create keyspace Default;
use Default;

create column family Role;
set Role['user_1']['name'] = 'User 1';
set Role['user_2']['name'] = 'User 2';
set Role['user_3']['name'] = 'User 3';

create column family Gadget;
set Gadget['gadget_1']['name'] = 'Gadget 1';
set Gadget['gadget_2']['name'] = 'Gadget 2';
set Gadget['gadget_3']['name'] = 'Gadget 3';

After this, 'list Role' and 'list Gadget' return the proper rows.

Now I add a second node to the cluster, with this configuration:

In cassandra.yaml:

initial_token:
auto_bootstrap: true
data_file_directories:
    - /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.206
rpc_address: 172.16.2.206

In log4j-server.properties:

log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log


Now I start the second node. Bootstrapping takes some time, about 2 minutes in total, but finishes without any warnings or errors:

...
INFO [main] 2011-01-17 09:58:09,170 StorageService.java (line 399) Joining: getting load information
INFO [main] 2011-01-17 09:58:09,171 StorageLoadBalancer.java (line 366) Sleeping 90000 ms to wait for load information...
INFO [GossipStage:1] 2011-01-17 09:58:10,447 Gossiper.java (line 577) Node /172.16.2.203 is now part of the cluster
INFO [HintedHandoff:1] 2011-01-17 09:58:11,451 HintedHandOffManager.java (line 192) Started hinted handoff for endpoint /172.16.2.203
INFO [GossipStage:1] 2011-01-17 09:58:11,451 Gossiper.java (line 569) InetAddress /172.16.2.203 is now UP
INFO [HintedHandoff:1] 2011-01-17 09:58:11,453 HintedHandOffManager.java (line 248) Finished hinted handoff of 0 rows to endpoint /172.16.2.203
INFO [main] 2011-01-17 09:59:39,189 StorageService.java (line 399) Joining: getting bootstrap token
INFO [main] 2011-01-17 09:59:39,203 BootStrapper.java (line 148) New token will be 110533280274756817580689726417060138498 to assume load from /172.16.2.203
INFO [main] 2011-01-17 09:59:39,265 StorageService.java (line 399) Joining: sleeping 30000 ms for pending range setup
INFO [main] 2011-01-17 10:00:09,272 StorageService.java (line 399) Bootstrapping
INFO [main] 2011-01-17 10:00:09,663 CassandraDaemon.java (line 77) Binding thrift service to /172.16.2.206:9160
INFO [main] 2011-01-17 10:00:09,666 CassandraDaemon.java (line 91) Using TFramedTransport with a max frame size of 15728640 bytes.
INFO [main] 2011-01-17 10:00:09,671 CassandraDaemon.java (line 119) Listening for thrift clients...

Although everything seemed to work just fine, once node 2 has completely finished bootstrapping, the rows in the 'Role' and 'Gadget' column families are messed up:

list Role;

-------------------
RowKey: user_3
=> (column=6e616d65, value=557365722033, timestamp=1295254678545000)

1 Row Returned.


list Gadget;

-------------------
RowKey: user_2
=> (column=6e616d65, value=557365722032, timestamp=1295254678514000)
-------------------
RowKey: gadget_2
=> (column=6e616d65, value=4761646765742032, timestamp=1295254678805000)
-------------------
RowKey: gadget_3
=> (column=6e616d65, value=4761646765742033, timestamp=1295254679429000)
-------------------
RowKey: gadget_1
=> (column=6e616d65, value=4761646765742031, timestamp=1295254678771000)
-------------------
RowKey: user_1
=> (column=6e616d65, value=557365722031, timestamp=1295254678449000)

5 Rows Returned.

So 2 rows have been moved from CF 'Role' to 'Gadget', just by adding a node to the cluster. The actual result differs each time I try, but some rows are always moved to some other CF. The problem seems to be the same as the one described by Mateusz.
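
The listings above show column names and values as hex, which makes the misplacement harder to read. A small sketch that decodes them and flags row keys that do not belong in a CF; the helper names are made up for this illustration, and the key-prefix check relies only on the naming used in this example data ('user_' rows in Role, 'gadget_' rows in Gadget):

```python
def decode_hex(hex_text):
    """Turn a hex-encoded column name or value back into text."""
    return bytes.fromhex(hex_text).decode("utf-8")


def misplaced_rows(row_keys, expected_prefix):
    """Return the row keys that do not start with the prefix used in this CF."""
    return [key for key in row_keys if not key.startswith(expected_prefix)]


# Row keys as returned by 'list Gadget' after the bootstrap.
gadget_keys = ["user_2", "gadget_2", "gadget_3", "gadget_1", "user_1"]

print(decode_hex("6e616d65"))                  # name
print(decode_hex("557365722032"))              # User 2
print(misplaced_rows(gadget_keys, "gadget_"))  # ['user_2', 'user_1']
```

Decoded, the stray rows in 'Gadget' carry the values 'User 2' and 'User 1' with their original timestamps, so the rows appear to have been moved intact rather than corrupted.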

I also found out that restarting the nodes seems to 'fix' the issue. Changing the replication factor from 1 to 2 also 'resolves' the issue most of the time.



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982350#action_12982350 ] 

Brandon Williams commented on CASSANDRA-1992:
---------------------------------------------

Since the issue appears to be a missing row, can you reproduce with contrib/py_stress?



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Mateusz Korniak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982381#action_12982381 ] 

Mateusz Korniak commented on CASSANDRA-1992:
--------------------------------------------

And again: restarting the first node cuts the number of missing rows roughly in half, and restarting the 2nd node cures all missing rows.



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Mateusz Korniak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982288#action_12982288 ] 

Mateusz Korniak commented on CASSANDRA-1992:
--------------------------------------------

Another thing: 
By luck, I discovered that restarting one or both nodes makes the data be served intact again.
To avoid randomness in the 2nd node's token selection, I set it explicitly to 85070591730234615865843651857942052864.
So the scenario is now:
Clean Cassandra installs with tokens set to 0 and 85070591730234615865843651857942052864.
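
The explicit token above is half of 2**127, i.e. the midpoint of the RandomPartitioner token range, which is why the ring output below shows 50.00% ownership for each node. A minimal sketch of computing evenly spaced tokens, assuming RandomPartitioner (the 0.7 default) is in use; the function name is made up for this illustration:

```python
def balanced_tokens(node_count):
    """Evenly spaced tokens over the RandomPartitioner range 0 .. 2**127."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]


print(balanced_tokens(2))
# [0, 85070591730234615865843651857942052864]
```

Pinning both tokens this way removes token selection as a variable between test runs.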

Starting first node [1], uploading data, verifying ok, ring:
192.168.3.4     Up     Normal  59.02 KB        100.00% 0

Bootstrapping 2nd node, waiting for streaming to finish, data verification bad:
column_family: 'CF0'
row_key: 'row=0 '
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    loaded_cols_dict = current_cf.get(row_key)
  File "/usr/share/python2.7/site-packages/pycassa/columnfamily.py", line 362, in new_f
  File "/usr/share/python2.7/site-packages/pycassa/columnfamily.py", line 429, in get
pycassa.cassandra.ttypes.NotFoundException: NotFoundException()
Final ring (same on both nodes):
192.168.3.4     Up     Normal  199.28 KB       50.00%  0
192.168.3.8     Up     Normal  135.7 KB        50.00%  85070591730234615865843651857942052864

Restarting 192.168.3.4, data _same_ _error_ as above, ring changes to:
192.168.3.4     Up     Normal  201.51 KB       50.00%  0
192.168.3.8     Up     Normal  135.7 KB        50.00%  85070591730234615865843651857942052864

Restarting 192.168.3.8, data _verified_ _ok_, ring changes (same on both nodes) to:
192.168.3.4     Up     Normal  201.51 KB       50.00%  0
192.168.3.8     Up     Normal  145.8 KB        50.00%  85070591730234615865843651857942052864
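
To compare the ring listings above more easily, the lines can be parsed and the per-node load diffed. A rough sketch, assuming the seven-field layout shown in this comment (address, status, state, load, unit, ownership, token) and loads reported in KB; the function name is made up for this illustration:

```python
def parse_ring_line(line):
    """Split one ring line into its fields; load is returned in KB."""
    address, status, state, load, unit, owns, token = line.split()
    if unit != "KB":
        raise ValueError("this sketch only handles loads reported in KB")
    return {
        "address": address,
        "status": status,
        "state": state,
        "load_kb": float(load),
        "owns": owns,
        "token": int(token),
    }


before = parse_ring_line(
    "192.168.3.8     Up     Normal  135.7 KB        50.00%  "
    "85070591730234615865843651857942052864")
after = parse_ring_line(
    "192.168.3.8     Up     Normal  145.8 KB        50.00%  "
    "85070591730234615865843651857942052864")

# Load reported by 192.168.3.8 before and after its restart.
print(round(after["load_kb"] - before["load_kb"], 1))  # 10.1
```

In the listings above, the load reported for 192.168.3.8 changes only after that node itself is restarted, which is also the point at which verification passes.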

[1] Logs from 1st 192.168.3.4 node:
http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/logs_with_restart/system-3.4.log

[2] Logs from 2nd 192.168.3.8 node:
http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/logs_with_restart/system-3.8.log

HIH, regards



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982684#action_12982684 ] 

Jonathan Ellis commented on CASSANDRA-1992:
-------------------------------------------

If restarting nodes fixes it, it sounds like the streamed data is not getting wired in correctly to the sstabletracker.

> Bootstrap breaks data stored (missing rows, extra rows, column values modified)
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1992
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Linux 2.6.36-1 #1 SMP Tue Nov 9 09:56:02 CET 2010 x86_64 Intel(R)_Core(TM)2_Quad_CPU____Q8300__@_2.50GHz PLD Linux
> glibc-2.12-4.i686
> java-sun-1.6.0.22-1.i686
>            Reporter: Mateusz Korniak
>            Assignee: Brandon Williams
>             Fix For: 0.7.1
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Scenario:
> Two fresh (empty /data /commitog /saved_caches dirs) cassandra installs.
> Start first one.
> Run data inserting program [1],  run again in verify mode - all data intact.
> Bootstrap 2nd node.
> Run verification again, now it fails.
> Issue is very strange to me as cassandra works perfectly for me when cluster nodes stay the same for days now but any bootstrap ( 1 -> 2 nodes, 2 -> 3 nodes, 2->3 nodes RF=2) breaks data.
> I am running cassandra with 1GB heap size, 32bit userland on 64bit kernels, not sure what else could matter there.
> Any hints ?
> Thanks in advance, regards.
> [1] simple program generating data and later verifying data.
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/test.py
> [2] Logs from 1st node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.4.log
> [3] Logs from 2nd (bootstraping node)
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.8.log



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Mateusz Korniak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982379#action_12982379 ] 

Mateusz Korniak commented on CASSANDRA-1992:
--------------------------------------------

Brandon, yes and no ;).
I cannot reproduce it with the original contrib/py_stress, as it uses only one CF for all tests. Most of my issues look like a missing row in the 2nd CF or broken data in the 2nd CF (as if contents of the 1st CF were injected into the 2nd CF).
I slightly modified contrib/py_stress so that it creates 3 standard CFs and 3 super CFs [1] and lets you select which one to operate on via the --column_family_idx= switch. With that I can reproduce:

Starting 1st node.

$ python stress.py --nodes 192.168.3.8 --operation insert  --num-keys 100 --progress-interval 5  --keep-going --column_family_idx=1       
Created keyspaces.  Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
100,20,20,0.00823852300644,0
$ python stress.py --nodes 192.168.3.8 --operation insert  --num-keys 100 --progress-interval 5  --keep-going --column_family_idx=2 
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
100,20,20,0.00132475852966,0
$ python stress.py --nodes 192.168.3.8 --operation insert  --num-keys 100 --progress-interval 5  --keep-going --column_family_idx=3
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
100,20,20,0.00138550519943,0

Verification of data in each CF:
$ python stress.py --nodes 192.168.3.8 --operation read  --num-keys 100 --progress-interval 5  --keep-going --column_family_idx=3
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
100,20,20,0.00282711744308,0
$ python stress.py --nodes 192.168.3.8 --operation read  --num-keys 100 --progress-interval 5  --keep-going --column_family_idx=2
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
100,20,20,0.00149053096771,0
$ python stress.py --nodes 192.168.3.8 --operation read  --num-keys 100 --progress-interval 5  --keep-going --column_family_idx=1
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
100,20,20,0.00125009775162,0

Bootstrap the 2nd node, and now reads fail:
$ python stress.py --nodes 192.168.3.8 --operation read  --num-keys 100 --progress-interval 5  --keep-going --column_family_idx=1
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
100,20,20,0.00376108169556,0
$ python stress.py --nodes 192.168.3.8 --operation read  --num-keys 100 --progress-interval 5  --keep-going --column_family_idx=2
Key 074 not found
Key 061 not found
Key 047 not found
( cut 40 more Key 0xx not found)
Key 047 not found
Key 042 not found
Key 058 not found
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
Key 033 not found
100,20,20,0.00241538286209,0

Similar failure for 3rd CF.

[1]: Modified stress.py from 0.7.0 with --column_family_idx= added.
http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/stress.py




[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Mateusz Korniak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982283#action_12982283 ] 

Mateusz Korniak commented on CASSANDRA-1992:
--------------------------------------------

One more thing:
I repeated the setup today and again ended up with missing row(s), but this time the 2nd node got a different token: 61078635599166706937511052402724559481
Following
http://wiki.apache.org/cassandra/Operations#Load_balancing ,
with the 1st node's token set to 0 and RandomPartitioner in use, I would expect the 2nd node's token to always be set to the middle of the token space, i.e.
85070591730234615865843651857942052864
?
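
For reference, the "middle of the token space" arithmetic can be sketched as below. This is only an illustration, assuming the RandomPartitioner ring spans 0 to 2^127; the helper name midpoint is made up for this example and is not part of Cassandra. Note that the bootstrap log quoted elsewhere in this thread says the new token is picked "to assume load from" the existing node, so an automatically chosen token need not match this even split.

```python
# Assumption: RandomPartitioner tokens live on a ring of size 2**127.
RING_SIZE = 2 ** 127


def midpoint(left, right):
    """Return the token halfway through the range (left, right] on the ring.

    Hypothetical helper for illustration only. When left == right the
    range is treated as the whole ring (single-node cluster).
    """
    width = (right - left) % RING_SIZE or RING_SIZE
    return (left + width // 2) % RING_SIZE


# One existing node at token 0 owns the whole ring, so an even split
# would place the second node at 2**126.
print(midpoint(0, 0))
```

The printed value is 2^126, which is the token one would set by hand in initial_token to split a single-node ring evenly.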





[jira] Updated: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-1992:
----------------------------------------

    Attachment: 1992.txt

There were two bugs here in StreamInSession.  First, it was adding all streamed sstables to the last CFS it saw.  Secondly, secondary index generation was being performed against all sstables seen.  This patch switches from a scalar CFS and a separate list of all sstables to a hash of lists where the CFS is the key and the value is the sstables that belong to it.
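
The shape of the first bug and of the fix can be illustrated with a small, language-neutral sketch. This is not the patch itself (the real code is Java inside StreamInSession); the function names and the sstable file names below are hypothetical and only mirror the description above: a single "current CFS" plus one flat list, versus a map from CFS to its own sstables.

```python
from collections import defaultdict


def group_buggy(streamed):
    """Mimic the described bug: one scalar CF plus one flat sstable list."""
    last_cf = None
    all_sstables = []
    for cf_name, sstable in streamed:
        last_cf = cf_name            # overwritten by every incoming file
        all_sstables.append(sstable)
    # Everything ends up attached to whichever CF happened to be seen last.
    return {last_cf: all_sstables} if last_cf is not None else {}


def group_fixed(streamed):
    """Mimic the described fix: key by CF, value is that CF's sstables."""
    by_cf = defaultdict(list)
    for cf_name, sstable in streamed:
        by_cf[cf_name].append(sstable)
    return dict(by_cf)


# Hypothetical stream of (column family, sstable file) pairs.
streamed = [("Role", "Role-e-1-Data.db"), ("Gadget", "Gadget-e-1-Data.db")]
print(group_buggy(streamed))
print(group_fixed(streamed))
```

In the buggy grouping the Role sstable is handed to Gadget, which matches the symptom reported in this thread of rows from one CF showing up in another after bootstrap.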

> Bootstrap breaks data stored (missing rows, extra rows, column values modified)
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1992
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Linux 2.6.36-1 #1 SMP Tue Nov 9 09:56:02 CET 2010 x86_64 Intel(R)_Core(TM)2_Quad_CPU____Q8300__@_2.50GHz PLD Linux
> glibc-2.12-4.i686
> java-sun-1.6.0.22-1.i686
>            Reporter: Mateusz Korniak
>            Assignee: Brandon Williams
>             Fix For: 0.7.1
>
>         Attachments: 1992.txt
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Scenario:
> Two fresh (empty /data /commitog /saved_caches dirs) cassandra installs.
> Start first one.
> Run data inserting program [1],  run again in verify mode - all data intact.
> Bootstrap 2nd node.
> Run verification again, now it fails.
> Issue is very strange to me as cassandra works perfectly for me when cluster nodes stay the same for days now but any bootstrap ( 1 -> 2 nodes, 2 -> 3 nodes, 2->3 nodes RF=2) breaks data.
> I am running cassandra with 1GB heap size, 32bit userland on 64bit kernels, not sure what else could matter there.
> Any hints ?
> Thanks in advance, regards.
> [1] simple program generating data and later verifying data.
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/test.py
> [2] Logs from 1st node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.4.log
> [3] Logs from 2nd (bootstraping node)
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.8.log



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983850#action_12983850 ] 

Nick Bailey commented on CASSANDRA-1992:
----------------------------------------

This would be a good test to have in the distributed test set up.

> Bootstrap breaks data stored (missing rows, extra rows, column values modified)
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1992
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Linux 2.6.36-1 #1 SMP Tue Nov 9 09:56:02 CET 2010 x86_64 Intel(R)_Core(TM)2_Quad_CPU____Q8300__@_2.50GHz PLD Linux
> glibc-2.12-4.i686
> java-sun-1.6.0.22-1.i686
>            Reporter: Mateusz Korniak
>            Assignee: Brandon Williams
>             Fix For: 0.7.1
>
>         Attachments: 1992.txt
>
>   Original Estimate: 8h
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Scenario:
> Two fresh (empty /data /commitog /saved_caches dirs) cassandra installs.
> Start first one.
> Run data inserting program [1],  run again in verify mode - all data intact.
> Bootstrap 2nd node.
> Run verification again, now it fails.
> Issue is very strange to me as cassandra works perfectly for me when cluster nodes stay the same for days now but any bootstrap ( 1 -> 2 nodes, 2 -> 3 nodes, 2->3 nodes RF=2) breaks data.
> I am running cassandra with 1GB heap size, 32bit userland on 64bit kernels, not sure what else could matter there.
> Any hints ?
> Thanks in advance, regards.
> [1] simple program generating data and later verifying data.
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/test.py
> [2] Logs from 1st node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.4.log
> [3] Logs from 2nd (bootstraping node)
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.8.log



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982209#action_12982209 ] 

Stu Hood commented on CASSANDRA-1992:
-------------------------------------

Are both nodes reporting the same ring (via nodetool ring) after the bootstrap? The last entry in 3.8.log indicates that it thinks 3.4 is dead, but this might just be because you stopped the nodes before collecting the logs.

Also, what exactly is the error you get from your script?



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Mateusz Korniak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982181#action_12982181 ] 

Mateusz Korniak commented on CASSANDRA-1992:
--------------------------------------------

Not sure if it is important, but:

After loading data, the ring looks like:
Address         Status State   Load            Owns    Token
192.168.3.4     Up     Normal  59.02 KB        100.00% 0

After bootstrapping:
192.168.3.4     Up     Normal  199.28 KB       56.45%  0
192.168.3.8     Up     Normal  115.03 KB       43.55%  74091174110465149971373554442555361956

Load on the 1st node more than triples.
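
As a sanity check on the "Owns" column, the percentages can be recomputed from the tokens alone. The sketch below assumes the RandomPartitioner ring has size 2^127 and that each node owns the range from the previous node's token (exclusive) up to its own token (inclusive); the ownership function is a made-up helper, not a nodetool or Cassandra API.

```python
# Assumption: RandomPartitioner ring of size 2**127.
RING_SIZE = 2 ** 127


def ownership(tokens):
    """Map each token to the percentage of the ring its node owns.

    Hypothetical helper: a node owns (previous token, its own token],
    wrapping around the ring.
    """
    ordered = sorted(tokens)
    result = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]    # index -1 wraps to the last token
        width = (token - previous) % RING_SIZE or RING_SIZE
        result[token] = round(100 * width / RING_SIZE, 2)
    return result


ring = ownership([0, 74091174110465149971373554442555361956])
for token, percent in ring.items():
    print(token, percent)
```

With the two tokens from the ring output above, this gives 56.45 for token 0 and 43.55 for the bootstrapped node, so the ownership figures are consistent with the tokens; it is the Load column that looks off.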



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Ivo Ladage-van Doorn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982561#action_12982561 ] 

Ivo Ladage-van Doorn commented on CASSANDRA-1992:
-------------------------------------------------

I have the exact same problem with an existing installation and was preparing to create an issue for it, but found this issue just before creating it. I'll describe the issue I have; maybe that provides some relevant information.

I ran into this issue with Cassandra 0.7 while trying to add just one node to an existing one-node cluster. The existing node already contains some data when the second node is added to the cluster. This is what I did:

Setup
I have two nodes both running on Linux; a server called 'veers' on 172.16.2.203 and a 'r2d2' on 172.16.2.206. I use Cassandra 0.7 and only change the following settings in the cassandra.yaml and log4j-server.properties (I use the default values for all other entries):

In cassandra.yaml:

initial_token: 0
data_file_directories:
    - /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.203
rpc_address: 172.16.2.203

In log4j-server.properties:

log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log


Now I start the first node and connect to it using cassandra-cli. I add the following keyspace, column families and rows:

create keyspace Default;
use Default;

create column family Role;
set Role['user_1']['name'] = 'User 1';
set Role['user_2']['name'] = 'User 2';
set Role['user_3']['name'] = 'User 3';

create column family Gadget;
set Gadget['gadget_1']['name'] = 'Gadget 1';
set Gadget['gadget_2']['name'] = 'Gadget 2';
set Gadget['gadget_3']['name'] = 'Gadget 3';

After this 'list Role' and 'list Gadget' return the proper rows.

Now I append a second node to the cluster, with this configuration:

In cassandra.yaml:

initial_token:
auto_bootstrap: true
data_file_directories:
    - /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.206
rpc_address: 172.16.2.206

In log4j-server.properties:

log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log


Now I start the second node. Bootstrapping takes some time, about 2 minutes in total but finishes without any warnings or errors:

...
INFO [main] 2011-01-17 09:58:09,170 StorageService.java (line 399) Joining: getting load information
INFO [main] 2011-01-17 09:58:09,171 StorageLoadBalancer.java (line 366) Sleeping 90000 ms to wait for load information...
INFO [GossipStage:1] 2011-01-17 09:58:10,447 Gossiper.java (line 577) Node /172.16.2.203 is now part of the cluster
INFO [HintedHandoff:1] 2011-01-17 09:58:11,451 HintedHandOffManager.java (line 192) Started hinted handoff for endpoint /172.16.2.203
INFO [GossipStage:1] 2011-01-17 09:58:11,451 Gossiper.java (line 569) InetAddress /172.16.2.203 is now UP
INFO [HintedHandoff:1] 2011-01-17 09:58:11,453 HintedHandOffManager.java (line 248) Finished hinted handoff of 0 rows to endpoint /172.16.2.203
INFO [main] 2011-01-17 09:59:39,189 StorageService.java (line 399) Joining: getting bootstrap token
INFO [main] 2011-01-17 09:59:39,203 BootStrapper.java (line 148) New token will be 110533280274756817580689726417060138498 to assume load from /172.16.2.203
INFO [main] 2011-01-17 09:59:39,265 StorageService.java (line 399) Joining: sleeping 30000 ms for pending range setup
INFO [main] 2011-01-17 10:00:09,272 StorageService.java (line 399) Bootstrapping
INFO [main] 2011-01-17 10:00:09,663 CassandraDaemon.java (line 77) Binding thrift service to /172.16.2.206:9160
INFO [main] 2011-01-17 10:00:09,666 CassandraDaemon.java (line 91) Using TFramedTransport with a max frame size of 15728640 bytes.
INFO [main] 2011-01-17 10:00:09,671 CassandraDaemon.java (line 119) Listening for thrift clients...

Although everything seemed to work just fine, once node 2 has completely finished bootstrapping, the rows in the 'Role' and 'Gadget' Column Families are messed up:

list Role;

-------------------
RowKey: user_3
=> (column=6e616d65, value=557365722033, timestamp=1295254678545000)

1 Row Returned.


list Gadget;

-------------------
RowKey: user_2
=> (column=6e616d65, value=557365722032, timestamp=1295254678514000)
-------------------
RowKey: gadget_2
=> (column=6e616d65, value=4761646765742032, timestamp=1295254678805000)
-------------------
RowKey: gadget_3
=> (column=6e616d65, value=4761646765742033, timestamp=1295254679429000)
-------------------
RowKey: gadget_1
=> (column=6e616d65, value=4761646765742031, timestamp=1295254678771000)
-------------------
RowKey: user_1
=> (column=6e616d65, value=557365722031, timestamp=1295254678449000)

5 Rows Returned.

So 2 rows have been moved from CF 'Role' to 'Gadget', just by adding a node to the cluster. The actual result differs each time I try, but always some rows have been moved to some other CF. The problem seems the same as the one described by Mateusz.
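
The column names and values in the listings above are printed as hex, which makes it hard to see at a glance which CF a row really belongs to. A minimal sketch for decoding them, assuming Python 3 and UTF-8 text values (decode_hex is a made-up helper, not part of cassandra-cli):

```python
def decode_hex(hex_text):
    """Turn a hex string as printed by the CLI back into readable text."""
    return bytes.fromhex(hex_text).decode("utf-8")


# (row key, column, value) triples copied from the 'list Gadget' output above.
gadget_rows = [
    ("user_2", "6e616d65", "557365722032"),
    ("gadget_2", "6e616d65", "4761646765742032"),
    ("user_1", "6e616d65", "557365722031"),
]

for row_key, column, value in gadget_rows:
    print(row_key, decode_hex(column), decode_hex(value))
```

Decoded, the rows keyed user_1 and user_2 carry the values written to 'Role', which confirms that they are Role rows being served from 'Gadget'.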

I also found out that restarting the nodes seems to 'fix' the issue. Changing the replication factor from 1 to 2 also 'resolves' the issue most of the time.



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Mateusz Korniak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982277#action_12982277 ] 

Mateusz Korniak commented on CASSANDRA-1992:
--------------------------------------------

Stu,
Yes, both nodes show the same ring via nodetool ring:
192.168.3.4 Up Normal 199.28 KB 56.45% 0
192.168.3.8 Up Normal 115.03 KB 43.55% 74091174110465149971373554442555361956

You are right, I stopped the cluster by stopping 3.4 first. (Anyway, it would be good to record a graceful node shutdown in system.log, IMHO.)

Error is:
[matkor@laptop-hp ~/src/caswife]$ python test.py
column_family: 'CF0'
row_key: 'row=0 '
row_key: 'row=1 \x00'
row_key: 'row=2 \x00\x01'
row_key: 'row=3 \x00\x01\x02'
row_key: 'row=4 \x00\x01\x02\x03'
row_key: 'row=5 \x00\x01\x02\x03\x04'
row_key: 'row=6 \x00\x01\x02\x03\x04\x05'
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    loaded_cols_dict = current_cf.get(row_key)
  File "/usr/share/python2.7/site-packages/pycassa/columnfamily.py", line 362, in new_f
  File "/usr/share/python2.7/site-packages/pycassa/columnfamily.py", line 429, in get
pycassa.cassandra.ttypes.NotFoundException: NotFoundException()

So, as far as I understand pycassa, the program found 7 rows but failed to find the 8th.



[jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983725#action_12983725 ] 

Jonathan Ellis commented on CASSANDRA-1992:
-------------------------------------------

I think we need to get rid of StreamInSession.table and StreamHeader.table too, then.  (But leave it on the wire protocol as an empty string, for compatibility w/ 0.7.0)



[jira] Assigned: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-1992:
-----------------------------------------

    Assignee: Brandon Williams

> Bootstrap breaks data stored (missing rows, extra rows, column values modified)
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1992
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Linux 2.6.36-1 #1 SMP Tue Nov 9 09:56:02 CET 2010 x86_64 Intel(R)_Core(TM)2_Quad_CPU____Q8300__@_2.50GHz PLD Linux
> glibc-2.12-4.i686
> java-sun-1.6.0.22-1.i686
>            Reporter: Mateusz Korniak
>            Assignee: Brandon Williams
>             Fix For: 0.7.1
>
>
> Scenario:
> Two fresh (empty /data /commitog /saved_caches dirs) cassandra installs.
> Start first one.
> Run data inserting program [1],  run again in verify mode - all data intact.
> Bootstrap 2nd node.
> Run verification again, now it fails.
> Issue is very strange to me as cassandra works perfectly for me when cluster nodes stay the same for days now but any bootstrap ( 1 -> 2 nodes, 2 -> 3 nodes, 2->3 nodes RF=2) breaks data.
> I am running cassandra with 1GB heap size, 32bit userland on 64bit kernels, not sure what else could matter there.
> Any hints ?
> Thanks in advance, regards.
> [1] simple program generating data and later verifying data.
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/test.py
> [2] Logs from 1st node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.4.log
> [3] Logs from 2nd (bootstraping node)
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.8.log
