Posted to commits@cassandra.apache.org by "Tyler Patterson (Created) (JIRA)" <ji...@apache.org> on 2012/02/09 20:52:59 UTC

[jira] [Created] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Intermittent SchemaDisagreementException
----------------------------------------

                 Key: CASSANDRA-3884
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.1.0
         Environment: using ccm on ubuntu. 
            Reporter: Tyler Patterson


Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 

There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.
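
Because the failure is intermittent, it can help to rerun the test in a loop and stop at the first failing run. The following is a minimal sketch, not part of the dtest suite: `run_until_failure` is a hypothetical helper, and the command passed to it is an assumption about how the test is launched locally.

```python
import subprocess
import sys


def run_until_failure(command, max_runs=10):
    """Run `command` up to `max_runs` times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            # Surface the failing run's output so the error can be inspected.
            sys.stderr.write(result.stdout + result.stderr)
            return attempt
    return None
```

For example, `run_until_failure(["nosetests", "schema_changes_test.py"], max_runs=10)` would rerun the test up to 10 times, assuming the dtest is started with `nosetests` in your environment.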

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Brandon Williams (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reassigned CASSANDRA-3884:
-------------------------------------------

    Assignee: Pavel Yaskevich
    
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206210#comment-13206210 ] 

Pavel Yaskevich commented on CASSANDRA-3884:
--------------------------------------------

bq. That sounds buggy to me; the goal of CASSANDRA-1391 was to make it so you don't have to care about that kind of thing anymore.

There is still no way to make migrations atomic, so the same rule applies. With CASSANDRA-1391 we made concurrent schema propagation possible, so we no longer need to worry about the ordering (or timing) of changes. But what happens when you update a keyspace or column family while simultaneously doing heavy reads and writes is still unpredictable, because making that safe would require some sort of global lock while KSMetaData/CFMetaData are mutated.
                

        

[jira] [Issue Comment Edited] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Pavel Yaskevich (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214806#comment-13214806 ] 

Pavel Yaskevich edited comment on CASSANDRA-3884 at 2/23/12 3:34 PM:
---------------------------------------------------------------------

Committed with the nit fixed; the test changes went into a separate commit.
                
      was (Author: xedin):
    Committed.
                  
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>             Fix For: 1.1.0
>
>         Attachments: CASSANDRA-3884-v2.patch, CASSANDRA-3884.patch
>
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214663#comment-13214663 ] 

Sylvain Lebresne commented on CASSANDRA-3884:
---------------------------------------------

+1 with one nit: if I understand correctly, Migration.applyImpl() should never return an empty list of RowMutation, so it would be nice to assert that fact in Migration.apply().

And as said above, I'd prefer separating the test changes into another commit (or opening a ticket for them, but I can live with those changes being committed directly).
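
The nit can be illustrated outside Cassandra. The following is a minimal Python sketch of the pattern only, not the Java code from the patch: the `Migration` class, `apply`, and `apply_impl` are hypothetical stand-ins that mirror the names in the comment, and the mutations are placeholder values.

```python
class Migration:
    """Hypothetical stand-in for a schema migration (illustration only)."""

    def __init__(self, mutations):
        self._mutations = list(mutations)

    def apply_impl(self):
        # A real migration would compute the row mutations describing the change.
        return self._mutations

    def apply(self):
        mutations = self.apply_impl()
        # Guard the invariant: a migration must produce at least one mutation.
        assert mutations, "apply_impl() returned no mutations"
        return mutations
```

Asserting in `apply()` rather than in each `apply_impl()` keeps the check in one place, so every migration type is covered by the same guard.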
                

        

[jira] [Updated] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Pavel Yaskevich (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-3884:
---------------------------------------

    Attachment: CASSANDRA-3884-v2.patch

v2 with modifications mentioned by Sylvain.
                

        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Tyler Patterson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205571#comment-13205571 ] 

Tyler Patterson commented on CASSANDRA-3884:
--------------------------------------------

I found another problem that might be related. When I drop and re-create a column family while that column family is being compacted, I sometimes see the following error in the logs: 
{code}
ERROR [NonPeriodicTasks:1] 2012-02-10 00:32:27,773 SSTableDeletingTask.java (line 76) Unable to delete /tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-73-Data.db (it will be removed on server restart; we'll also retry after GC)
 INFO [CompactionExecutor:4] 2012-02-10 00:32:27,774 CompactionTask.java (line 226) Compacted to [/tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-82-Data.db,].  94,406,208 to 11,542,817 (~12% of original) bytes for 302,584 keys at 1.436150MB/s.  Time: 7,665ms.
ERROR [NonPeriodicTasks:1] 2012-02-10 00:32:27,774 SSTableDeletingTask.java (line 76) Unable to delete /tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-69-Data.db (it will be removed on server restart; we'll also retry after GC)
ERROR [NonPeriodicTasks:1] 2012-02-10 00:32:27,778 SSTableDeletingTask.java (line 76) Unable to delete /tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-80-Data.db (it will be removed on server restart; we'll also retry after GC)
ERROR [NonPeriodicTasks:1] 2012-02-10 00:32:27,779 SSTableDeletingTask.java (line 76) Unable to delete /tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-79-Data.db (it will be removed on server restart; we'll also retry after GC)
{code}
The dtest failing_tests/change_schema_under_load.py illustrates the problem.
                

        

[jira] [Issue Comment Edited] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Tyler Patterson (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205599#comment-13205599 ] 

Tyler Patterson edited comment on CASSANDRA-3884 at 2/10/12 6:30 PM:
---------------------------------------------------------------------

Here is another error that happens intermittently when running failing_tests/change_schema_under_load.py:
{code}
Error occured during compaction
java.util.concurrent.ExecutionException: java.io.IOError: java.io.FileNotFoundException: /tmp/dtest-cf_hbP/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-82-Data.db (No such file or directory)
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.cassandra.db.compaction.CompactionManager.performMaximal(CompactionManager.java:236)
	at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1511)
	at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1708)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
	at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
	at sun.rmi.transport.Transport$1.run(Transport.java:159)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOError: java.io.FileNotFoundException: /tmp/dtest-cf_hbP/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-82-Data.db (No such file or directory)
	at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:60)
	at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:821)
	at org.apache.cassandra.db.compaction.CompactionIterable.getScanners(CompactionIterable.java:73)
	at org.apache.cassandra.db.compaction.CompactionIterable.<init>(CompactionIterable.java:56)
	at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:126)
	at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:261)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	... 3 more
Caused by: java.io.FileNotFoundException: /tmp/dtest-cf_hbP/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-82-Data.db (No such file or directory)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
	at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
	at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:102)
	at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:87)
	at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:965)
	at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:56)
	... 12 more
{code}
                
                  

        

[jira] [Issue Comment Edited] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Tyler Patterson (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205571#comment-13205571 ] 

Tyler Patterson edited comment on CASSANDRA-3884 at 2/10/12 5:46 PM:
---------------------------------------------------------------------

I found another problem that might be related. When I drop and re-create a column family while that column family is being compacted, I sometimes see the following error in the logs: 
{code}
ERROR [NonPeriodicTasks:1] 2012-02-10 00:32:27,773 SSTableDeletingTask.java (line 76) Unable to delete /tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-73-Data.db (it will be removed on server restart; we'll also retry after GC)
 INFO [CompactionExecutor:4] 2012-02-10 00:32:27,774 CompactionTask.java (line 226) Compacted to [/tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-82-Data.db,].  94,406,208 to 11,542,817 (~12% of original) bytes for 302,584 keys at 1.436150MB/s.  Time: 7,665ms.
ERROR [NonPeriodicTasks:1] 2012-02-10 00:32:27,774 SSTableDeletingTask.java (line 76) Unable to delete /tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-69-Data.db (it will be removed on server restart; we'll also retry after GC)
ERROR [NonPeriodicTasks:1] 2012-02-10 00:32:27,778 SSTableDeletingTask.java (line 76) Unable to delete /tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-80-Data.db (it will be removed on server restart; we'll also retry after GC)
ERROR [NonPeriodicTasks:1] 2012-02-10 00:32:27,779 SSTableDeletingTask.java (line 76) Unable to delete /tmp/dtest-l4IQpd/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-79-Data.db (it will be removed on server restart; we'll also retry after GC)
{code}
The dtest failing_tests/change_schema_under_load.py illustrates the problem.
                
                  

        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206155#comment-13206155 ] 

Jonathan Ellis commented on CASSANDRA-3884:
-------------------------------------------

bq. The first rule of schema migration (which you have violated) is to make sure that the cluster is reasonably quiet for the KS/CF you are updating, because various bad things can happen if you migrate under heavy load

That sounds buggy to me; the goal of CASSANDRA-1391 was to make it so you don't have to care about that kind of thing anymore.
                

        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Tyler Patterson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205599#comment-13205599 ] 

Tyler Patterson commented on CASSANDRA-3884:
--------------------------------------------

Here is another error that happens intermittently with the dtest failing_tests/change_schema_under_load.py:
{code}
Error occured during compaction
java.util.concurrent.ExecutionException: java.io.IOError: java.io.FileNotFoundException: /tmp/dtest-cf_hbP/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-82-Data.db (No such file or directory)
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.cassandra.db.compaction.CompactionManager.performMaximal(CompactionManager.java:236)
	at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1511)
	at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1708)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
	at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
	at sun.rmi.transport.Transport$1.run(Transport.java:159)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOError: java.io.FileNotFoundException: /tmp/dtest-cf_hbP/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-82-Data.db (No such file or directory)
	at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:60)
	at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:821)
	at org.apache.cassandra.db.compaction.CompactionIterable.getScanners(CompactionIterable.java:73)
	at org.apache.cassandra.db.compaction.CompactionIterable.<init>(CompactionIterable.java:56)
	at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:126)
	at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:261)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	... 3 more
Caused by: java.io.FileNotFoundException: /tmp/dtest-cf_hbP/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-82-Data.db (No such file or directory)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
	at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
	at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:102)
	at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:87)
	at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:965)
	at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:56)
	... 12 more
{code}
                
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214652#comment-13214652 ] 

Sylvain Lebresne commented on CASSANDRA-3884:
---------------------------------------------

bq. I have experienced gossip related NPE exceptions (in isEnabled() method for example) in the MM.passiveAnnounce method when Gossiper wasn't started.

Is that related to this patch? Otherwise I'd prefer leaving that to another ticket (at least a different commit).
                
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>             Fix For: 1.1.0
>
>         Attachments: CASSANDRA-3884-v2.patch, CASSANDRA-3884.patch
>
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211974#comment-13211974 ] 

Sylvain Lebresne commented on CASSANDRA-3884:
---------------------------------------------

The test Tyler is talking about is there: https://github.com/riptano/cassandra-dtest/blob/master/schema_changes_test.py.

On my machine, the test fails every time on 1.1 with a schema version disagreement during the column family drop. The test can be simplified a bit (and still fail) so that all it does is create a keyspace, create a column family, and then drop that column family. There is no insertion going on at all. There is also a wait of more than 1 second between the initial creation and the drop. Last but not least, that same test works perfectly fine on 1.0. It *is* a regression in 1.1.
                
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206260#comment-13206260 ] 

Pavel Yaskevich commented on CASSANDRA-3884:
--------------------------------------------

Yes, that is what I mean; existing CFs that are not involved in the schema mutation are not touched by the migration process.
                
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


        

[jira] [Updated] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Pavel Yaskevich (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-3884:
---------------------------------------

    Attachment: CASSANDRA-3884.patch

I have localized and fixed the problem: the coordinator was not sending the mutations used to change the schema, but a snapshot of the schema itself after each migration.
                
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>             Fix For: 1.1.0
>
>         Attachments: CASSANDRA-3884.patch
>
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214627#comment-13214627 ] 

Sylvain Lebresne commented on CASSANDRA-3884:
---------------------------------------------

It makes perfect sense to send only the new mutation rather than the entire schema, so I'm good with that, and it does fix the distributed test. But I'm not sure I understand why. Since Migration.announce() was called after applying the mutation, the changes just applied should be part of the entire schema read from SystemTable, shouldn't they? So why did we get a SchemaDisagreementException before?

Also a few other remarks/nits/questions:
* In MigrationHelper, if withSchemaRecord is false the mutations will be null, and most functions will return a list containing null. We should return an empty list instead, or null (but in that last case, Migration.apply() should deal with null). Also, MigrationHelper.dropColumnFamily() directly returns null, so we should make it match whatever we do for the other methods.
* It's slightly more efficient to use Collections.singleton() than Arrays.asList with one element.
* Why do the tests now need to start gossip?
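
To make the first two remarks concrete, here is a minimal, self-contained sketch. It is not the MigrationHelper code from the patch: the class name MutationListSketch, both helper methods, and the use of String as a stand-in for the mutation type are assumptions made purely for illustration. It shows that wrapping a null reference with Arrays.asList yields a one-element list containing null, and one way to return an empty list instead. Note that Collections.singleton() returns a Set; when a List is required, Collections.singletonList() is the equivalent.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: String stands in for the real mutation type,
// and these helpers are hypothetical, not part of Cassandra.
class MutationListSketch {

    // The pattern being questioned: a null mutation still produces
    // a list of size 1 whose single element is null.
    static List<String> wrapWithAsList(String mutation) {
        return Arrays.asList(mutation);
    }

    // Alternative: no mutation means an empty list; otherwise an
    // immutable one-element list.
    static List<String> wrapSafely(String mutation) {
        if (mutation == null) {
            return Collections.<String>emptyList();
        }
        return Collections.singletonList(mutation);
    }

    public static void main(String[] args) {
        System.out.println("asList(null) size:   " + wrapWithAsList(null).size());
        System.out.println("safe(null) is empty: " + wrapSafely(null).isEmpty());
        System.out.println("safe(\"m\"):           " + wrapSafely("m"));
    }
}
```

With the second form, a caller can iterate over the result without a null check on either the list or its elements.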

                
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>             Fix For: 1.1.0
>
>         Attachments: CASSANDRA-3884.patch
>
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206253#comment-13206253 ] 

Jonathan Ellis commented on CASSANDRA-3884:
-------------------------------------------

If you mean you can race inserts into the new CF, with the CF creation, then sure, there's no way we can make that work.  But all the other CFs should be fine.
                
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206087#comment-13206087 ] 

Pavel Yaskevich commented on CASSANDRA-3884:
--------------------------------------------

The first rule of the schema migration (which you have violated) is to make sure that cluster is reasonably quiet for KS/CF you do updates upon because different bad things can happen if you migrate under heavy load. Can you please attach system.log from both nodes to the ticket, taken from a run where you get the SchemaDisagreementException?
                
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.


        

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214634#comment-13214634 ] 

Pavel Yaskevich commented on CASSANDRA-3884:
--------------------------------------------

The answer to your main question is compaction. Everything works well when you add or modify columns. But when you, for example, delete CF columns from a keyspace and compaction kicks in before you have grabbed the whole schema, that schema will be missing the updates for those columns. They won't be pushed to the remote nodes, which leaves the CF attributes in their schema_columnfamilies.
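
The effect described above can be reproduced with a toy model. The sketch below is not Cassandra code: ToyNode, the attribute names attr_a and attr_b, and the use of Optional.empty() as a deletion marker are all assumptions for illustration, and it assumes that compaction discards the deletion marker before the snapshot is read. It compares pushing the full schema snapshot to the remote node with pushing only the mutation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Toy model only; none of these types exist in Cassandra.
class SchemaPropagationSketch {

    // A node stores attribute name -> value; Optional.empty() is a deletion marker.
    static final class ToyNode {
        private final Map<String, Optional<String>> cells = new HashMap<>();

        // Merge incoming cells; a deletion marker overwrites a live value.
        void apply(Map<String, Optional<String>> mutation) {
            cells.putAll(mutation);
        }

        // Assumed compaction behaviour: deletion markers are dropped.
        void compact() {
            cells.values().removeIf(value -> !value.isPresent());
        }

        // Copy of whatever is currently stored, markers included if still present.
        Map<String, Optional<String>> snapshot() {
            return new HashMap<>(cells);
        }

        Set<String> liveAttributes() {
            Set<String> live = new TreeSet<>();
            for (Map.Entry<String, Optional<String>> e : cells.entrySet()) {
                if (e.getValue().isPresent()) {
                    live.add(e.getKey());
                }
            }
            return live;
        }
    }

    static ToyNode nodeWithInitialSchema() {
        Map<String, Optional<String>> initial = new HashMap<>();
        initial.put("attr_a", Optional.of("1"));
        initial.put("attr_b", Optional.of("2"));
        ToyNode node = new ToyNode();
        node.apply(initial);
        return node;
    }

    // Drops attr_b on the coordinator, compacts, then propagates either the
    // mutation or a full snapshot. Returns true if both nodes end up agreeing.
    static boolean agreesAfterDrop(boolean sendMutation) {
        ToyNode coordinator = nodeWithInitialSchema();
        ToyNode remote = nodeWithInitialSchema();

        Map<String, Optional<String>> drop = new HashMap<>();
        drop.put("attr_b", Optional.empty());

        coordinator.apply(drop);
        coordinator.compact(); // the marker for attr_b is gone from local storage

        remote.apply(sendMutation ? drop : coordinator.snapshot());
        return coordinator.liveAttributes().equals(remote.liveAttributes());
    }

    public static void main(String[] args) {
        System.out.println("snapshot pushed, nodes agree: " + agreesAfterDrop(false));
        System.out.println("mutation pushed, nodes agree: " + agreesAfterDrop(true));
    }
}
```

In this model the snapshot no longer mentions attr_b at all, so the remote node has nothing telling it to remove the attribute, whereas the mutation carries the deletion explicitly.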

bq. In MigrationHelper, if withSchemaRecord is false the mutations will be null, and most functions will return a list containing null. We should return an empty list instead, or null (but in that last case, Migration.apply() should deal with null). Also, MigrationHelper.dropColumnFamily() directly returns null, so we should make it match whatever we do for the other methods.

Sure, I will make it return Collections.singleton()

bq. It's slightly more efficient to use Collections.singleton() than Arrays.asList with one element.

Sure, I will change it in the updated v2 patch.

bq. Why do the tests now need to start gossip?

I have experienced gossip related NPE exceptions (in isEnabled() method for example) in the MM.passiveAnnounce method when Gossiper wasn't started.
                
> Intermittent SchemaDisagreementException
> ----------------------------------------
>
>                 Key: CASSANDRA-3884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: using ccm on ubuntu. 
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>             Fix For: 1.1.0
>
>         Attachments: CASSANDRA-3884.patch
>
>
> Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and column families, and then make several schema changes. Everything is being done through only one of the nodes.  About once every 10 times (on my setup) I get a SchemaDisagreementException when creating and dropping keyspaces. 
> There is a dtest for this: schema_changes_test.py. If your environment behaves like mine, you might need to run it 10 times to get the error.
