Posted to user@cassandra.apache.org by Richard Dawe <ri...@messagesystems.com> on 2015/01/15 00:13:54 UTC

Schema changes: where in Java code are they sent?

Hello,

I’m doing some research on schema migrations for Cassandra.

I’ve been playing with cqlsh with TRACING ON, and I can see that a schema change like “CREATE TABLE” is sent to all nodes in the cluster. And also that “CREATE TABLE” fails if only one of my three nodes is up (with replication factor = 3).

I’ve been trying to find the Java code where the schema migration is sent to the other nodes in the cluster, to understand what the requirements are for successfully applying the update. E.g.: is QUORUM consistency level applied?

I spent an hour looking through the Java code last night, with no luck. I thought this code would be in StorageProxy.java, but I have not found it there, or in any of the other classes I looked at.

Any pointers would be appreciated.

Thanks, best regards, Rich


Re: Schema changes: where in Java code are they sent?

Posted by Richard Dawe <ri...@messagesystems.com>.
Good morning,

Sorry for the slow reply here. I finally had some time to test cqlsh tracing on a ccm cluster with 2 of 3 nodes down, to see if the unavailable error was due to cqlsh or my query. Reply inline below.

On 15/01/2015 12:46, "Tyler Hobbs" <ty...@datastax.com> wrote:

> On Thu, Jan 15, 2015 at 6:30 AM, Richard Dawe <ri...@messagesystems.com> wrote:
>
>> I thought it might be quorum consistency level, because of the behaviour I was seeing with cqlsh. I was testing with ccm with C* 2.0.8, 3 nodes, vnodes enabled (“ccm create test -v 2.0.8 -n 3 --vnodes -s”). With all three nodes up, my schema operations were working fine. When I took down two nodes using “ccm node2 stop”, “ccm node3 stop”, I found that schema operations through “ccm node1 cqlsh” were failing like this:
>>
>>   cqlsh> ALTER TABLE test.test3 ADD fred text;
>>   Unable to complete request: one or more nodes were unavailable.
>>
>> That’s the full output; I had enabled tracing, but only that error came back.
>>
>> After reading your reply, I went back and re-ran my tests with cqlsh, and it seems like the “one or more nodes were unavailable” may be due to cqlsh’s error handling.
>>
>> If I wait a bit, and re-run my schema operations, they work fine with only one node up. I can see in the tracing that it’s only talking to node1 (127.0.0.1) to make the schema modifications.
>>
>> Is this a known issue in cqlsh? If it helps I can send the full command-line session log.
>
> That Unavailable error may actually be from the tracing-related queries failing (that's what I suspect, at least). Starting cqlsh with --debug might show you a stacktrace in that case, but I'm not 100% sure.

Yes, it does seem to be cqlsh tracing. The debug output below was generated with:

 * A 3 node ccm cluster, running Cassandra 2.0.8 on Ubuntu 14.10 x86_64.
 * I took down 2 of the 3 nodes.
 * Table test5 has a replication factor of 3, primary key is “id text”.
 * cqlsh session was started after 2 of the 3 nodes had been shut down.

Debug output:

rdawe@cstar:~$ ccm node1 cqlsh --debug
Using CQL driver: <module 'cql' from '/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/__init__.py'>
Using thrift lib: <module 'thrift' from '/home/rdawe/.ccm/repository/2.0.8/bin/../lib/thrift-python-internal-only-0.9.1.zip/thrift/__init__.py'>
Connected to test at 127.0.0.1:9160.
[cqlsh 4.1.1 | Cassandra 2.0.8-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE test;
cqlsh:test> TRACING ON
Now tracing requests.
cqlsh:test> SELECT * FROM test5;

 id    | foo
-------+-------
 blarg |  ness
 hello | world

(2 rows)

Traceback (most recent call last):
  File "/home/rdawe/.ccm/repository/2.0.8/bin/cqlsh", line 827, in onecmd
    self.handle_statement(st, statementtext)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/cqlsh", line 865, in handle_statement
    return custom_handler(parsed)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/cqlsh", line 901, in do_select
    with_default_limit=with_default_limit)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/cqlsh", line 910, in perform_statement
    print_trace_session(self, self.cursor, session_id)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/../pylib/cqlshlib/tracing.py", line 26, in print_trace_session
    rows  = fetch_trace_session(cursor, session_id)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/../pylib/cqlshlib/tracing.py", line 47, in fetch_trace_session
    consistency_level='ONE')
  File "/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/cursor.py", line 80, in execute
    response = self.get_response(prepared_q, cl)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/thrifteries.py", line 77, in get_response
    return self.handle_cql_execution_errors(doquery, compressed_q, compress, cl)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/thrifteries.py", line 102, in handle_cql_execution_errors
    raise cql.OperationalError("Unable to complete request: one or "
OperationalError: Unable to complete request: one or more nodes were unavailable.

Sometimes I get a different error:

rdawe@cstar:~$ echo -e 'TRACING ON\nSELECT * FROM test.test5;\n' | ccm node1 cqlsh --debug
Using CQL driver: <module 'cql' from '/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/__init__.py'>
Using thrift lib: <module 'thrift' from '/home/rdawe/.ccm/repository/2.0.8/bin/../lib/thrift-python-internal-only-0.9.1.zip/thrift/__init__.py'>
Now tracing requests.

 id    | foo
-------+-------
 blarg |  ness
 hello | world

(2 rows)

<stdin>:3:Session edc8c010-bcd5-11e4-a008-1dd7f4de70a1 wasn't found.

I notice that the system_traces keyspace has replication factor 2. Since 2 nodes are down, perhaps sometimes the tracing session would be stored on nodes that are down. And other times one of the two replicas for system_traces would be on the node that’s up, but for some reason storing the data in system_traces.sessions fails?
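
The replication-factor hypothesis above can be sanity-checked by enumerating the possible replica pairs. The sketch below is a toy model, not Cassandra code. It assumes SimpleStrategy-style placement, where each range is stored on RF consecutive distinct nodes of the ring, and it uses one ring entry per node. The names `replica_sets` and `readable_at_one` are invented for illustration.

```python
def replica_sets(nodes, rf):
    """Return one replica set per ring position, walking the ring clockwise.

    Toy assumption: each range is stored on the next `rf` distinct nodes.
    """
    n = len(nodes)
    return [frozenset(nodes[(i + k) % n] for k in range(rf)) for i in range(n)]


def readable_at_one(replicas, live):
    """A request at consistency level ONE needs at least one live replica."""
    return bool(replicas & live)


nodes = ["node1", "node2", "node3"]
live = {"node1"}  # node2 and node3 have been stopped

for replicas in replica_sets(nodes, rf=2):
    status = "ok" if readable_at_one(replicas, live) else "unavailable"
    print(sorted(replicas), status)
```

In this model only the pair (node2, node3) has no live replica, so a trace session that lands on that pair cannot be read back at consistency level ONE while the other two pairs can. That would be consistent with the error appearing only some of the time. With vnodes the share of the ring covered by each pair changes, but with three nodes the set of possible pairs stays the same. The sketch does not explain the "wasn't found" case.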

Thanks, best regards, Rich


Re: Schema changes: where in Java code are they sent?

Posted by Tyler Hobbs <ty...@datastax.com>.
On Thu, Jan 15, 2015 at 6:30 AM, Richard Dawe <ri...@messagesystems.com>
wrote:

>
>  I thought it might be quorum consistency level, because of the behaviour I
> was seeing with cqlsh. I was testing with ccm with C* 2.0.8, 3 nodes,
> vnodes enabled (“ccm create test -v 2.0.8 -n 3 --vnodes -s”). With all
> three nodes up, my schema operations were working fine. When I took down
> two nodes using “ccm node2 stop”, “ccm node3 stop”, I found that schema
> operations through “ccm node1 cqlsh” were failing like this:
>
>    cqlsh> ALTER TABLE test.test3 ADD fred text;
>   Unable to complete request: one or more nodes were unavailable.
>
>  That’s the full output — I had enabled tracing, but only that error came
> back.
>
>  After reading your reply, I went back and re-ran my tests with cqlsh,
> and it seems like the “one or more nodes were unavailable” may be due to
> cqlsh’s error handling.
>
>  If I wait a bit, and re-run my schema operations, they work fine with
> only one node up. I can see in the tracing that it’s only talking to node1
> (127.0.0.1) to make the schema modifications.
>
>  Is this a known issue in cqlsh? If it helps I can send the full
> command-line session log.
>

That Unavailable error may actually be from the tracing-related queries
failing (that's what I suspect, at least).  Starting cqlsh with --debug
might show you a stacktrace in that case, but I'm not 100% sure.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Schema changes: where in Java code are they sent?

Posted by Richard Dawe <ri...@messagesystems.com>.
Hi Tyler,

Thank you for your quick reply; follow-up inline below.

On 14/01/2015 19:36, "Tyler Hobbs" <ty...@datastax.com> wrote:

> On Wed, Jan 14, 2015 at 5:13 PM, Richard Dawe <ri...@messagesystems.com> wrote:
>
>> I’ve been trying to find the Java code where the schema migration is sent to the other nodes in the cluster, to understand what the requirements are for successfully applying the update. E.g.: is QUORUM consistency level applied?
>
> A quorum isn't required. Schema changes are simply applied against the local node (whichever node the client sends the query to) and then are pushed out to the other nodes. Nodes will also pull the latest schema from other nodes as needed (for example, if a node was down during a schema change).

I thought it might be quorum consistency level, because of the behaviour I was seeing with cqlsh. I was testing with ccm with C* 2.0.8, 3 nodes, vnodes enabled (“ccm create test -v 2.0.8 -n 3 --vnodes -s”). With all three nodes up, my schema operations were working fine. When I took down two nodes using “ccm node2 stop”, “ccm node3 stop”, I found that schema operations through “ccm node1 cqlsh” were failing like this:

  cqlsh> ALTER TABLE test.test3 ADD fred text;
  Unable to complete request: one or more nodes were unavailable.

That’s the full output; I had enabled tracing, but only that error came back.

After reading your reply, I went back and re-ran my tests with cqlsh, and it seems like the “one or more nodes were unavailable” may be due to cqlsh’s error handling.

If I wait a bit, and re-run my schema operations, they work fine with only one node up. I can see in the tracing that it’s only talking to node1 (127.0.0.1) to make the schema modifications.

Is this a known issue in cqlsh? If it helps I can send the full command-line session log.

>> I spent an hour looking through the Java code last night, with no luck. I thought this code would be in StorageProxy.java, but I have not found it there, or in any of the other classes I looked at.
>
> MigrationManager is probably the most central class for this stuff.

Thank you. That code makes a lot more sense now. :)

Best regards, Rich


Re: Schema changes: where in Java code are they sent?

Posted by Tyler Hobbs <ty...@datastax.com>.
On Wed, Jan 14, 2015 at 5:13 PM, Richard Dawe <ri...@messagesystems.com>
wrote:

>
>  I’ve been trying to find the Java code where the schema migration is
> sent to the other nodes in the cluster, to understand what the requirements
> are for successfully applying the update. E.g.: is QUORUM consistency level
> applied?
>

A quorum isn't required.  Schema changes are simply applied against the
local node (whichever node the client sends the query to) and then are
pushed out to the other nodes.  Nodes will also pull the latest schema from
other nodes as needed (for example, if a node was down during a schema
change).
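
The push-then-pull behaviour described above can be illustrated with a small toy model. This is a sketch only and does not reflect how MigrationManager is actually implemented: the `Node` class, `announce`, and `pull_if_stale` are hypothetical names, the schema is reduced to a dict of table names to columns, and the version is just a digest of that dict.

```python
import hashlib


class Node:
    """Toy stand-in for a cluster node holding a copy of the schema."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.tables = {}  # table name -> tuple of column names

    def schema_version(self):
        # Digest of the schema contents; equal schemas give equal versions.
        payload = repr(sorted(self.tables.items())).encode()
        return hashlib.md5(payload).hexdigest()

    def apply(self, table, columns):
        self.tables[table] = tuple(columns)


def announce(coordinator, cluster, table, columns):
    """Apply the change on the local node, then push it to every live peer.

    No quorum is checked: the change succeeds even if every peer is down.
    """
    coordinator.apply(table, columns)
    for peer in cluster:
        if peer is not coordinator and peer.up:
            peer.apply(table, columns)


def pull_if_stale(node, cluster):
    """Copy the schema from the first live peer whose version differs.

    Simplification: the peer is assumed to be the one holding the newer schema.
    """
    for peer in cluster:
        if peer is not node and peer.up and peer.schema_version() != node.schema_version():
            node.tables = dict(peer.tables)
            return True
    return False


cluster = [Node("node1"), Node("node2"), Node("node3")]
node1, node2, node3 = cluster
node2.up = node3.up = False

announce(node1, cluster, "test3", ["id", "fred"])  # only node1 is up
node2.up = True
pull_if_stale(node2, cluster)  # node2 catches up after restarting
print(node1.schema_version() == node2.schema_version())
```

In this model the schema change goes through with a single live node, and a node that was down converges once it comes back and compares versions with a live peer. A real cluster would need to decide which side is newer instead of assuming it, which the sketch leaves out.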


>
>  I spent an hour looking through the Java code last night, with no luck.
> I thought this code would be in StorageProxy.java, but I have not found it
> there, or in any of the other classes I looked at.
>

MigrationManager is probably the most central class for this stuff.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>