Posted to user@cassandra.apache.org by Adam Cramer <ad...@bn.co> on 2014/05/13 16:18:47 UTC

Schema errors when bootstrapping / restarting node

Hi All,

I'm having some major issues bootstrapping a new node to my cluster.  We
are running 1.2.16, with vnodes enabled.

When a new node starts up (with auto_bootstrap), it selects a host ID and
finds the ring successfully:

INFO 18:42:29,559 JOINING: waiting for ring information

It successfully selects a set of tokens.  Then the weird stuff begins.  I
get this error once, while the node is reading the system keyspace:

ERROR 18:42:32,921 Exception in thread Thread[InternalResponseStage:1,5,main]
java.lang.NullPointerException
        at org.apache.cassandra.utils.ByteBufferUtil.toLong(ByteBufferUtil.java:421)
        at org.apache.cassandra.cql.jdbc.JdbcLong.compose(JdbcLong.java:94)
        at org.apache.cassandra.db.marshal.LongType.compose(LongType.java:34)
        at org.apache.cassandra.cql3.UntypedResultSet$Row.getLong(UntypedResultSet.java:138)
        at org.apache.cassandra.db.SystemTable.migrateKeyAlias(SystemTable.java:199)
        at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:346)
        at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:66)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:47)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


But it doesn't stop the bootstrap process.  The node successfully
handshakes versions, and pauses before bootstrapping:


 INFO 18:42:59,564 JOINING: schema complete, ready to bootstrap
 INFO 18:42:59,565 JOINING: waiting for pending range calculation
 INFO 18:42:59,565 JOINING: calculation complete, ready to bootstrap
 INFO 18:42:59,565 JOINING: getting bootstrap token
 INFO 18:42:59,705 JOINING: sleeping 30000 ms for pending range setup


After 30 seconds, I get an endless flood of
org.apache.cassandra.db.UnknownColumnFamilyException errors, and all other
nodes in the cluster log the following endlessly:

INFO [HANDSHAKE-/x.x.x.x] 2014-05-09 18:44:36,289 OutboundTcpConnection.java (line 418) Handshaking version with /x.x.x.x


I suspect there may be something wrong with my schemas.  Sometimes while
restarting an existing node, the node will fail to restart, with the
following error, again while reading the system keyspace:

ERROR [InternalResponseStage:5] 2014-05-05 23:56:03,786 CassandraDaemon.java (line 191) Exception in thread Thread[InternalResponseStage:5,5,main]
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'column1' as hex bytes
        at org.apache.cassandra.db.marshal.BytesType.fromString(BytesType.java:69)
        at org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:231)
        at org.apache.cassandra.config.CFMetaData.addColumnDefinitionSchema(CFMetaData.java:1524)
        at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1456)
        at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:306)
        at org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:444)
        at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:356)
        at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:66)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:47)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NumberFormatException: An hex string representing bytes must have an even length
        at org.apache.cassandra.utils.Hex.hexToBytes(Hex.java:52)
        at org.apache.cassandra.db.marshal.BytesType.fromString(BytesType.java:65)
        ... 12 more

I am able to fix this error by clearing out the schema_columns system table
on disk.  After that, a node can boot successfully.

Does anyone have a clue what's going on here?

Thanks!

Re: Schema errors when bootstrapping / restarting node

Posted by Aaron Morton <aa...@thelastpickle.com>.

> I am able to fix this error by clearing out the schema_columns system table on disk.  After that, a node can boot successfully.
> 
> Does anyone have a clue what's going on here?

Something has become corrupted in the system tables, as you say.

A less aggressive way to reset the local schema is to run nodetool resetlocalschema on the nodes that you suspect of having problems.
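
If there are several suspect nodes, something like the following could script it. This is only a rough sketch: the host addresses are hypothetical placeholders, the helper names are made up for illustration, and it assumes nodetool is on the PATH of the machine running the script. It defaults to a dry run that only prints the commands.

```python
import subprocess


def reset_schema_command(host):
    """Build the nodetool invocation for one node (host is a placeholder)."""
    return ["nodetool", "-h", host, "resetlocalschema"]


def reset_local_schema(hosts, dry_run=True):
    """Return the commands for every host; run them only if dry_run is False."""
    commands = [reset_schema_command(host) for host in hosts]
    if not dry_run:
        for command in commands:
            # check=True stops at the first node where nodetool fails
            subprocess.run(command, check=True)
    return commands


# Hypothetical addresses; substitute the nodes you actually suspect.
suspect_hosts = ["10.0.0.11", "10.0.0.12"]
planned = reset_local_schema(suspect_hosts)
for command in planned:
    print(" ".join(command))
```

Running the nodes one at a time, and checking the log after each, seems safer than resetting them all at once.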

> ERROR [InternalResponseStage:5] 2014-05-05 23:56:03,786 CassandraDaemon.java (line 191) Exception in thread Thread[InternalResponseStage:5,5,main]
> org.apache.cassandra.db.marshal.MarshalException: cannot parse 'column1' as hex bytes
>         at org.apache.cassandra.db.marshal.BytesType.fromString(BytesType.java:69)
>         at org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:231)
>         at org.apache.cassandra.config.CFMetaData.addColumnDefinitionSchema(CFMetaData.java:1524)
>         at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1456)

This looks like a secondary index has been incorrectly defined via Thrift. I would guess the comparator for the CF is BytesType, and that you have defined an index on a column and specified the column name as “column1”, which is not a valid hex value.
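
To show why “column1” trips the parser, here is a small sketch of a strict hex parser. It is a simplified stand-in, not Cassandra's own code; the function name and the error text are modelled on the trace above. The string has seven characters, so the even-length check fails before any digit is examined. The last lines show what the ASCII name would look like as hex, purely for comparison; whether that is what the schema should contain is an assumption I have not verified.

```python
def hex_to_bytes(text):
    """Strict hex parser (illustrative only): two hex digits per byte."""
    if len(text) % 2 != 0:
        raise ValueError("a hex string representing bytes must have an even length")
    return bytes.fromhex(text)


name = "column1"

# Seven characters, so the length check rejects it.
try:
    hex_to_bytes(name)
    parse_error = None
except ValueError as exc:
    parse_error = str(exc)

# The same name written as the hex of its ASCII bytes parses cleanly.
hex_name = name.encode("ascii").hex()
round_trip = hex_to_bytes(hex_name).decode("ascii")

print(parse_error)
print(hex_name)
print(round_trip)
```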

You should be able to fix this by dropping the index or dropping the CF. 

Hope that helps. 
Aaron



-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
