Posted to user@cassandra.apache.org by Oleksandr Shulgin <ol...@zalando.de> on 2018/03/05 14:40:27 UTC

Seed nodes of DC2 creating own versions of system keyspaces

Hi,

We were deploying a second DC today with 3 seed nodes (30 nodes in total)
and we have noticed that all seed nodes reported the following:

INFO  10:20:50 Create new Keyspace: KeyspaceMetadata{name=system_traces,
params=KeyspaceParams{durable_writes=true,
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
replication_factor=2}}, ...

followed by similar lines for system_distributed and system_auth.  Is this
to be expected?

Cassandra version is 3.0.15.  DC2 was added to the NTS replication settings
for all of the non-local keyspaces in advance, even before starting any of
the new nodes.  The schema versions reported by `nodetool describecluster'
are consistent across DCs, that is: all nodes are on the same version.
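As a rough illustration of what "adding DC2 in advance" involves, the sketch below only generates the CQL `ALTER KEYSPACE` statements one might run in cqlsh before starting the new nodes. The DC names `DC1`/`DC2` and the replication factor of 3 are assumptions, not values from this thread; `mydata_ks` is the keyspace mentioned in this message.

```python
# Sketch only: render ALTER KEYSPACE statements that switch keyspaces to
# NetworkTopologyStrategy with replicas in every listed DC. The DC names and
# replication factors passed in are assumptions supplied by the caller.

NON_LOCAL_KEYSPACES = ("system_auth", "system_distributed", "system_traces", "mydata_ks")


def nts_replication(rf_per_dc: dict) -> str:
    """Render a CQL replication map, DCs sorted for stable output."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in sorted(rf_per_dc.items())]
    return "{" + ", ".join(parts) + "}"


def alter_statements(keyspaces, rf_per_dc: dict) -> list:
    replication = nts_replication(rf_per_dc)
    return [f"ALTER KEYSPACE {ks} WITH replication = {replication};" for ks in keyspaces]


if __name__ == "__main__":
    # Hypothetical layout: three replicas in each DC.
    for stmt in alter_statements(NON_LOCAL_KEYSPACES, {"DC1": 3, "DC2": 3}):
        print(stmt)
```

The first printed line would be `ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};`. Changing replication only updates the schema; existing data still has to be streamed to the new DC, which is what `nodetool rebuild` is for.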

All new nodes use auto_bootstrap=true (in order for
allocate_tokens_for_keyspace=mydata_ks to take effect); the seeds ignore
this setting and report that they do.  The non-seed nodes didn't try to
create the system keyspaces on their own.
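The seed behaviour described here is consistent with the bootstrap condition in Cassandra 3.0, assumed below to mirror `StorageService.shouldBootstrap()`: auto_bootstrap must be enabled, bootstrap must not have completed earlier, and the node must not appear in its own seed list. This is a simplified Python restatement under that assumption; `NodeConfig` is a hypothetical stand-in for cassandra.yaml settings, and `initial_token` and previously saved tokens are ignored.

```python
# Simplified model of the join decision, assuming it mirrors Cassandra 3.0's
# StorageService.shouldBootstrap(). NodeConfig is a hypothetical stand-in for
# cassandra.yaml settings; initial_token and saved tokens are ignored.
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class NodeConfig:
    broadcast_address: str
    seeds: FrozenSet[str]
    auto_bootstrap: bool = True
    bootstrap_complete: bool = False
    allocate_tokens_for_keyspace: Optional[str] = None


def should_bootstrap(cfg: NodeConfig) -> bool:
    # A node listed in its own seed list never bootstraps, whatever auto_bootstrap says.
    return (cfg.auto_bootstrap
            and not cfg.bootstrap_complete
            and cfg.broadcast_address not in cfg.seeds)


def token_source(cfg: NodeConfig) -> str:
    # In this model, keyspace-aware allocation only happens on the bootstrap path;
    # otherwise (and with no initial_token) the node picks random tokens.
    if should_bootstrap(cfg) and cfg.allocate_tokens_for_keyspace:
        return f"allocated for {cfg.allocate_tokens_for_keyspace}"
    return "random"


if __name__ == "__main__":
    seeds = frozenset({"10.2.0.1", "10.2.0.2", "10.2.0.3"})  # hypothetical DC2 seeds
    seed = NodeConfig("10.2.0.1", seeds, allocate_tokens_for_keyspace="mydata_ks")
    other = NodeConfig("10.2.0.4", seeds, allocate_tokens_for_keyspace="mydata_ks")
    print(should_bootstrap(seed), token_source(seed))    # False random
    print(should_bootstrap(other), token_source(other))  # True allocated for mydata_ks
```

Under this model a seed never reaches the token allocation step, so `allocate_tokens_for_keyspace` would have no effect on the three DC2 seeds.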

I would expect that even if we hadn't added DC2 in advance, the new nodes
would be able to learn about the existing system keyspaces and wouldn't try
to create their own.  Ultimately we will run `nodetool rebuild' on every node
in DC2, but I would like to understand why this schema disagreement arises
in the first place.
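One way to watch for a schema disagreement is to parse `nodetool describecluster` output and check that exactly one schema version is reported. This sketch assumes the 3.0-era layout, where each version appears under "Schema versions:" as `<uuid>: [addresses]` (with `UNREACHABLE` for nodes that did not answer); check the format against your own output before relying on it.

```python
# Sketch: check schema agreement from `nodetool describecluster` output.
# Assumes the 3.0-era layout with one "<uuid>: [addr, addr]" line per schema
# version (or "UNREACHABLE: [...]"); verify against your own output first.
import re

VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(output: str) -> dict:
    """Map each reported schema version to the node addresses on it."""
    versions = {}
    for line in output.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def in_agreement(output: str) -> bool:
    versions = schema_versions(output)
    return len(versions) == 1 and "UNREACHABLE" not in versions


if __name__ == "__main__":
    # Illustrative output; addresses and the version UUID are made up.
    sample = (
        "Cluster Information:\n"
        "\tSchema versions:\n"
        "\t\t86afa796-d883-3932-aa73-6b017cef0d19: [10.1.0.1, 10.2.0.1]\n"
    )
    print(in_agreement(sample))  # True
```

On a live node the input could come from `subprocess.run(["nodetool", "describecluster"], capture_output=True, text=True).stdout`.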

Thanks,
-- 
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
127-59-707

Re: Seed nodes of DC2 creating own versions of system keyspaces

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Tue, Mar 6, 2018 at 8:28 PM, Jeff Jirsa <jj...@gmail.com> wrote:

>
> Sorry, I wasn't as precise as I should have been:
>
> In 3.0 and newer, a bootstrapping node will wait until it has schema
> before it bootstraps. HOWEVER, we make the system_auth/system_distributed,
> etc keyspaces as a node starts up, before it requests the schema from the
> rest of the cluster.
>
> You will see some schema exchanges go through the cluster as new 3.0 nodes
> come online, but it's a no-op schema change.
>

Well, this I also see from the code, but it doesn't answer the question of
"why". :)

Is this again because of the very first seed node corner case?  Would it
hang indefinitely waiting for schema from other nodes if it tried?
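The question can be made concrete with a toy model, which is not Cassandra's startup code: a node that insists on pulling schema from peers before creating anything would never get an answer if it is the very first seed of a new cluster, so any such wait needs a timeout and a local fallback. `wait_for_peer_schema`, `startup_schema`, and the timeout values are hypothetical.

```python
# Hypothetical model of the question, not Cassandra's startup code:
# wait for schema from peers, but give up after a timeout and fall back to
# locally created defaults so the very first seed cannot hang forever.
import time
from typing import Callable, Optional


def wait_for_peer_schema(fetch: Callable[[], Optional[dict]],
                         timeout_s: float, poll_s: float = 0.01) -> Optional[dict]:
    """Poll peers via `fetch` until one returns a schema or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        schema = fetch()
        if schema is not None:
            return schema
        if time.monotonic() >= deadline:
            return None  # bounded: give up instead of waiting indefinitely
        time.sleep(poll_s)


def startup_schema(fetch: Callable[[], Optional[dict]], defaults: dict,
                   timeout_s: float) -> dict:
    remote = wait_for_peer_schema(fetch, timeout_s)
    # No peer answered (e.g. first node of a brand-new cluster): use defaults.
    return remote if remote is not None else defaults


if __name__ == "__main__":
    defaults = {"system_traces": "SimpleStrategy rf=2"}
    lonely_seed = startup_schema(lambda: None, defaults, timeout_s=0.05)
    print(lonely_seed == defaults)  # True, after roughly 50 ms
    joined = startup_schema(lambda: {"system_traces": "NTS"}, defaults, timeout_s=0.05)
    print(joined == {"system_traces": "NTS"})  # True, immediately
```

An unbounded loop corresponds to the "hang indefinitely" case; writing the built-in keyspaces up front with timestamp 0, as described elsewhere in this thread, sidesteps the wait by making the local copy harmless.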

--
Alex

Re: Seed nodes of DC2 creating own versions of system keyspaces

Posted by Jeff Jirsa <jj...@gmail.com>.
On Tue, Mar 6, 2018 at 9:50 AM, Oleksandr Shulgin <
oleksandr.shulgin@zalando.de> wrote:

> On 6 Mar 2018 16:55, "Jeff Jirsa" <jj...@gmail.com> wrote:
>
> On Mar 6, 2018, at 12:32 AM, Oleksandr Shulgin <
> oleksandr.shulgin@zalando.de> wrote:
>
> On 5 Mar 2018 16:13, "Jeff Jirsa" <jj...@gmail.com> wrote:
>
> On Mar 5, 2018, at 6:40 AM, Oleksandr Shulgin <
> oleksandr.shulgin@zalando.de> wrote:
>
> We were deploying a second DC today with 3 seed nodes (30 nodes in total)
> and we have noticed that all seed nodes reported the following:
>
> INFO  10:20:50 Create new Keyspace: KeyspaceMetadata{name=system_traces,
> params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{
> class=org.apache.cassandra.locator.SimpleStrategy,
> replication_factor=2}}, ...
>
> followed by similar lines for system_distributed and system_auth.  Is this
> to be expected?
>
> They’re written with timestamp=0 to ensure they’re created at least once,
> but if you’ve ever issued an ALTER to the table or keyspace, your modified
> version will win through normal schema reconciliation process.
>
>
> OK.  Any specific reason why non-bootstrapping nodes don't wait for schema
> propagation before joining the ring?
>
>
>
> They do in 3.0 and newer; the built-in keyspaces still get auto-created
> before that happens.
>
>
> We are seeing this on 3.0.15, but if it's no longer the case with newer
> versions, then fine.
>
>
>
Sorry, I wasn't as precise as I should have been:

In 3.0 and newer, a bootstrapping node will wait until it has schema before
it bootstraps. HOWEVER, we make the system_auth/system_distributed, etc
keyspaces as a node starts up, before it requests the schema from the rest
of the cluster.

You will see some schema exchanges go through the cluster as new 3.0 nodes
come online, but it's a no-op schema change.
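A toy model of the "no-op schema change": a new node first writes the built-in keyspaces with timestamp 0, then merges the cluster's schema, and the cluster's newer definitions win in both directions. Assumptions: schema is a plain dict, merging is last-write-wins per keyspace, and the "schema version" is an MD5-derived UUID over the keyspace parameters only (Cassandra's real digest covers the `system_schema` tables). The `system_traces` defaults come from the log line in this thread; the other two replication factors are assumed 3.0 defaults, and the DC layout and ALTER timestamp are hypothetical.

```python
# Toy model of the "no-op schema change" (not Cassandra's code). Assumptions:
# schema = {keyspace: (write_timestamp, params)}, last-write-wins per keyspace,
# and the schema version is an MD5-derived UUID over the params only.
import hashlib
import json
import uuid


def schema_version(schema: dict) -> uuid.UUID:
    params_only = {ks: params for ks, (_, params) in schema.items()}
    payload = json.dumps(params_only, sort_keys=True).encode()
    return uuid.UUID(bytes=hashlib.md5(payload).digest())


def merge(local: dict, remote: dict) -> dict:
    """Keep, per keyspace, whichever definition has the newer timestamp."""
    merged = dict(local)
    for ks, (ts, params) in remote.items():
        if ks not in merged or ts > merged[ks][0]:
            merged[ks] = (ts, params)
    return merged


# Written locally at startup with timestamp 0, before peers are asked for schema.
# system_traces matches the log line; the other RFs are assumed 3.0 defaults.
DEFAULTS = {
    "system_auth": (0, {"class": "SimpleStrategy", "replication_factor": 1}),
    "system_distributed": (0, {"class": "SimpleStrategy", "replication_factor": 3}),
    "system_traces": (0, {"class": "SimpleStrategy", "replication_factor": 2}),
}

# Hypothetical cluster state after the operator's ALTERs to NTS.
ALTER_TS = 1_520_000_000_000_000  # microseconds, illustrative
NTS = {"class": "NetworkTopologyStrategy", "DC1": 3, "DC2": 3}
CLUSTER = {ks: (ALTER_TS, NTS) for ks in DEFAULTS}

if __name__ == "__main__":
    print(schema_version(DEFAULTS) != schema_version(CLUSTER))  # True: brief disagreement
    new_node = merge(DEFAULTS, CLUSTER)  # new node pulls the cluster schema
    print(schema_version(new_node) == schema_version(CLUSTER))  # True: node converges
    print(merge(CLUSTER, DEFAULTS) == CLUSTER)  # True: peers see no change
```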

Re: Seed nodes of DC2 creating own versions of system keyspaces

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On 6 Mar 2018 16:55, "Jeff Jirsa" <jj...@gmail.com> wrote:

On Mar 6, 2018, at 12:32 AM, Oleksandr Shulgin <ol...@zalando.de>
wrote:

On 5 Mar 2018 16:13, "Jeff Jirsa" <jj...@gmail.com> wrote:

On Mar 5, 2018, at 6:40 AM, Oleksandr Shulgin <ol...@zalando.de>
wrote:

We were deploying a second DC today with 3 seed nodes (30 nodes in total)
and we have noticed that all seed nodes reported the following:

INFO  10:20:50 Create new Keyspace: KeyspaceMetadata{name=system_traces,
params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{
class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=2}},
...

followed by similar lines for system_distributed and system_auth.  Is this
to be expected?

They’re written with timestamp=0 to ensure they’re created at least once,
but if you’ve ever issued an ALTER to the table or keyspace, your modified
version will win through normal schema reconciliation process.


OK.  Any specific reason why non-bootstrapping nodes don't wait for schema
propagation before joining the ring?



They do in 3.0 and newer; the built-in keyspaces still get auto-created
before that happens.


We are seeing this on 3.0.15, but if it's no longer the case with newer
versions, then fine.

Thanks,
--
Alex

Re: Seed nodes of DC2 creating own versions of system keyspaces

Posted by Jeff Jirsa <jj...@gmail.com>.

-- 
Jeff Jirsa


> On Mar 6, 2018, at 12:32 AM, Oleksandr Shulgin <ol...@zalando.de> wrote:
> 
> On 5 Mar 2018 16:13, "Jeff Jirsa" <jj...@gmail.com> wrote:
>> On Mar 5, 2018, at 6:40 AM, Oleksandr Shulgin <ol...@zalando.de> wrote:
>> We were deploying a second DC today with 3 seed nodes (30 nodes in total) and we have noticed that all seed nodes reported the following:
>> 
>> INFO  10:20:50 Create new Keyspace: KeyspaceMetadata{name=system_traces, params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=2}}, ...
>> 
>> followed by similar lines for system_distributed and system_auth.  Is this to be expected?
> They’re written with timestamp=0 to ensure they’re created at least once, but if you’ve ever issued an ALTER to the table or keyspace, your modified version will win through normal schema reconciliation process.
> 
> OK.  Any specific reason why non-bootstrapping nodes don't wait for schema propagation before joining the ring?
> 


They do in 3.0 and newer; the built-in keyspaces still get auto-created before that happens.


Re: Seed nodes of DC2 creating own versions of system keyspaces

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On 5 Mar 2018 16:13, "Jeff Jirsa" <jj...@gmail.com> wrote:

On Mar 5, 2018, at 6:40 AM, Oleksandr Shulgin <ol...@zalando.de>
wrote:

We were deploying a second DC today with 3 seed nodes (30 nodes in total)
and we have noticed that all seed nodes reported the following:

INFO  10:20:50 Create new Keyspace: KeyspaceMetadata{name=system_traces,
params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{
class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=2}},
...

followed by similar lines for system_distributed and system_auth.  Is this
to be expected?

They’re written with timestamp=0 to ensure they’re created at least once,
but if you’ve ever issued an ALTER to the table or keyspace, your modified
version will win through normal schema reconciliation process.


OK.  Any specific reason why non-bootstrapping nodes don't wait for schema
propagation before joining the ring?

--
Alex

Re: Seed nodes of DC2 creating own versions of system keyspaces

Posted by Jeff Jirsa <jj...@gmail.com>.


> On Mar 5, 2018, at 6:40 AM, Oleksandr Shulgin <ol...@zalando.de> wrote:
> 
> Hi,
> 
> We were deploying a second DC today with 3 seed nodes (30 nodes in total) and we have noticed that all seed nodes reported the following:
> 
> INFO  10:20:50 Create new Keyspace: KeyspaceMetadata{name=system_traces, params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=2}}, ...
> 
> followed by similar lines for system_distributed and system_auth.  Is this to be expected?


They’re written with timestamp=0 to ensure they’re created at least once, but if you’ve ever issued an ALTER to the table or keyspace, your modified version will win through normal schema reconciliation process.
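The timestamp=0 trick relies on last-write-wins reconciliation. The sketch below assumes the rule used for regular live cells (the higher timestamp wins; on a tie, the larger value wins so every replica converges), ignoring tombstones and TTLs; values are plain strings standing in for serialized schema columns. It shows the three DC2 seeds' identical timestamp-0 definitions collapsing to one, and a real ALTER beating them regardless of merge order.

```python
# Sketch: last-write-wins reconciliation, assuming Cassandra's rule for regular
# live cells (higher timestamp wins, ties broken by the larger value).
# Tombstones and TTLs are ignored; values are stand-in strings (hypothetical).
import time
from functools import reduce
from typing import NamedTuple


class Cell(NamedTuple):
    timestamp: int  # microseconds since the epoch; 0 for built-in defaults
    value: str


def reconcile(a: Cell, b: Cell) -> Cell:
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b  # deterministic tie-break


def now_micros() -> int:
    return time.time_ns() // 1000


DEFAULT_TRACES = Cell(0, "SimpleStrategy replication_factor=2")

if __name__ == "__main__":
    # Each DC2 seed creates the same default at timestamp 0.
    from_seeds = [DEFAULT_TRACES] * 3
    print(reduce(reconcile, from_seeds) == DEFAULT_TRACES)  # True
    # An operator's ALTER carries a real timestamp (hypothetical DC layout).
    altered = Cell(now_micros(), "NetworkTopologyStrategy DC1=3 DC2=3")
    print(reconcile(DEFAULT_TRACES, altered) == altered)  # True
    print(reconcile(altered, DEFAULT_TRACES) == altered)  # True
```

Because any client or server write timestamp is far above 0, the defaults can only take effect where no definition exists yet, which is the "created at least once" guarantee.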


> 
> Cassandra version is 3.0.15.  DC2 was added to the NTS replication settings for all of the non-local keyspaces in advance, even before starting any of the new nodes.  The schema versions reported by `nodetool describecluster' are consistent across DCs, that is: all nodes are on the same version.
> 
> All new nodes use auto_bootstrap=true (in order for allocate_tokens_for_keyspace=mydata_ks to take effect); the seeds ignore this setting and report that they do.  The non-seed nodes didn't try to create the system keyspaces on their own.
> 
> I would expect that even if we hadn't added DC2 in advance, the new nodes would be able to learn about the existing system keyspaces and wouldn't try to create their own.  Ultimately we will run `nodetool rebuild' on every node in DC2, but I would like to understand why this schema disagreement arises in the first place.
> 
> Thanks,
> -- 
> Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176 127-59-707
>