Posted to user@cassandra.apache.org by "Frisch, Michael" <Mi...@nuance.com> on 2012/03/01 14:53:48 UTC

Secondary indexes don't go away after metadata change

I have a few column families that I decided to get rid of the secondary indexes on.  I see that there aren't any new index SSTables being created, but all of the old ones remain (some from as far back as September).  Is it safe to just delete them when the node is offline?  Should I run cleanup or scrub?

Also, when adding a new node to the ring the new node will build indexes for the ones that supposedly don't exist any longer.  Is this supposed to happen?  Would this have happened if I had deleted the old SSTables from the previously existing nodes?

The nodes in question were upgraded either from v0.8.1 => v1.0.2 (scrubbed at that time) => v1.0.6, or from v1.0.2 => v1.0.6.  The secondary index was dropped when the nodes were on version 1.0.6.  The new node added was also on 1.0.6.

- Mike

Re: Secondary indexes don't go away after metadata change

Posted by aaron morton <aa...@thelastpickle.com>.
Migrations are a delta (e.g. update CF).

I had a quick look at the code; it does appear to be cancelling any in-progress index builds when an index is dropped. There may be other code that does it, though.

Perhaps indexes should not be built until the node has schema agreement with the others. I'm not sure of the implications of that.
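To make the schema-agreement idea concrete, here is a minimal Python sketch. It is a toy model, not Cassandra code: the PendingIndexBuilds class and the index names are hypothetical. Build requests raised while migrations are applied are only queued, and once the node agrees on schema, only indexes still present in the agreed schema get built.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PendingIndexBuilds:
    """Toy model (hypothetical): defer index builds until schema agreement."""

    queued: Set[str] = field(default_factory=set)

    def submit(self, index_name: str) -> None:
        # Called while migrations are applied; remember the request
        # instead of starting the build right away.
        self.queued.add(index_name)

    def on_schema_agreement(self, agreed_indexes: Set[str]) -> List[str]:
        # Build only the indexes that survive in the agreed schema;
        # requests for indexes dropped in the meantime are discarded.
        to_build = sorted(self.queued & agreed_indexes)
        self.queued.clear()
        return to_build


builds = PendingIndexBuilds()
builds.submit("Users.Users_email_idx")  # hypothetical index, dropped later
builds.submit("Users.Users_age_idx")    # hypothetical index, still defined
print(builds.on_schema_agreement({"Users.Users_age_idx"}))  # ['Users.Users_age_idx']
```

One consequence of this design is that index builds wait for as long as schema agreement does.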

Anyway, can you please file a bug here: https://issues.apache.org/jira/browse/CASSANDRA

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/03/2012, at 10:32 AM, Frisch, Michael wrote:

> Sure enough it does.  Looking back in the logs from when the node was first coming online, I can see it applying migrations and submitting index builds on indexes that are deleted in the newest version of the schema.  This may be a silly question, but shouldn’t it just apply the most recent version of the schema on a new node?  Is there a reason to apply the migrations?
>  
> - Mike
>  
> From: aaron morton [mailto:aaron@thelastpickle.com] 
> Sent: Tuesday, March 06, 2012 4:14 AM
> To: user@cassandra.apache.org
> Subject: Re: Secondary indexes don't go away after metadata change
>  
> When the new node comes online, the history of schema changes is streamed to it. I've not looked at the code, but it could be that schema migrations are creating indexes that are then deleted from the schema but not from the DB itself.
>
> Does that fit your scenario? When the new node comes online, does it log migrations being applied and then indexes being created?
>  
> Cheers
>  
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 6/03/2012, at 10:56 AM, Frisch, Michael wrote:
>
> Thank you very much for your response.  It is true that the older, previously existing nodes are not snapshotting the indexes that I had removed.  I’ll go ahead and just delete those SSTables from the data directory.  They may still be around because they were created back when we used 0.8.
>
> The more troubling issue, though, is with adding new nodes to the cluster.  It built indexes for column families that had all of their indexes dropped weeks or months in the past.  It will also snapshot the index SSTables that it created.  The index files are non-empty as well; some are hundreds of megabytes.
>
> All nodes have the same schema, and none list themselves as having the rows indexed.  I cannot drop the indexes via the CLI either, because it says that they don’t exist.  It’s quite perplexing.
>  
> - Mike
>  
>  
> From: aaron morton [mailto:aaron@thelastpickle.com] 
> Sent: Monday, March 05, 2012 3:58 AM
> To: user@cassandra.apache.org
> Subject: Re: Secondary indexes don't go away after metadata change
>  
> The secondary index CFs are marked as no longer required (marked as compacted). Under 1.x they would then be deleted reasonably quickly, and definitely deleted after a restart.
>
> Is there a zero-length .Compacted file there?
>
>> Also, when adding a new node to the ring the new node will build indexes for the ones that supposedly don’t exist any longer.  Is this supposed to happen?  Would this have happened if I had deleted the old SSTables from the previously existing nodes?
> Check that you have a consistent schema using describe cluster in the CLI, and check that the schema is what you think it is using show schema.
>
> Another trick is to do a snapshot. Only the files in use are included in the snapshot.
>  
> Hope that helps. 
>  
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 2/03/2012, at 2:53 AM, Frisch, Michael wrote:
>
> I have a few column families that I decided to get rid of the secondary indexes on.  I see that there aren’t any new index SSTables being created, but all of the old ones remain (some from as far back as September).  Is it safe to just delete them when the node is offline?  Should I run cleanup or scrub?
>
> Also, when adding a new node to the ring the new node will build indexes for the ones that supposedly don’t exist any longer.  Is this supposed to happen?  Would this have happened if I had deleted the old SSTables from the previously existing nodes?
>
> The nodes in question were upgraded either from v0.8.1 => v1.0.2 (scrubbed at that time) => v1.0.6, or from v1.0.2 => v1.0.6.  The secondary index was dropped when the nodes were on version 1.0.6.  The new node added was also on 1.0.6.
>  
> - Mike


RE: Secondary indexes don't go away after metadata change

Posted by "Frisch, Michael" <Mi...@nuance.com>.
Sure enough it does.  Looking back in the logs from when the node was first coming online, I can see it applying migrations and submitting index builds on indexes that are deleted in the newest version of the schema.  This may be a silly question, but shouldn't it just apply the most recent version of the schema on a new node?  Is there a reason to apply the migrations?

- Mike


Re: Secondary indexes don't go away after metadata change

Posted by aaron morton <aa...@thelastpickle.com>.
When the new node comes online, the history of schema changes is streamed to it. I've not looked at the code, but it could be that schema migrations are creating indexes that are then deleted from the schema but not from the DB itself.

Does that fit your scenario? When the new node comes online, does it log migrations being applied and then indexes being created?
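A toy Python model of the hypothesis above, assuming each migration is a simple add/drop delta. The tuple format, function names, and index name are all hypothetical and are not Cassandra's migration classes. Replaying the deltas one at a time submits a build for an index that a later delta drops, whereas collapsing the history to the final schema first builds nothing.

```python
from typing import List, Set, Tuple

# Hypothetical delta format: ("add_index" | "drop_index", index_name).
Migration = Tuple[str, str]


def replay_migrations(history: List[Migration]) -> Tuple[Set[str], List[str]]:
    """Apply each delta in order, submitting a build as soon as an index is added."""
    schema: Set[str] = set()
    builds: List[str] = []
    for action, index in history:
        if action == "add_index":
            schema.add(index)
            builds.append(index)  # build starts before later deltas are seen
        elif action == "drop_index":
            schema.discard(index)  # schema forgets it; the submitted build is not undone
    return schema, builds


def apply_final_schema(history: List[Migration]) -> Tuple[Set[str], List[str]]:
    """Collapse the deltas to the final schema first, then build what survives."""
    schema: Set[str] = set()
    for action, index in history:
        if action == "add_index":
            schema.add(index)
        elif action == "drop_index":
            schema.discard(index)
    return schema, sorted(schema)


history = [("add_index", "Events.Events_ts_idx"), ("drop_index", "Events.Events_ts_idx")]
print(replay_migrations(history))   # (set(), ['Events.Events_ts_idx'])
print(apply_final_schema(history))  # (set(), [])
```

Both paths end with the same (empty) schema; only the replay path leaves behind a build for the dropped index.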

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com


RE: Secondary indexes don't go away after metadata change

Posted by "Frisch, Michael" <Mi...@nuance.com>.
Thank you very much for your response.  It is true that the older, previously existing nodes are not snapshotting the indexes that I had removed.  I'll go ahead and just delete those SSTables from the data directory.  They may still be around because they were created back when we used 0.8.

The more troubling issue, though, is with adding new nodes to the cluster.  It built indexes for column families that had all of their indexes dropped weeks or months in the past.  It will also snapshot the index SSTables that it created.  The index files are non-empty as well; some are hundreds of megabytes.

All nodes have the same schema, and none list themselves as having the rows indexed.  I cannot drop the indexes via the CLI either, because it says that they don't exist.  It's quite perplexing.

- Mike


Re: Secondary indexes don't go away after metadata change

Posted by aaron morton <aa...@thelastpickle.com>.
The secondary index CFs are marked as no longer required (marked as compacted). Under 1.x they would then be deleted reasonably quickly, and definitely deleted after a restart.

Is there a zero-length .Compacted file there?
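If you want to scan a data directory for those markers, a minimal Python sketch follows. It assumes the 1.0-era file naming <cf>-<version>-<generation>-<component> (for example Users-hc-12-Data.db with a marker named Users-hc-12-Compacted) and that index CFs appear as <cf>.<index name>; both are assumptions to check against your own data directory. It only reports and deletes nothing.

```python
import os
import re
from typing import Dict, List

# Assumed 1.0-era naming: <cf>-<version>-<generation>-<component>,
# e.g. "Users-hc-12-Data.db" plus the marker "Users-hc-12-Compacted".
SSTABLE_NAME = re.compile(r"^(?P<cf>.+)-(?P<ver>[a-z]+)-(?P<gen>\d+)-(?P<comp>.+)$")


def compacted_generations(data_dir: str) -> Dict[str, List[int]]:
    """Map column family -> generations that carry a zero-length Compacted marker."""
    found: Dict[str, List[int]] = {}
    for name in os.listdir(data_dir):
        match = SSTABLE_NAME.match(name)
        if match is None or match.group("comp") != "Compacted":
            continue
        if os.path.getsize(os.path.join(data_dir, name)) == 0:
            found.setdefault(match.group("cf"), []).append(int(match.group("gen")))
    for generations in found.values():
        generations.sort()
    return found


# Usage (hypothetical path): print(compacted_generations("/path/to/data/MyKeyspace"))
```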

> Also, when adding a new node to the ring the new node will build indexes for the ones that supposedly don’t exist any longer.  Is this supposed to happen?  Would this have happened if I had deleted the old SSTables from the previously existing nodes?
Check that you have a consistent schema using describe cluster in the CLI, and check that the schema is what you think it is using show schema.
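A small Python sketch for reading the schema section of describe cluster output. The line format it parses (a schema version UUID, or UNREACHABLE, followed by a bracketed host list under "Schema versions:") is an assumption based on cassandra-cli output of that era, so adjust the pattern if yours differs. Here "agreement" means exactly one version and no unreachable nodes.

```python
import re
from typing import Dict, List

# Assumed line shape under "Schema versions:" in describe cluster output:
#     <schema-version-uuid>: [10.0.0.1, 10.0.0.2]
#     UNREACHABLE: [10.0.0.3]
VERSION_LINE = re.compile(
    r"^\s*(?P<version>[0-9a-fA-F-]+|UNREACHABLE):\s*\[(?P<hosts>[^\]]*)\]\s*$"
)


def parse_schema_versions(output: str) -> Dict[str, List[str]]:
    """Map each reported schema version (or UNREACHABLE) to its list of hosts."""
    versions: Dict[str, List[str]] = {}
    for line in output.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group("hosts").split(",") if h.strip()]
            versions[match.group("version")] = hosts
    return versions


def schema_in_agreement(versions: Dict[str, List[str]]) -> bool:
    """True only if no node is unreachable and exactly one schema version is reported."""
    return "UNREACHABLE" not in versions and len(versions) == 1
```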

Another trick is to do a snapshot. Only the files in use are included in the snapshot.
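The snapshot trick boils down to a set difference, sketched below in Python. It assumes you have already run nodetool snapshot and know which directory the snapshot was written to; the function names and directory variables are hypothetical. Files in the live keyspace directory that the snapshot did not pick up are candidates for removal, not proof that removal is safe.

```python
import os
from typing import Iterable, List


def list_files(path: str) -> List[str]:
    """Regular files directly inside path (skips subdirectories such as snapshots/)."""
    return sorted(n for n in os.listdir(path) if os.path.isfile(os.path.join(path, n)))


def files_not_in_snapshot(live_files: Iterable[str], snapshot_files: Iterable[str]) -> List[str]:
    """Live files that the snapshot did not include, i.e. files the node is not using."""
    in_use = set(snapshot_files)
    return sorted(name for name in live_files if name not in in_use)


# Hypothetical usage; keyspace_dir and snapshot_dir are placeholders:
# leftovers = files_not_in_snapshot(list_files(keyspace_dir), list_files(snapshot_dir))
```

Per Mike's report in this thread, the newly added node did include the unwanted index SSTables in its snapshots, so this check only helps on nodes that no longer track those files.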

Hope that helps. 
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
