Posted to commits@cassandra.apache.org by "Cathy Daw (JIRA)" <ji...@apache.org> on 2011/07/23 02:59:09 UTC

[jira] [Created] (CASSANDRA-2942) If you drop a CF when one node is down the files are orphaned on the downed node

If you drop a CF when one node is down the files are orphaned on the downed node
--------------------------------------------------------------------------------

                 Key: CASSANDRA-2942
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2942
             Project: Cassandra
          Issue Type: Bug
            Reporter: Cathy Daw
            Priority: Minor



* Bring up 3 node cluster
* From node1: Run Stress Tool
{code} stress --num-keys=10 --columns=10 --consistency-level=ALL --average-size-values --replication-factor=3 --nodes=node1,node2 {code}
* Shutdown node3
* From node1: drop the Standard1 CF in Keyspace1
* Shutdown node1 and node2
* Bring up node1 and node2. Check that the Standard1 files are gone.
{code}
ls -al /var/lib/cassandra/data/Keyspace1/
{code}
* Bring up node3. The log file shows the drop column family occurs
{code}
 INFO 00:51:25,742 Applying migration 9a76f880-b4c5-11e0-0000-8901a7c5c9ce Drop column family: Keyspace1.Standard1
{code}
* Restart node3 to clear out dropped tables from the filesystem
{code}
root@cathy3:~/cass-0.8/bin# ls -al /var/lib/cassandra/data/Keyspace1/
total 36
drwxr-xr-x 3 root root 4096 Jul 23 00:51 .
drwxr-xr-x 6 root root 4096 Jul 23 00:48 ..
-rw-r--r-- 1 root root    0 Jul 23 00:51 Standard1-g-1-Compacted
-rw-r--r-- 2 root root 5770 Jul 23 00:51 Standard1-g-1-Data.db
-rw-r--r-- 2 root root   32 Jul 23 00:51 Standard1-g-1-Filter.db
-rw-r--r-- 2 root root  120 Jul 23 00:51 Standard1-g-1-Index.db
-rw-r--r-- 2 root root 4276 Jul 23 00:51 Standard1-g-1-Statistics.db
drwxr-xr-x 3 root root 4096 Jul 23 00:51 snapshots
{code}
*Bug:  The files for Standard1 are orphaned on node3*



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2942) If you drop a CF when one node is down the files are orphaned on the downed node

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092955#comment-13092955 ] 

Hudson commented on CASSANDRA-2942:
-----------------------------------

Integrated in Cassandra #1054 (See [https://builds.apache.org/job/Cassandra/1054/])
    reduce window where dropped CF sstables may not be deleted
patch by jbellis; reviewed by slebresne for CASSANDRA-2942

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1162849
Files : 
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLog.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/DeletionService.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/util/FileUtils.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java


> If you drop a CF when one node is down the files are orphaned on the downed node
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2942
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2942
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Cathy Daw
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: 2942.txt


        

[jira] [Commented] (CASSANDRA-2942) If you drop a CF when one node is down the files are orphaned on the downed node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089239#comment-13089239 ] 

Jonathan Ellis commented on CASSANDRA-2942:
-------------------------------------------

bq. May be fixed by CASSANDRA-2521

2521 makes it substantially better, but there's still a window where you can miss deletes.

The "real" fix is to commitlog-ify schema changes, but that's outside our scope for the foreseeable future.

Adding "wait for outstanding SSTableDeletingTasks" to our jvm shutdown hook would be almost as good.
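
A minimal sketch of the shutdown-hook idea, assuming the deletions run on a plain java.util.concurrent executor. The class name DeletionDrainHook, the method newDrainHook, and the 30-second timeout are hypothetical and are not taken from the Cassandra source or from the attached patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DeletionDrainHook {
    /**
     * Builds a hook thread that stops the executor from accepting new work
     * and then blocks until already-submitted tasks finish or the timeout expires.
     */
    static Thread newDrainHook(ExecutorService deletions, long timeout, TimeUnit unit) {
        return new Thread(() -> {
            deletions.shutdown(); // queued tasks still run; new submissions are rejected
            try {
                deletions.awaitTermination(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }, "deletion-drain-hook");
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the executor that performs sstable file deletions.
        ExecutorService deletions = Executors.newSingleThreadExecutor();
        Runtime.getRuntime().addShutdownHook(newDrainHook(deletions, 30, TimeUnit.SECONDS));

        deletions.submit(() -> System.out.println("deleting obsolete sstable files"));

        // Exiting triggers the hook, which waits for the task above before the JVM stops.
        System.exit(0);
    }
}
```

A hook like this only narrows the window: a kill -9 or a power loss skips shutdown hooks entirely, which matches the "almost as good" caveat above.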

> If you drop a CF when one node is down the files are orphaned on the downed node
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2942
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2942
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Cathy Daw
>            Assignee: Sylvain Lebresne
>            Priority: Minor


        

[jira] [Assigned] (CASSANDRA-2942) If you drop a CF when one node is down the files are orphaned on the downed node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-2942:
-----------------------------------------

    Assignee: Sylvain Lebresne

You should be able to reproduce this even on a single node: just drop a CF, then restart.  On startup, only marked-for-delete files that belong to known CFs are cleaned out.

May be fixed by CASSANDRA-2521.  Otherwise we can add "go ahead and clear out marked-for-delete files, even if they don't belong to an active CF" logic.
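
A rough sketch of what such logic could look like, assuming (from the listing in the description) that a zero-length <prefix>-Compacted file marks every <prefix>-* component in the same directory as deletable. OrphanedSSTableSweeper and sweep are hypothetical names; this is not the code Cassandra uses:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class OrphanedSSTableSweeper {
    static final String MARKER_SUFFIX = "-Compacted";

    /** Deletes every file sharing a prefix with a marker file, whatever CF it belongs to. */
    static List<Path> sweep(Path keyspaceDir) throws IOException {
        // Snapshot the regular files first so we never delete while iterating the directory.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(keyspaceDir)) {
            for (Path entry : entries)
                if (Files.isRegularFile(entry))
                    files.add(entry);
        }

        // "Standard1-g-1-Compacted" yields the prefix "Standard1-g-1-".
        // The trailing hyphen keeps generation 1 from matching generation 10.
        List<String> prefixes = new ArrayList<>();
        List<Path> markers = new ArrayList<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            if (name.endsWith(MARKER_SUFFIX)) {
                markers.add(file);
                prefixes.add(name.substring(0, name.length() - MARKER_SUFFIX.length()) + "-");
            }
        }

        List<Path> deleted = new ArrayList<>();
        // Delete the components first, so an interrupted sweep leaves the marker behind.
        for (Path file : files) {
            String name = file.getFileName().toString();
            if (name.endsWith(MARKER_SUFFIX))
                continue;
            for (String prefix : prefixes) {
                if (name.startsWith(prefix)) {
                    Files.delete(file);
                    deleted.add(file);
                    break;
                }
            }
        }
        // Delete the markers last.
        for (Path marker : markers) {
            Files.delete(marker);
            deleted.add(marker);
        }
        return deleted;
    }
}
```

The key point is that the sweep is driven by the marker files on disk rather than by the list of CFs in the current schema, so files from a dropped CF are not skipped.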

> If you drop a CF when one node is down the files are orphaned on the downed node
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2942
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2942
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Cathy Daw
>            Assignee: Sylvain Lebresne
>            Priority: Minor


        

[jira] [Updated] (CASSANDRA-2942) Dropped columnfamilies can leave orphaned data files that do not get cleared on restart

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2942:
--------------------------------------

    Description: 
* Bring up 3 node cluster
* From node1: Run Stress Tool
{code} stress --num-keys=10 --columns=10 --consistency-level=ALL --average-size-values --replication-factor=3 --nodes=node1,node2 {code}
* Shutdown node3
* From node1: drop the Standard1 CF in Keyspace1
* Shutdown node1 and node2
* Bring up node1 and node2. Check that the Standard1 files are gone.
{code}
ls -al /var/lib/cassandra/data/Keyspace1/
{code}
* Bring up node3. The log file shows the drop column family occurs
{code}
 INFO 00:51:25,742 Applying migration 9a76f880-b4c5-11e0-0000-8901a7c5c9ce Drop column family: Keyspace1.Standard1
{code}
* Restart node3 to clear out dropped tables from the filesystem
{code}
root@cathy3:~/cass-0.8/bin# ls -al /var/lib/cassandra/data/Keyspace1/
total 36
drwxr-xr-x 3 root root 4096 Jul 23 00:51 .
drwxr-xr-x 6 root root 4096 Jul 23 00:48 ..
-rw-r--r-- 1 root root    0 Jul 23 00:51 Standard1-g-1-Compacted
-rw-r--r-- 2 root root 5770 Jul 23 00:51 Standard1-g-1-Data.db
-rw-r--r-- 2 root root   32 Jul 23 00:51 Standard1-g-1-Filter.db
-rw-r--r-- 2 root root  120 Jul 23 00:51 Standard1-g-1-Index.db
-rw-r--r-- 2 root root 4276 Jul 23 00:51 Standard1-g-1-Statistics.db
drwxr-xr-x 3 root root 4096 Jul 23 00:51 snapshots
{code}
*Bug:  The files for Standard1 are orphaned on node3*



  was:

* Bring up 3 node cluster
* From node1: Run Stress Tool
{code} stress --num-keys=10 --columns=10 --consistency-level=ALL --average-size-values --replication-factor=3 --nodes=node1,node2 {code}
* Shutdown node3
* From node1: drop the Standard1 CF in Keyspace1
* Shutdown node1 and node2
* Bring up node1 and node2. Check that the Standard1 files are gone.
{code}
ls -al /var/lib/cassandra/data/Keyspace1/
{code}
* Bring up node3. The log file shows the drop column family occurs
{code}
 INFO 00:51:25,742 Applying migration 9a76f880-b4c5-11e0-0000-8901a7c5c9ce Drop column family: Keyspace1.Standard1
{code}
* Restart node3 to clear out dropped tables from the filesystem
{code}
root@cathy3:~/cass-0.8/bin# ls -al /var/lib/cassandra/data/Keyspace1/
total 36
drwxr-xr-x 3 root root 4096 Jul 23 00:51 .
drwxr-xr-x 6 root root 4096 Jul 23 00:48 ..
-rw-r--r-- 1 root root    0 Jul 23 00:51 Standard1-g-1-Compacted
-rw-r--r-- 2 root root 5770 Jul 23 00:51 Standard1-g-1-Data.db
-rw-r--r-- 2 root root   32 Jul 23 00:51 Standard1-g-1-Filter.db
-rw-r--r-- 2 root root  120 Jul 23 00:51 Standard1-g-1-Index.db
-rw-r--r-- 2 root root 4276 Jul 23 00:51 Standard1-g-1-Statistics.db
drwxr-xr-x 3 root root 4096 Jul 23 00:51 snapshots
{code}
*Bug:  The files for Standard1 are orphaned on node3*



        Summary: Dropped columnfamilies can leave orphaned data files that do not get cleared on restart  (was: If you drop a CF when one node is down the files are orphaned on the downed node)
    
> Dropped columnfamilies can leave orphaned data files that do not get cleared on restart
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2942
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2942
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Cathy Daw
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: 2942.txt


        

[jira] [Updated] (CASSANDRA-2942) If you drop a CF when one node is down the files are orphaned on the downed node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2942:
--------------------------------------

    Attachment: 2942.txt

Patch to wait for StorageService.tasks on shutdown.  Also moves commitlog (CL) segment deletion there.

> If you drop a CF when one node is down the files are orphaned on the downed node
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2942
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2942
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Cathy Daw
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: 2942.txt


        

[jira] [Commented] (CASSANDRA-2942) If you drop a CF when one node is down the files are orphaned on the downed node

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092884#comment-13092884 ] 

Sylvain Lebresne commented on CASSANDRA-2942:
---------------------------------------------

nit: we could log an info message when awaitTermination returns false. It also looks like DeletionService could just go away with this.

But otherwise, +1. I agree it is good enough and not worth going for more complicated.
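
One way the nit could be addressed, sketched with java.util.logging to stay self-contained (Cassandra's own logging setup is not shown in this thread). ShutdownDrain and drain are hypothetical names, not identifiers from the patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

public class ShutdownDrain {
    private static final Logger logger = Logger.getLogger(ShutdownDrain.class.getName());

    /**
     * Shuts the executor down and waits for pending tasks.
     * Returns true if everything finished within the timeout, false otherwise.
     */
    static boolean drain(ExecutorService tasks, long timeout, TimeUnit unit) {
        tasks.shutdown();
        boolean finished = false;
        try {
            // awaitTermination returns false when the timeout elapses first.
            finished = tasks.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!finished)
            logger.info("Pending tasks did not complete before shutdown; some files may not have been deleted");
        return finished;
    }
}
```

Returning the boolean as well as logging it lets the caller decide whether a timed-out drain deserves any further handling.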

> If you drop a CF when one node is down the files are orphaned on the downed node
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2942
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2942
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Cathy Daw
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: 2942.txt
