Posted to commits@cassandra.apache.org by "Edward Capriolo (JIRA)" <ji...@apache.org> on 2010/11/15 18:15:14 UTC

[jira] Created: (CASSANDRA-1746) Cleanups should be less impacting

Cleanups should be less impacting
---------------------------------

                 Key: CASSANDRA-1746
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1746
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Edward Capriolo


When a new node is added, its neighbours require cleanup. Cleanup has a heavy performance impact and, for larger data sets, takes a long time. You do not get the full benefit of the new node until its neighbours have been cleaned up.
Suggestion:
Add a configuration option, compaction_auto_cleanup := {true,false}, that can be changed from JMX and is set to false by default.
During non-major compaction, if the compaction_auto_cleanup flag is set to true, we look at the natural endpoints for each key we are compacting. If the key does not belong on this machine, we can remove it.
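A minimal Java sketch of the proposed per-row check, for illustration only and not Cassandra's code. It assumes a hypothetical model in which the ring is a sorted map from long tokens to node names, and replicas are chosen with a simple rack-unaware placement similar to SimpleStrategy: the node owning the first token at or after the row's token, then the next distinct nodes clockwise. The names `AutoCleanupFilter`, `naturalEndpoints`, `setAutoCleanup` and `shouldKeep` are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of compaction_auto_cleanup; not Cassandra's implementation.
class AutoCleanupFilter {
    private final NavigableMap<Long, String> ring; // assumed model: token -> node name (non-empty)
    private final int replicationFactor;
    private final String localNode;
    // Proposed default is false; volatile so a runtime (e.g. JMX) toggle is seen by compaction threads.
    private volatile boolean autoCleanup = false;

    AutoCleanupFilter(NavigableMap<Long, String> ring, int replicationFactor, String localNode) {
        this.ring = new TreeMap<>(ring);
        this.replicationFactor = replicationFactor;
        this.localNode = localNode;
    }

    void setAutoCleanup(boolean enabled) {
        this.autoCleanup = enabled;
    }

    // Start at the first node token >= the row's token (wrapping), then walk
    // clockwise collecting distinct nodes until RF replicas are found.
    List<String> naturalEndpoints(long rowToken) {
        List<String> endpoints = new ArrayList<>();
        int distinctNodes = (int) ring.values().stream().distinct().count();
        int wanted = Math.min(replicationFactor, distinctNodes);
        Map.Entry<Long, String> entry = ring.ceilingEntry(rowToken);
        if (entry == null) {
            entry = ring.firstEntry(); // past the last token: wrap around the ring
        }
        while (endpoints.size() < wanted) {
            if (!endpoints.contains(entry.getValue())) {
                endpoints.add(entry.getValue());
            }
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();
            }
        }
        return endpoints;
    }

    // Consulted for each row during a minor compaction; false means the row may be dropped.
    boolean shouldKeep(long rowToken) {
        return !autoCleanup || naturalEndpoints(rowToken).contains(localNode);
    }
}
```

With a three-node ring {0: A, 100: B, 200: C} and RF = 1, a row with token 150 belongs to C, so node A keeps it while the flag is false and drops it once the flag is set to true; a row with token 250 wraps around to A and is always kept.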

This would save us from the heavy hammer of cleanup compaction. It would also mean less bookkeeping for administrators.
Most people would want to leave this at false, join the new node, and wait a few days; if the node has not failed by then, it likely will not. Then set the flag to true and cleanup will happen over time. Users can still force a cleanup if they wish.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Resolved] (CASSANDRA-1746) Cleanups should be less impacting

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1746.
---------------------------------------

    Resolution: Won't Fix

We don't have an efficient way to ask "does key exist in hints" (hints are keyed by target).

[jira] [Resolved] (CASSANDRA-1746) Cleanups should be less impacting

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1746.
---------------------------------------

    Resolution: Won't Fix

bq. a user cannot avoid doing this, because once they join a node and call cleanup they will have that one big table.

Not so: cleanup has been per-SSTable since (IIRC) 0.7.0.
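A rough Java sketch of what per-SSTable cleanup means, assuming (hypothetically) that an SSTable is just a list of row tokens and that ownership is a predicate: each input SSTable is rewritten on its own, so N inputs yield at most N smaller outputs rather than one merged table. `PerSSTableCleanup` is an illustrative name, not Cassandra's class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical sketch: cleanup applied to each SSTable separately, never merging them.
class PerSSTableCleanup {
    // Rewrites each input SSTable independently, keeping only locally owned rows.
    static List<List<Long>> cleanup(List<List<Long>> sstables, LongPredicate ownedLocally) {
        List<List<Long>> outputs = new ArrayList<>();
        for (List<Long> sstable : sstables) {
            List<Long> kept = new ArrayList<>();
            for (long token : sstable) {
                if (ownedLocally.test(token)) {
                    kept.add(token);
                }
            }
            // In this sketch, an SSTable whose rows all belong elsewhere produces no output.
            if (!kept.isEmpty()) {
                outputs.add(kept);
            }
        }
        return outputs;
    }
}
```

The point of the sketch is the loop structure: rows are filtered within each SSTable, so the output never grows into the single large table a major compaction would produce.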

[jira] Commented: (CASSANDRA-1746) Cleanups should be less impacting

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932129#action_12932129 ] 

Edward Capriolo commented on CASSANDRA-1746:
--------------------------------------------

Regarding http://wiki.apache.org/cassandra/HintedHandoff: it is unclear (to me) whether and how CL.ANY rows get removed. Does it make sense that, if a hint is being saved on a non-replica node, it should be deleted after the hint is delivered? Should that be a separate issue?

Can we work around CL.ANY with an addition to the logic:

During non-major compaction, if the compaction_auto_cleanup flag is set to true, we look at the natural endpoints for the key we are compacting. If the key does not belong on this machine AND THE KEY IS NOT PRESENT IN A HINT TABLE, we can remove it.
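The amended rule can be sketched in Java as below. Ownership is abstracted as a predicate and the hint lookup as an in-memory set of row keys; both are assumptions. The set in particular stands for the per-key lookup that the resolution comment says Cassandra did not have an efficient way to answer, since hints are keyed by target. `HintAwareCleanupFilter` is a hypothetical name.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the amended rule; names and data structures are assumptions.
class HintAwareCleanupFilter {
    private final Predicate<String> ownedLocally; // stands in for the natural-endpoints check
    private final Set<String> hintedKeys;         // assumed per-key index of rows with pending hints
    private final boolean autoCleanup;            // the proposed compaction_auto_cleanup flag

    HintAwareCleanupFilter(Predicate<String> ownedLocally, Set<String> hintedKeys, boolean autoCleanup) {
        this.ownedLocally = ownedLocally;
        this.hintedKeys = hintedKeys;
        this.autoCleanup = autoCleanup;
    }

    // A row is dropped only when the flag is on, this node is not a natural
    // endpoint for it, AND no hint still refers to it.
    boolean shouldKeep(String rowKey) {
        if (!autoCleanup) {
            return true;
        }
        return ownedLocally.test(rowKey) || hintedKeys.contains(rowKey);
    }
}
```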


[jira] Updated: (CASSANDRA-1746) Cleanups should be less impacting

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated CASSANDRA-1746:
---------------------------------------

    Priority: Minor  (was: Major)


[jira] [Issue Comment Edited] (CASSANDRA-1746) Cleanups should be less impacting

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027335#comment-13027335 ] 

Edward Capriolo edited comment on CASSANDRA-1746 at 4/30/11 2:58 PM:
---------------------------------------------------------------------

Sorry to reopen; I have been thinking about this more. Hinted handoff is a best-effort system. Additionally, new hinted handoff options make it possible to disable hinted handoff entirely or, more relevant to this debate, to stop collecting hints after a while.

If we are willing to stop delivering handoffs, minor compactions that remove them are not much different.

Write operations at ANY are a problem, but few use cases write at ANY. Anyone writing at ANY can choose not to use this feature.

Also, common knowledge says "you do not need to run major compaction anymore", because it creates one large SSTable that takes longer to get rid of in the next round of tombstone purging. However, a user cannot avoid doing this, because once they join a node and call cleanup, they will have that one big table.



[jira] [Reopened] (CASSANDRA-1746) Cleanups should be less impacting

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo reopened CASSANDRA-1746:
----------------------------------------


Sorry to reopen; I have been thinking about this more. Hinted handoff is a best-effort system. Additionally, new hinted handoff options make it possible to disable hinted handoff entirely or, more relevant to this debate, to stop collecting hints after a while.

If we are willing to stop delivering handoffs, minor compactions that remove them are not much different.

Write operations at ANY are a problem, but few use cases write at ANY. Anyone writing at ANY can choose not to use this feature.

Also, common knowledge says "you do not need to run major compaction anymore", because it creates one large SSTable that takes longer to get rid of in the next round of tombstone purging. However, a user has no way around this: once they join a node and call cleanup, they will have the one big table they were trying to avoid.



[jira] Commented: (CASSANDRA-1746) Cleanups should be less impacting

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932121#action_12932121 ] 

Jonathan Ellis commented on CASSANDRA-1746:
-------------------------------------------

The reason cleanup and compaction are different things is that rows that do not belong to the current node can be generated by CL.ANY writes as well as "left behind" by token changes.
