Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/06/17 16:40:47 UTC

[jira] [Created] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)

Add startup option renew the NodeId (for counters)
--------------------------------------------------

                 Key: CASSANDRA-2788
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2788
             Project: Cassandra
          Issue Type: Improvement
    Affects Versions: 0.8.0
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
            Priority: Minor
             Fix For: 0.8.2
         Attachments: 0001-Option-to-renew-the-NodeId-on-startup.patch

If an sstable of a counter column family is corrupted, the only safe solution a user has right now is to:
# Remove the NodeId system table to force the node to generate a new NodeId (and thus stop incrementing its previous, corrupted subcount)
# Remove all the sstables for that column family on that node (this is important because otherwise the node will never get "repaired" for its previous subcount)

This is far from ideal, but I think this is the price we pay for avoiding the read-before-write. In any case, the first step (removing the NodeId system table) also removes the list of old NodeIds this node has had, which could prevent us from merging the other potential previous NodeIds. This is OK but sub-optimal. This ticket proposes adding a new startup flag that makes the node renew its NodeId, thus replacing this first step.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052174#comment-13052174 ] 

Hudson commented on CASSANDRA-2788:
-----------------------------------

Integrated in Cassandra-0.8 #178 (See [https://builds.apache.org/job/Cassandra-0.8/178/])
    

> Add startup option renew the NodeId (for counters)
> --------------------------------------------------
>
>                 Key: CASSANDRA-2788
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2788
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: counters
>             Fix For: 0.8.2
>
>         Attachments: 0001-Option-to-renew-the-NodeId-on-startup.patch
>
>
> If an sstable of a counter column family is corrupted, the only safe solution a user has right now is to:
> # Remove the NodeId system table to force the node to generate a new NodeId (and thus stop incrementing its previous, corrupted subcount)
> # Remove all the sstables for that column family on that node (this is important because otherwise the node will never get "repaired" for its previous subcount)
> This is far from ideal, but I think this is the price we pay for avoiding the read-before-write. In any case, the first step (removing the NodeId system table) also removes the list of old NodeIds this node has had, which could prevent us from merging the other potential previous NodeIds. This is OK but sub-optimal. This ticket proposes adding a new startup flag that makes the node renew its NodeId, thus replacing this first step.


        

[jira] [Updated] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2788:
----------------------------------------

    Fix Version/s:     (was: 0.8.2)
                   0.8.1


        

[jira] [Updated] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2788:
----------------------------------------

    Attachment: 0001-Option-to-renew-the-NodeId-on-startup.patch


        

[jira] [Commented] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051159#comment-13051159 ] 

Jonathan Ellis commented on CASSANDRA-2788:
-------------------------------------------

Pasting Sylvain's explanation from IRC:

{quote}
Let me take a small example: suppose two nodes, A and B. Initially their node_ids will be A1 and B1 respectively. Each counter will thus have two components, A1 and B1.

Now suppose you renew the node_id of A -> A2 because of a corruption. Soon enough, the counters will have 3 components: A1, A2 and B1. Renew it yet another time and the counter context will be A1, A2, A3 and B1. It grows, which is not cool.
But because we know that nobody will ever increment A1 and A2 anymore (A3 is the active node id for A), we can merge them (we have to wait for gc_grace and so on for that to be correct, but we do it).

So basically we try to keep the context as small as it can be. If you nuke NodeIdInfo, right now the code won't be able to do that anymore and you will be stuck with a bigger than necessary context for all the counters.

So just renewing is more efficient in that sense. But nuking the system table is still 'correct' as far as returning the correct count is concerned.
{quote}
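
The example in the quote above can be sketched as a toy model. This is only an illustration under stated assumptions, not Cassandra's implementation: the `Node` class, `increment`, `merge_retired` and `total` are hypothetical names, a counter context is modelled as a plain dict from node id to subcount, and gc_grace, replication and sstables are ignored entirely. It shows the context growing on each renewal, and the merge of retired ids being possible only while the node still remembers them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a node with a current id and a list of retired ids."""
    name: str
    generation: int = 1
    retired_ids: list = field(default_factory=list)

    @property
    def node_id(self) -> str:
        return f"{self.name}{self.generation}"

    def renew(self) -> None:
        # Renewing keeps a record of the old id so it can be merged later.
        self.retired_ids.append(self.node_id)
        self.generation += 1

    def forget_history(self) -> None:
        # Stand-in for removing the NodeId system table: the node gets a
        # fresh id, but the list of its previous ids is lost.
        self.retired_ids.clear()
        self.generation += 1


def increment(context: dict, node: Node, delta: int = 1) -> None:
    # A node only ever increments the component for its current id.
    context[node.node_id] = context.get(node.node_id, 0) + delta


def merge_retired(context: dict, node: Node) -> None:
    # Fold all known retired components into the most recently retired one.
    # Nothing increments them anymore, so the total is unchanged.
    if len(node.retired_ids) < 2:
        return
    target = node.retired_ids[-1]
    for old_id in node.retired_ids[:-1]:
        if old_id in context:
            context[target] = context.get(target, 0) + context.pop(old_id)


def total(context: dict) -> int:
    return sum(context.values())


def run(renew_properly: bool) -> dict:
    a, b = Node("A"), Node("B")
    context = {}
    increment(context, a, 5)
    increment(context, b, 2)
    for delta in (3, 1):
        if renew_properly:
            a.renew()
        else:
            a.forget_history()
        increment(context, a, delta)
    merge_retired(context, a)
    return context


renewed = run(renew_properly=True)   # A1 folded into A2: three components
nuked = run(renew_properly=False)    # A1, A2, A3, B1 all remain
```

In both runs `total(...)` is the same, which matches the point that nuking the table is still correct; only the size of the context differs.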


        

[jira] [Issue Comment Edited] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051159#comment-13051159 ] 

Jonathan Ellis edited comment on CASSANDRA-2788 at 6/17/11 4:11 PM:
--------------------------------------------------------------------

Pasting Sylvain's explanation from IRC:

{quote}
Let me take a small example: suppose two nodes, A and B. Initially their node_ids will be A1 and B1 respectively. Each counter will thus have two components, A1 and B1.

Now suppose you renew the node_id of A -> A2 because of a corruption. Soon enough, the counters will have 3 components: A1, A2 and B1. Renew it yet another time and the counter context will be A1, A2, A3 and B1. It grows, which is not cool.
But because we know that nobody will ever increment A1 and A2 anymore (A3 is the active node id for A), we can merge them (we have to wait for gc_grace and so on for that to be correct, but we do it).

So basically we try to keep the context as small as it can be. If you nuke NodeIdInfo, right now the code won't be able to do that anymore and you will be stuck with a bigger than necessary context for all the counters.

So just renewing is more efficient in that sense. But nuking the system table is still 'correct' as far as returning the correct count is concerned.
{quote}


        

[jira] [Commented] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051158#comment-13051158 ] 

Jonathan Ellis commented on CASSANDRA-2788:
-------------------------------------------

+1
