Posted to user@cassandra.apache.org by Ned Wolpert <ne...@imemories.com> on 2010/03/29 19:27:45 UTC

Question about node failure...

Folks-

Can someone point out what happens during a node failure? Here is the
specific use case:

  - Cassandra cluster with 4 nodes, replication factor of 3
  - One node fails.
  - At this point, data that existed on the one failed node has copies on 2
live nodes.
  - The failed node never comes back

First question: At what point does Cassandra re-migrate that data that only
exists on 2 nodes to another node to retain the replication factor of 3?

Second question: Given the above case, if a brand new node is added to the
cluster, does anything happen to the data that now only exists on 2 nodes?

Thanks.

-- 
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe

Re: Question about node failure...

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Apr 5, 2010 at 5:20 PM, Rob Coli <rc...@digg.com> wrote:
> On 4/5/10 2:11 PM, Jonathan Ellis wrote:
>>
>> On Mon, Mar 29, 2010 at 6:42 PM, Tatu Saloranta <ts...@gmail.com> wrote:
>>>
>>> Perhaps it would be good to have a convenience workflow for replacing
>>> a broken host ("squashing lemons")? I would assume that the most common use
>>
>>  [ snip ]
>> Does anyone have numbers on how badly "nodetool repair" sucks vs
>> bootstrap + removetoken?  If it's within a reasonable factor of
>> performance, then I'd say that's the easiest solution.
>
> As I understand it, a node which is in the midst of a "repair" operation is
> actually in a meaningfully different state from a node which is
> bootstrapping. The "repair"ing node can serve blank (?) data in the case
> where it is asked for data it should have but doesn't yet, with a
> ConsistencyLevel of ONE. AFAIK, there is no way to make a bootstrapping node
> return invalid responses in this way.

True enough.  Created https://issues.apache.org/jira/browse/CASSANDRA-957

Re: Question about node failure...

Posted by Rob Coli <rc...@digg.com>.
On 4/5/10 2:11 PM, Jonathan Ellis wrote:
> On Mon, Mar 29, 2010 at 6:42 PM, Tatu Saloranta <ts...@gmail.com> wrote:
>> Perhaps it would be good to have a convenience workflow for replacing
>> a broken host ("squashing lemons")? I would assume that the most common use
>  [ snip ]
> Does anyone have numbers on how badly "nodetool repair" sucks vs
> bootstrap + removetoken?  If it's within a reasonable factor of
> performance, then I'd say that's the easiest solution.

As I understand it, a node which is in the midst of a "repair" operation 
is actually in a meaningfully different state from a node which is 
bootstrapping. The "repair"ing node can serve blank (?) data in the case 
where it is asked for data it should have but doesn't yet, with a 
ConsistencyLevel of ONE. AFAIK, there is no way to make a bootstrapping 
node return invalid responses in this way.

Any details from people who are more familiar with the particular code 
path in question would of course be appreciated. :)

=Rob


Re: Question about node failure...

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Mar 29, 2010 at 6:42 PM, Tatu Saloranta <ts...@gmail.com> wrote:
> Perhaps it would be good to have a convenience workflow for replacing
> a broken host ("squashing lemons")? I would assume that the most common use
> case is to effectively replace a host that can't be repaired (or perhaps
> it might sometimes be the best way to do it anyway), by a combination of
> removing the failed host and bringing in a new one. Handling this as a
> high-level logical operation could be more efficient than doing it
> step by step.

Does anyone have numbers on how badly "nodetool repair" sucks vs
bootstrap + removetoken?  If it's within a reasonable factor of
performance, then I'd say that's the easiest solution.
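
For concreteness, the bootstrap + removetoken path would be roughly the
following sketch (the host names and token are placeholders, and the exact
nodetool flags differ between versions; substitute nodeprobe as needed):

    # 1. Bring up the replacement node with bootstrapping enabled in its
    #    storage configuration, e.g. <AutoBootstrap>true</AutoBootstrap>,
    #    so it streams in a copy of the ranges it takes over.

    # 2. Once it has joined the ring, tell any live node to drop the dead
    #    node's token so its ranges are reassigned:
    nodetool -host live-node.example.com removetoken <token-of-dead-node>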

Re: Question about node failure...

Posted by Tatu Saloranta <ts...@gmail.com>.
On Mon, Mar 29, 2010 at 10:40 AM, Ned Wolpert <ne...@imemories.com> wrote:
> So, what does "anti-entropy repair" do then?

Fix discrepancies between live nodes? (caused by transient failures presumably)
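
In command form I believe that is just the following, run against each node
that holds affected ranges (exact flags vary by version):

    # Build Merkle trees for the node's ranges, compare them with the other
    # replicas, and stream over whatever differs:
    nodetool -host node1.example.com repair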

> Sounds like you have to 'decommission' the dead node and then, I thought, run
> 'nodeprobe repair' to get the data adjusted back to a replication factor of
> 3, right?
>
> Also, what is the method to decommission a dead node? Do you pass the IP address
> of the dead node to nodeprobe on a member of the cluster? I've only used
> 'decommission' to remove the node I ran it on from the cluster... not a
> different node.
>
> It seems like if you decommission a node it should fix the replication
> factor for data that was on that node in this case...

Perhaps it would be good to have a convenience workflow for replacing
a broken host ("squashing lemons")? I would assume that the most common use
case is to effectively replace a host that can't be repaired (or perhaps
it might sometimes be the best way to do it anyway), by a combination of
removing the failed host and bringing in a new one. Handling this as a
high-level logical operation could be more efficient than doing it
step by step.

-+ Tatu +-

Re: Question about node failure...

Posted by Ned Wolpert <ne...@imemories.com>.
So, what does "anti-entropy repair" do then?

Sounds like you have to 'decommission' the dead node and then, I thought, run
'nodeprobe repair' to get the data adjusted back to a replication factor of
3, right?

Also, what is the method to decommission a dead node? Do you pass the IP address
of the dead node to nodeprobe on a member of the cluster? I've only used
'decommission' to remove the node I ran it on from the cluster... not a
different node.

It seems like if you decommission a node it should fix the replication
factor for data that was on that node in this case...

On Mon, Mar 29, 2010 at 10:32 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Mon, Mar 29, 2010 at 12:27 PM, Ned Wolpert <ne...@imemories.com>
> wrote:
> > Folks-
> >
> > Can someone point out what happens during a node failure? Here is the
> > specific use case:
> >
> >   - Cassandra cluster with 4 nodes, replication factor of 3
> >   - One node fails.
> >   - At this point, data that existed on the one failed node has copies on
> > 2 live nodes.
> >   - The failed node never comes back
> >
> > First question: At what point does Cassandra re-migrate that data that
> > only exists on 2 nodes to another node to retain the replication factor of 3?
>
> When you tell it to decommission the dead one.
>
> > Second question: Given the above case, if a brand new node is added to
> > the cluster, does anything happen to the data that now only exists on 2
> > nodes?
>
> No, Cassandra doesn't automatically assume that "this node is never
> coming back" w/o intervention, by design.  (Temporary failures are
> much more common than permanent ones.)
>
> -Jonathan
>



-- 
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe

Re: Question about node failure...

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Mar 29, 2010 at 12:27 PM, Ned Wolpert <ne...@imemories.com> wrote:
> Folks-
>
> Can someone point out what happens during a node failure? Here is the
> specific use case:
>
>   - Cassandra cluster with 4 nodes, replication factor of 3
>   - One node fails.
>   - At this point, data that existed on the one failed node has copies on 2
> live nodes.
>   - The failed node never comes back
>
> First question: At what point does Cassandra re-migrate that data that only
> exists on 2 nodes to another node to retain the replication factor of 3?

When you tell it to decommission the dead one.

> Second question: Given the above case, if a brand new node is added to the
> cluster, does anything happen to the data that now only exists on 2 nodes?

No, Cassandra doesn't automatically assume that "this node is never
coming back" w/o intervention, by design.  (Temporary failures are
much more common than permanent ones.)
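
Concretely, something like this (the host names and token are placeholders,
and flags vary by version; substitute nodeprobe as needed):

    # decommission runs against a node that is still up: it streams that
    # node's data to the remaining replicas and leaves the ring.
    nodetool -host node-to-retire.example.com decommission

    # For a node that is already dead, removetoken (run from any live node,
    # given the dead node's token) reassigns its ranges instead.
    nodetool -host live-node.example.com removetoken <token-of-dead-node>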

-Jonathan