Posted to user@cassandra.apache.org by Hadmut Danisch <ha...@danisch.de> on 2015/11/26 16:10:08 UTC

Three questions about cassandra

Hi, 

I'm currently reading through heaps of docs and web pages to learn
Cassandra, but there are still three questions I could not find answers
to. Maybe someone could help:


1. What happens if a node is down for some time (hours, days,
   weeks, ...) for whatever reason (hardware, power, or network
   failure, maintenance, ...) and then comes back online?

   Does the node remain in its former state and thus become
   inconsistent, with outdated data, or does it fetch the changes
   that occurred during its downtime from other nodes?

   Can nodes easily be offline for some time, then return and proceed,
   or do they have to be added as a fresh replacement (of themselves)
   to start from scratch?



2. Cassandra lets you choose from several consistency levels, in
   particular ones that allow a write to succeed without updating all
   replicas (e.g. QUORUM, ONE, TWO, THREE).

   What happens to the nodes that did not get an update? Will they
   synchronize with the updated nodes automatically, or will they
   remain in their old state (forever, or until the next explicit
   write)?





3. What exactly happens when a new node is added to a cluster? Will
   all records that now belong to the new node be moved automatically
   from the others?

   The web page
   http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
   describes a "streaming process", which sounds as if a new node is
   busy collecting its belongings from the others, but it also says to
   run

   nodetool cleanup

   on all the old nodes, which would "remove the keys no longer
   belonging to those nodes". That rather sounds like a simple drop,
   i.e. those records would be lost.

   So does Cassandra safely fill new nodes, or do they start empty
   and their data is lost?



Thank you!

regards
Hadmut

Re: Three questions about cassandra

Posted by daemeon reiydelle <da...@gmail.com>.
There is a window after a node goes down during which the changes that node
should have received are kept (as hints) by the other nodes. If the node is
down LONGER than that, it will serve stale data. If the consistency level is
greater than TWO, its data will be ignored (at ONE, its data could be the
first returned; at TWO, the application needs to be able to handle such a
situation). Nodetool repair needs to be run in this case to make the data
consistent. Cleanup does more than make things pretty, but it will do that.
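A toy model of that window may help. This is a simplified sketch and not Cassandra's implementation. It assumes a coordinator that stores a hint for each write a down replica misses, but only while the replica has been down for less than a configurable window. In Cassandra 2.x that window is `max_hint_window_in_ms` in cassandra.yaml, which defaults to three hours. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

# Default of max_hint_window_in_ms in Cassandra 2.x cassandra.yaml (3 hours).
HINT_WINDOW_MS = 3 * 60 * 60 * 1000


@dataclass
class DownReplica:
    """Hypothetical model of one replica that has been offline since down_since_ms."""
    down_since_ms: int
    hints: list = field(default_factory=list)   # writes that will be replayed later
    missed: list = field(default_factory=list)  # writes only repair can restore

    def record_missed_write(self, write_id: str, now_ms: int,
                            window_ms: int = HINT_WINDOW_MS) -> None:
        # Inside the window, the coordinator keeps a hint for this replica.
        # Once the window has passed, the write is simply not hinted.
        if now_ms - self.down_since_ms < window_ms:
            self.hints.append(write_id)
        else:
            self.missed.append(write_id)

    def needs_repair(self) -> bool:
        # Even after the hints are replayed, anything in `missed` is still stale.
        return bool(self.missed)


replica = DownReplica(down_since_ms=0)
replica.record_missed_write("w1", now_ms=1 * 60 * 60 * 1000)  # down for 1 hour
replica.record_missed_write("w2", now_ms=5 * 60 * 60 * 1000)  # down for 5 hours
print(replica.hints, replica.missed, replica.needs_repair())
```

Running it prints `['w1'] ['w2'] True`. The first write is covered by a hint. The second falls outside the window, so only repair (or read repair) would bring that replica up to date.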

The comment about disabling the thrift listener is about preventing the
node from serving old data when the window I mentioned above has expired,
during the time between the node coming online and the repair completing.
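The "start with listeners disabled, repair, then enable" sequence can be written down as a small script. This is a sketch under stated assumptions:

- The node was started with `start_rpc: false` and `start_native_transport: false` in cassandra.yaml, so it accepts no client connections yet.
- `nodetool` is on the PATH.
- By default the script only prints the commands.

The helper names are hypothetical. The nodetool subcommands used (`repair`, `enablethrift`, `enablebinary`) are the standard Cassandra 2.x ones.

```python
import shlex
import subprocess

# Order matters: repair first, and only then let clients (Thrift, then the
# CQL native protocol) reach the node, so it never answers with stale data.
REJOIN_STEPS = [
    ["nodetool", "repair"],
    ["nodetool", "enablethrift"],
    ["nodetool", "enablebinary"],
]


def rejoin(steps=REJOIN_STEPS, dry_run=True):
    """Print each step and, unless dry_run, execute it; stop at the first failure."""
    executed = []
    for cmd in steps:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails, so the
            # listeners are never enabled after a failed repair.
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed


rejoin()  # dry run: only prints the three commands
```

Call `rejoin(dry_run=False)` to actually run the steps. If your clients use only CQL, the `enablethrift` step can be dropped.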

One of the advantages of using e.g. Ansible is that it can be configured to
whack an errant node's thrift listener BEFORE it starts the node's Cass
instance. Agent-based tools like Puppet and Chef can perform this magic too.
Automatically starting Cass vs. NOT automatically starting the service
sometimes makes for interesting religious wars. And obviously, if the node
didn't stop but just lost network connectivity, agent-based tools have
advantages.






Re: Three questions about cassandra

Posted by Hadmut Danisch <ha...@danisch.de>.
Thanks! 

Hadmut

Re: Three questions about cassandra

Posted by Jeff Jirsa <je...@crowdstrike.com>.
1) It comes online in its former state. The operator is responsible for consistency beyond that point. A common solution is `nodetool repair` (and if you get really smart, you can start the daemon with the thrift/native listeners disabled, run repair, and then enable the listeners, so that when it DOES serve requests, they’re not out of date).

2) The consistency level tells Cassandra how many replicas it will wait for to acknowledge the write; it doesn’t necessarily tell us how many replicas will or won’t get the write (even when writing at QUORUM, it’s likely that all replicas will get the write). Those that do not may get the writes later via read repair or explicit repair (`nodetool repair`).

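3) Yes, joining nodes acquire a part of the token range, and the data for that range will be streamed to the joining node from the existing replicas. Running `nodetool cleanup` afterwards only removes the copies that the old nodes no longer own, so nothing is lost.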
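To make point 2 concrete, here is a small sketch of the arithmetic. The consistency level only sets how many replica acknowledgements the coordinator waits for. QUORUM is a majority of the replication factor, i.e. floor(RF/2) + 1. The sketch also shows the common rule of thumb that reads and writes must overlap on at least one replica when R + W > RF. The function names are illustrative only, and the sketch ignores multi-datacenter levels such as LOCAL_QUORUM and EACH_QUORUM.

```python
def acks_required(level: str, rf: int) -> int:
    """Replica acknowledgements a single-DC request waits for at `level`.

    The write itself is still sent to every replica; this is only the
    number of replies the coordinator waits for before answering.
    """
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = rf // 2 + 1
    elif level == "ALL":
        needed = rf
    else:
        raise ValueError(f"unsupported level: {level}")
    if needed > rf:
        # Such a request can never succeed with this replication factor.
        raise ValueError(f"{level} needs {needed} replicas but RF is {rf}")
    return needed


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True if every possible read replica set overlaps every write replica set."""
    return acks_required(read_level, rf) + acks_required(write_level, rf) > rf


for level in ("ONE", "TWO", "QUORUM", "ALL"):
    print(level, acks_required(level, rf=3))
print(reads_see_latest_write("QUORUM", "QUORUM", rf=3))  # True: 2 + 2 > 3
print(reads_see_latest_write("ONE", "ONE", rf=3))        # False: 1 + 1 <= 3
```

With RF = 3, writing and reading at ONE leaves room for a read to hit a replica that has not received the write yet. Until read repair or `nodetool repair` catches that replica up, it keeps returning the old value.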
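Point 3, and why `nodetool cleanup` is safe, can be illustrated with a toy ring. This is a simplified model and not Cassandra's partitioner. It assumes one token per node, a replication factor of 1, and `zlib.crc32` standing in for the real hash. Adding a node hands one slice of the ring to the newcomer. The keys in that slice are first copied ("streamed") to it, and only afterwards does cleanup delete the now-redundant copies on the old owner.

```python
import bisect
import zlib

RING_SIZE = 2**32


def toy_token(key: str) -> int:
    # Stand-in hash for illustration only (not Cassandra's partitioner).
    return zlib.crc32(key.encode())


def owner(tokens: dict, key: str) -> str:
    """Node owning `key`: the first node token >= the key's token, wrapping around."""
    ring = sorted(tokens.items(), key=lambda kv: kv[1])
    positions = [t for _, t in ring]
    i = bisect.bisect_left(positions, toy_token(key)) % len(ring)
    return ring[i][0]


keys = [f"user{i}" for i in range(1000)]
tokens = {"A": RING_SIZE // 3, "B": 2 * RING_SIZE // 3, "C": RING_SIZE - 1}
stored = {n: {k for k in keys if owner(tokens, k) == n} for n in tokens}

# Node D joins between A and B and takes over part of B's range.
tokens["D"] = RING_SIZE // 2
stored["D"] = {k for k in keys if owner(tokens, k) == "D"}  # "streaming"

# Before cleanup, B still holds every key that was streamed to D.
assert stored["D"] <= stored["B"]

# Cleanup: each old node drops only the keys it no longer owns.
for n in ("A", "B", "C"):
    stored[n] = {k for k in stored[n] if owner(tokens, k) == n}

assert set().union(*stored.values()) == set(keys)  # nothing was lost
print({n: len(ks) for n, ks in sorted(stored.items())})
```

The exact counts printed depend on the hash. The two assertions capture the point: the joining node receives copies before anything is deleted, and cleanup only removes data that now lives elsewhere.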




