Posted to user@cassandra.apache.org by Carlos Pérez Miguel <cp...@gmail.com> on 2013/01/16 13:55:41 UTC

read path, I have missed something

Hi,

I am trying to understand the read path in Cassandra. I've read Cassandra's
documentation and it seems that the read path is like this:

- The client contacts a proxy (coordinator) node, which carries out the
operation on a certain object
- The proxy node sends requests to every replica of that object
- The replica nodes eventually answer if they are up
- After the first R replicas answer, the proxy node returns the value to the
client
- If some of the replicas are out of date and read repair is active, the
proxy node updates those replicas (see the sketch just below)
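
For illustration, here is a minimal client-side sketch of such a read, using
the Python cassandra-driver and CQL (the Thrift-era clients mentioned later
in this thread work analogously); the contact point and the lower-cased
key/value layout are assumptions of the example:

    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    # Whatever node we connect to acts as the proxy (coordinator) for the
    # request and fans it out to the replicas that own the key.
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('ks1')

    # The consistency level decides how many replica answers the
    # coordinator waits for before responding to the client.
    stmt = SimpleStatement("SELECT value FROM cf1 WHERE key = %s",
                           consistency_level=ConsistencyLevel.ALL)
    row = session.execute(stmt, ['data1']).one()
    print(row.value if row else None)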

Ok, so far so good.

But now I have found some inconsistencies that I don't understand:

Let's suppose that we have a 5-node cluster: x1, x2, x3, x4 and x5,
with replication factor 3, read_repair_chance=0.0, autobootstrap=false
and caching=NONE.
We have keyspace KS1 and column family CF1.
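
As a concrete version of this setup, a rough sketch in CQL (issued through
the Python driver), assuming a pre-4.0 Cassandra where read_repair_chance is
still a table option and a simple key/value layout invented for the example;
the caching syntax varies by version, so it is left out:

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()

    # Replication factor is a property of the keyspace, not of the nodes.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS ks1
        WITH replication = {'class': 'SimpleStrategy',
                            'replication_factor': 3}
    """)

    # read_repair_chance existed up to Cassandra 3.x (removed in 4.0).
    session.execute("""
        CREATE TABLE IF NOT EXISTS ks1.cf1 (
            key text PRIMARY KEY,
            value text
        ) WITH read_repair_chance = 0.0
    """)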

With this configuration, we know that if any node crashes and erases its
data directories, it will be necessary to run nodetool repair on that node in
order to repair it and gather the data back from its replica companions.

So, let's suppose that x1, x2 and x3 are the endpoints that store the data
KS1.CF1['data1'].
If x1 crashes (losing all its data) and we execute get KS1.CF1['data1']
with consistency level ALL, the operation will fail. That matches my
understanding.
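
As an aside, which nodes are the endpoints for a given key can be checked
with nodetool getendpoints; a small sketch, assuming nodetool is on the PATH
of the machine running it:

    import subprocess

    # nodetool getendpoints <keyspace> <table> <key> prints the nodes
    # that own the given key (x1, x2 and x3 in this scenario).
    out = subprocess.run(
        ['nodetool', 'getendpoints', 'KS1', 'CF1', 'data1'],
        capture_output=True, text=True, check=True)
    print(out.stdout)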

If we restart node x1 without executing nodetool repair and repeat the
operation get KS1.CF1['data1'] using consistency ALL, we obtain the
original data! Why? One of the nodes doesn't have any data about
KS1.CF1['data1']. OK, let's suppose that since all the required nodes answer,
even if one has no data, the operation completes correctly.

Now let's repeat the same procedure with the rest of nodes, that is:

1- stop x1, erase data, logs, caches and commitlog from x1
2- restart x1 and don't repair it
3- stop x2, erase data, logs, caches and commitlog from x2
4- restart x2 and don't repair it
5- stop x3, erase data, logs, caches and commitlog from x3
6- restart x3 and don't repair it
7- execute get KS1.CF1['data1'] with consistency level ALL -> it still
returns the correct data!

Where did that data come from? The endpoints are supposed to hold no
data. I tried this using cassandra-cli and Cassandra's Ruby client, and the
result is always the same. What did I miss?

Thank you for reading to the end ;)

Bye

Carlos Pérez Miguel

Re: read path, I have missed something

Posted by aaron morton <aa...@thelastpickle.com>.
> In this case, how does read repair happen on the replicas?
By default 90% of the reads will only read from 1 replica, and 10% will read from all. However, the client request will *only* wait for one replica to return a value, and it has to be the replica that was asked to return the full data, not just a digest.

> What should be the ideal value of read_repair_chance in this case?
It depends on what you see. If you are happy with the consistency you are getting, then leave it at the default value.
If you feel it's unnecessary, because you either write at ALL or regularly run repair and can handle the inconsistency, you could turn it off completely.

> How often do we need to run repair on these nodes?
Read Repair (controlled by read_repair_chance) is an automatic process designed to reduce the probability of getting a digest mismatch at CL > ONE. When a digest mismatch happens, the read has to be run again, and that takes time.

You should continue to use nodetool repair on each node once every gc_grace_seconds to ensure tombstones are distributed. 
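
For reference, read_repair_chance itself is a per-table setting; a quick
sketch of changing it (pre-4.0 CQL through the Python driver; the table name
is reused from earlier in the thread):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()
    # 0.1 was the old default: roughly 1 read in 10 queries all replicas.
    # The option was removed entirely in Cassandra 4.0.
    session.execute("ALTER TABLE ks1.cf1 WITH read_repair_chance = 0.1")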

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com


Re: read path, I have missed something

Posted by santi kumar <sa...@gmail.com>.
Sorry to intrude in this thread, but my intention is to get clarity on
read_repair_chance.

Our reads don't need near-real-time data, so all our reads use CL.ONE. In
this case, how does read repair happen on the replicas? What should be the
ideal value of read_repair_chance in this case?
How often do we need to run repair on these nodes?

Thanks
Santi



Re: read path, I have missed something

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Thanks for the explanation, Sylvain!


Re: read path, I have missed something

Posted by Sylvain Lebresne <sy...@datastax.com>.
>
> I mean, if a node is down and then
> we get that node up and running again, wouldn't it be synchronized
> automatically?
>

It will, thanks to hinted handoff (not gossip; gossip only handles the ring
topology and a bunch of metadata, it doesn't deal with data synchronization
at all). But hinted handoffs are not bulletproof (if only because hints
expire after some time if they are not delivered). And you're right, that's
probably why Carlos' example worked as he observed it, especially since he
didn't mention reads between his stop/erase/restart steps. Anyway, my
description of read_repair_chance is still correct if someone wonders about
that :)
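
To make the expiry caveat concrete, here is a toy model of hinted handoff
(not Cassandra's actual implementation; the three-hour window mirrors the
max_hint_window_in_ms default, everything else is simplified):

    import time

    HINT_WINDOW_SECONDS = 3 * 3600  # default max_hint_window_in_ms is 3h

    hints = []  # (timestamp, target_node, mutation) kept by the coordinator

    def store_hint(target, mutation):
        # While a replica is down, the coordinator records the write as a hint.
        hints.append((time.time(), target, mutation))

    def replay_hints(target):
        # When the replica comes back, only hints still inside the window are
        # delivered; older ones are dropped, leaving the node inconsistent
        # until read repair or nodetool repair catches it.
        now = time.time()
        return [m for (t, n, m) in hints
                if n == target and now - t <= HINT_WINDOW_SECONDS]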

--
Sylvain




Re: read path, I have missed something

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi there,

I am sorry to jump into this thread with more questions, but isn't the
gossip protocol in charge of performing read repair automatically anytime a
new node comes into the ring? I mean, if a node is down and then we get that
node up and running again, wouldn't it be synchronized automatically?
Thanks!


Renato M.


Re: read path, I have missed something

Posted by Carlos Pérez Miguel <cp...@gmail.com>.
Ahhhh, OK. Now I understand where the data came from. When using CL.ALL,
read repair always repairs inconsistent data.

Thanks a lot, Sylvain.


Carlos Pérez Miguel



Re: read path, I have missed something

Posted by Sylvain Lebresne <sy...@datastax.com>.
You're missing the correct definition of read_repair_chance.

When you do a read at CL.ALL, all replicas are waited upon and the results
from all those replicas are compared. From that, we can extract which nodes
are not up to date, i.e. which ones can be read repaired. And if some node
needs to be repaired, we do it. Always, whatever the value of
read_repair_chance is.

Now if you do a read at CL.ONE and you only end up querying 1 replica, you
will never be able to do read repair. That's where read_repair_chance comes
into play. What it really controls is how often we query *more* replicas
than strictly required by the consistency level. And it happens that the
reason you would want to do that is read repair, hence the option name. But
read repair potentially kicks in anytime more than one replica answers a
query. One corollary is that read_repair_chance has no impact whatsoever at
CL.ALL.
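
A toy simulation of that corollary (purely illustrative, not Cassandra
code): with probability read_repair_chance the coordinator queries all
replicas instead of only the CL-required ones, and whenever several replicas
answer, stale ones get repaired. At ALL every read repairs everything, which
is consistent with how Carlos' wiped nodes got their data back:

    import random

    def coordinator_read(replicas, required, read_repair_chance):
        # replicas: dict node -> (timestamp, value), or None when a node
        # holds no data. required: replicas the CL demands (1 for ONE,
        # len(replicas) for ALL).
        nodes = list(replicas)
        if required == len(nodes) or random.random() < read_repair_chance:
            queried = nodes          # query *more* replicas than required
        else:
            queried = nodes[:required]
        newest = max((replicas[n] for n in queried if replicas[n]),
                     default=None)
        for n in queried:            # read repair every stale answer
            if replicas[n] != newest:
                replicas[n] = newest
        return newest

    # Carlos' experiment: wipe one replica at a time, reading at ALL in
    # between; each read repairs the wiped node, so the value survives.
    reps = {'x1': (1, 'data1'), 'x2': (1, 'data1'), 'x3': (1, 'data1')}
    for node in ('x1', 'x2', 'x3'):
        reps[node] = None
        print(coordinator_read(reps, required=3, read_repair_chance=0.0))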

--
Sylvain

