
Posted to user@cassandra.apache.org by Drew Kutcharian <dr...@venarc.com> on 2012/01/06 19:03:38 UTC

How to reliably achieve unique constraints with Cassandra?

Hi Everyone,

What's the best way to reliably get unique-constraint-like functionality with Cassandra? I have the following use case (which I think should be very common).

User CF
Row Key: user email
Columns: userId: UUID, etc...

UserAttribute1 CF:
Row Key: userId (which is the uuid that's mapped to user email)
Columns: ...

UserAttribute2 CF:
Row Key: userId (which is the uuid that's mapped to user email)
Columns: ...

The issue is that we need to guarantee that no two people register with the same email address. In addition, without locking, a malicious user could potentially "hijack" another user's account by registering with that user's email address.

I know that this can be done using a lock manager such as ZooKeeper or Hazelcast, but the issue with either of them is that if ZooKeeper or Hazelcast is down, you can't be sure the lock is reliable. So in the very rare case where the lock manager is down and two users register with the same email, this could cause major issues.

In addition, I know this can be done with other tools such as Redis (use Redis for this use case, and Cassandra for everything else), but I'm interested in hearing if anyone has solved this issue using Cassandra only.

Thanks,

Drew

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Jeremiah Jordan <je...@morningstar.com>.

Since a ZooKeeper cluster is a quorum-based system similar to Cassandra,
it only goes down when it loses a majority of its nodes (roughly n/2). Just
as you have to stop writing to Cassandra if N/2 replicas are down (when
using QUORUM), your app will have to wait for the ZooKeeper cluster to come
back online before it can proceed.
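The "n/2" above is approximate. A majority-quorum system stays available only while a strict majority of its n members is reachable, so it tolerates n - majority failures. The helper below uses illustrative names and plain arithmetic; it applies to a ZooKeeper ensemble and, with n taken as the replication factor, to Cassandra at QUORUM:

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that can be lost while a majority is still reachable."""
    return n - majority(n)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A 4-node ensemble tolerates no more failures than a 3-node one, which is why odd ensemble sizes are the usual choice.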

On 01/06/2012 12:03 PM, Drew Kutcharian wrote:
> I know that this can be done using a lock manager such as ZooKeeper or HazelCast, but the issue with using either of them is that if ZooKeeper or HazelCast is down, then you can't be sure about the reliability of the lock. So this potentially, in the very rare instance where the lock manager is down and two users are registering with the same email, can cause major issues.

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Mohit Anchlia <mo...@gmail.com>.

I don't think so, if you read and write with QUORUM.
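The reasoning behind this answer is quorum overlap: with replication factor RF, a read touching R replicas and a write acknowledged by W replicas must share at least one replica whenever R + W > RF, and QUORUM on both sides always satisfies that. A minimal check of the arithmetic (function names are illustrative only):

```python
def quorum(rf: int) -> int:
    """Replicas needed at QUORUM: a strict majority of the replication factor."""
    return rf // 2 + 1


def sets_overlap(rf: int, r: int, w: int) -> bool:
    """True when any R read replicas must intersect any W write replicas out of RF."""
    return r + w > rf


for rf in (1, 2, 3, 5):
    q = quorum(rf)
    print(rf, q, sets_overlap(rf, q, q))  # QUORUM reads and writes always overlap
print(sets_overlap(3, 1, 1))              # ONE/ONE at RF=3 does not
```

Overlap only guarantees that a QUORUM read observes a QUORUM write that has already completed. It does not order two writes that are in flight at the same time, which is the gap Bryce Allen's timeline in this thread exploits.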

On Fri, Jan 6, 2012 at 11:01 AM, Drew Kutcharian <dr...@venarc.com> wrote:
> Yes, my issue is with handling concurrent requests. I'm not sure how your logic will work with eventual consistency. I'm going to have the same issue in the "tracker" CF too, no?

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Drew Kutcharian <dr...@venarc.com>.

Yes, my issue is with handling concurrent requests. I'm not sure how your logic will work with eventual consistency. I'm going to have the same issue in the "tracker" CF too, no?


On Jan 6, 2012, at 10:38 AM, Mohit Anchlia wrote:

> It could be as simple as reading before writing to make sure that
> the email doesn't exist. But I think you are asking how to handle two
> concurrent requests for the same email? The only way I can think of is:
> 
> 1) Create a new CF, say tracker
> 2) Write the email and a time UUID to the tracker CF
> 3) Read from the tracker CF
> 4) If you find a row other than yours, wait a few ms and read the
> tracker again
> 5) Read from the User CF
> 6) Write if there are no rows in the User CF
> 7) Delete from the tracker
> 
> Please note you might have to modify this logic a little bit, but this
> should give you some ideas of how to approach this problem without
> locking.
> 
> Regarding hijacking accounts, can you elaborate a little more?

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Mohit Anchlia <mo...@gmail.com>.

On Fri, Jan 6, 2012 at 1:41 PM, Bryce Allen <ba...@ci.uchicago.edu> wrote:
> I don't think it's just clock drift. There is also the period of time
> between when the client selects a timestamp and when the data ends up
> committed to Cassandra. That delay seems harder to control when the
> nodes and/or clients are under load.

As suggested, you control that by sleeping before reading. You are
worried about the edge case, but this should work well for the use case
the original poster described. For example, how many people will try to
create an account with the same email at the same time, in a way that
defeats all of the safety checks?

Your use case might be different, with no tolerance whatsoever for such
errors. In that case, C* is probably not the right thing to use anyway.

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Bryce Allen <ba...@ci.uchicago.edu>.

I don't think it's just clock drift. There is also the period of time
between when the client selects a timestamp and when the data ends up
committed to Cassandra. That delay seems harder to control when the
nodes and/or clients are under load.

I agree that it would be nice to have something like this in Cassandra
core, but from the JIRA tickets it looks like this has been tried
before, and for various reasons was not added. It's definitely
non-trivial to get right.

On Fri, 6 Jan 2012 13:33:02 -0800
Mohit Anchlia <mo...@gmail.com> wrote:
> This looks like the right way to do it. But remember that this still
> doesn't guarantee correctness if your clocks drift too much. It's a
> trade-off between having to manage one additional component and using
> something internal to C*. It would be good to see similar functionality
> implemented in C* so that clients don't have to deal with it explicitly.

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Mohit Anchlia <mo...@gmail.com>.

This looks like the right way to do it. But remember that this still
doesn't guarantee correctness if your clocks drift too much. It's a
trade-off between having to manage one additional component and using
something internal to C*. It would be good to see similar functionality
implemented in C* so that clients don't have to deal with it explicitly.

On Fri, Jan 6, 2012 at 1:16 PM, Bryce Allen <ba...@ci.uchicago.edu> wrote:
> This looks like it:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Implementing-locks-using-cassandra-only-tp5527076p5527076.html
>
> There's also some interesting JIRA tickets related to locking/CAS:
> https://issues.apache.org/jira/browse/CASSANDRA-2686
> https://issues.apache.org/jira/browse/CASSANDRA-48
>
> -Bryce

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Bryce Allen <ba...@ci.uchicago.edu>.

This looks like it:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Implementing-locks-using-cassandra-only-tp5527076p5527076.html

There's also some interesting JIRA tickets related to locking/CAS:
https://issues.apache.org/jira/browse/CASSANDRA-2686
https://issues.apache.org/jira/browse/CASSANDRA-48

-Bryce

On Fri, 06 Jan 2012 14:53:21 -0600
Jeremiah Jordan <je...@morningstar.com> wrote:
> Correct, any kind of locking in Cassandra requires clocks that are in 
> sync, and requires you to wait "possible clock out of sync time"
> before reading to check if you got the lock, to prevent the issue you
> describe below.
> 
> There was a pretty detailed discussion of locking with only Cassandra
> a month or so back on this list.
> 
> -Jeremiah

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Jeremiah Jordan <je...@morningstar.com>.

Correct, any kind of locking in Cassandra requires clocks that are in 
sync, and requires you to wait "possible clock out of sync time" before 
reading to check if you got the lock, to prevent the issue you describe 
below.
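One way to read this advice, sketched against a hypothetical in-memory tracker: each client writes a claim stamped with its own clock, sleeps for an assumed upper bound on clock skew (`max_skew_s`), then reads all claims and treats the lowest timestamp as the lock holder. The `sleep` parameter is injectable only so the sketch can run without real delays; none of these names come from Cassandra or its client libraries.

```python
import time
from typing import Callable, Dict

# Hypothetical stand-in for a "tracker" CF:
# email -> {client_id: client-chosen timestamp in microseconds}
Tracker = Dict[str, Dict[str, int]]


def try_acquire(tracker: Tracker, email: str, client_id: str, ts_us: int,
                max_skew_s: float,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Write a claim, wait out the assumed skew bound, then check for the lowest claim."""
    tracker.setdefault(email, {})[client_id] = ts_us  # write our claim
    sleep(max_skew_s)                                 # wait "possible clock out of sync time"
    claims = tracker[email]                           # read back every visible claim
    # Lowest timestamp wins; the client id breaks exact ties deterministically.
    winner = min(claims, key=lambda cid: (claims[cid], cid))
    return winner == client_id
```

Run sequentially with `max_skew_s=0.0`, a claim stamped 2000 wins on an empty tracker, and a later claim stamped 1000 by a client whose clock lags also wins, because it sorts first. That double grant is what the wait is meant to prevent, and it only does so if `max_skew_s` really bounds both clock skew and the delay between choosing a timestamp and the write becoming readable.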

There was a pretty detailed discussion of locking with only Cassandra a 
month or so back on this list.

-Jeremiah

On 01/06/2012 02:42 PM, Bryce Allen wrote:
> For example:
>
> * At time 0ms, client 1 starts insert of user@example.org
> * At time 1ms, client 2 also starts insert for user@example.org
> * At time 2ms, client 2 data is committed
> * At time 3ms, client 2 reads tracker and sees that it's the only one,
>    so enters the critical section
> * At time 4ms, client 1 data is committed
> * At time 5ms, client 1 reads tracker, and sees that it is not the only
>    one, but since it has the lowest timestamp (0ms vs 1ms), it enters
>    the critical section.

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Bryce Allen <ba...@ci.uchicago.edu>.

On Fri, 6 Jan 2012 10:38:17 -0800
Mohit Anchlia <mo...@gmail.com> wrote:
> It could be as simple as reading before writing to make sure that
> the email doesn't exist. But I think you are asking how to handle two
> concurrent requests for the same email? The only way I can think of is:
> 
> 1) Create a new CF, say tracker
> 2) Write the email and a time UUID to the tracker CF
> 3) Read from the tracker CF
> 4) If you find a row other than yours, wait a few ms and read the
> tracker again
> 5) Read from the User CF
> 6) Write if there are no rows in the User CF
> 7) Delete from the tracker
> 
> Please note you might have to modify this logic a little bit, but this
> should give you some ideas of how to approach this problem without
> locking.

Distributed locking is pretty subtle; I haven't seen a correct solution
that uses just Cassandra, even with QUORUM read/write. I suspect it's
not possible.

With the above proposal, in step 4 two processes could both have
inserted an entry in the tracker before either gets a chance to check,
so you need a way to order the requests. I don't think the timestamp
works for ordering, because it's set by the client (even the internal
timestamp is set by the client), and will likely be different from
when the data is actually committed and available to read by other
clients.

For example:

* At time 0ms, client 1 starts insert of user@example.org
* At time 1ms, client 2 also starts insert for user@example.org
* At time 2ms, client 2 data is committed
* At time 3ms, client 2 reads tracker and sees that it's the only one,
  so enters the critical section
* At time 4ms, client 1 data is committed
* At time 5ms, client 1 reads tracker, and sees that it is not the only
  one, but since it has the lowest timestamp (0ms vs 1ms), it enters
  the critical section.
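The interleaving above can be replayed mechanically. This sketch assumes the check at 5ms is made by client 1, the holder of the 0ms timestamp, and models the tracker as a hypothetical dictionary of claims that have become visible; it returns every client that decided it may enter the critical section.

```python
from typing import Dict, List, Optional, Tuple

# (action, client, client-chosen timestamp in ms; None for a check)
Event = Tuple[str, str, Optional[int]]


def replay(events: List[Event]) -> List[str]:
    """Replay a commit/check timeline; return clients that enter the critical section."""
    visible: Dict[str, int] = {}  # claims that are committed and readable
    entered: List[str] = []
    for action, client, ts in events:
        if action == "commit":
            assert ts is not None
            visible[client] = ts
        elif action == "check" and visible:
            lowest = min(visible, key=lambda c: visible[c])
            if lowest == client and client not in entered:
                entered.append(client)
    return entered


timeline: List[Event] = [
    ("commit", "client2", 1),    # 2ms: client 2's claim, stamped 1ms, becomes visible
    ("check", "client2", None),  # 3ms: client 2 sees only its own claim
    ("commit", "client1", 0),    # 4ms: client 1's claim, stamped 0ms, becomes visible
    ("check", "client1", None),  # 5ms: client 1 now holds the lowest timestamp
]
print(replay(timeline))  # ['client2', 'client1']
```

Both clients get in, even though each one followed the lowest-timestamp rule correctly against what it could read at the time.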

I don't think Cassandra counters work for ordering either.

This approach is similar to the Zookeeper lock recipe:
http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
but zookeeper has sequence nodes, which provide a consistent way of
ordering the requests. Zookeeper also avoids the busy waiting.
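For contrast, the ordering property described here can be modeled in a few lines: the server, not a client clock, hands out increasing sequence numbers, the lowest live number holds the lock, and each waiter watches only its immediate predecessor instead of polling. This is a hypothetical in-memory model of that idea, not the ZooKeeper client API, and it leaves out ephemeral nodes, sessions and watches.

```python
import itertools
from typing import Dict, Optional


class SequencedLockDir:
    """Hypothetical in-memory model of a lock directory with server-assigned sequence numbers."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self._nodes: Dict[int, str] = {}  # sequence number -> owner

    def create_sequential(self, owner: str) -> int:
        seq = next(self._counter)  # assigned by the "server", never by a client clock
        self._nodes[seq] = owner
        return seq

    def holder(self) -> Optional[str]:
        """Owner of the lowest live sequence number, i.e. the current lock holder."""
        return self._nodes[min(self._nodes)] if self._nodes else None

    def predecessor(self, seq: int) -> Optional[int]:
        """The one node a waiter would watch instead of busy-polling."""
        lower = [s for s in self._nodes if s < seq]
        return max(lower) if lower else None

    def release(self, seq: int) -> None:
        self._nodes.pop(seq, None)


locks = SequencedLockDir()
first = locks.create_sequential("client1")
second = locks.create_sequential("client2")
print(locks.holder())                      # client1
print(locks.predecessor(second) == first)  # True
locks.release(first)
print(locks.holder())                      # client2
```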

I'd be happy to be proven wrong. But even if it is possible, if it
involves a lot of complexity and busy waiting it's probably not worth
it. There's a reason people are using Zookeeper with Cassandra.

-Bryce

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Mohit Anchlia <mo...@gmail.com>.

On Fri, Jan 6, 2012 at 10:03 AM, Drew Kutcharian <dr...@venarc.com> wrote:
> The issue is we need to guarantee that no two people register with the same email address. In addition, without locking, potentially a malicious user can "hijack" another user's account by registering using the user's email address.

It could be as simple as reading before writing to make sure that
email doesn't exist. But I think you are looking at how to handle 2
concurrent requests for same email? Only way I can think of is:

1) Create a new CF, say tracker
2) write email and time UUID to CF tracker
3) read from CF tracker
4) if you find a row other than yours then wait and read again from
tracker after a few ms
5) read from USER CF
6) write if no rows in USER CF
7) delete from tracker

Please note you might have to modify this logic a little bit, but this
should give you some ideas of how to approach this problem without
locking.
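The seven steps can be sketched as a single function. This is a minimal sketch, assuming plain dicts as stand-ins for the tracker CF and the User CF (no real Cassandra client is used), with max_rounds and wait_s as hypothetical tuning knobs. Against a real eventually consistent store, steps 3 and 5 can return stale data, so the race described in the timeline above still applies.

```python
import time
import uuid


def register(tracker, users, email, user_id, max_rounds=5, wait_s=0.01):
    """Sketch of the tracker steps; `tracker` and `users` are dicts standing in for the CFs."""
    marker = uuid.uuid1()                               # 2) time-based UUID for this attempt
    tracker.setdefault(email, set()).add(marker)        # 2) write email + marker to tracker
    try:
        for _ in range(max_rounds):
            others = tracker[email] - {marker}          # 3) read tracker
            if not others:
                break
            time.sleep(wait_s)                          # 4) someone else is there: wait, re-read
        else:
            return False                                # still contended: give up, caller may retry
        if email in users:                              # 5) read from User CF
            return False
        users[email] = user_id                          # 6) write only if no row exists
        return True
    finally:
        tracker[email].discard(marker)                  # 7) delete our tracker entry


tracker, users = {}, {}
print(register(tracker, users, "user@example.org", "u1"))  # True
print(register(tracker, users, "user@example.org", "u2"))  # False
```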

Regarding hijacking accounts, can you elaborate a little more?
>
> I know that this can be done using a lock manager such as ZooKeeper or HazelCast, but the issue with using either of them is that if ZooKeeper or HazelCast is down, then you can't be sure about the reliability of the lock. So this potentially, in the very rare instance where the lock manager is down and two users are registering with the same email, can cause major issues.
>
> In addition, I know this can be done with other tools such as Redis (use Redis for this use case, and Cassandra for everything else), but I'm interested in hearing if anyone has solved this issue using Cassandra only.
>
> Thanks,
>
> Drew

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Drew Kutcharian <dr...@venarc.com>.

It makes great sense. You're a genius!!


On Jan 6, 2012, at 10:43 PM, Narendra Sharma wrote:

> Instead of trying to solve the generic problem of uniqueness, I would focus on the specific problem. 
> 
> For example, let's consider your use case of user registration with email address as key. You can do the following:
> 1. Create CF (Users) where row key is UUID and has user info specific columns.
> 2. Whenever a user registers, create a row in this CF with user status flag as waiting for confirmation.
> 3. Send email to the user's email address with a link that contains the UUID (or encrypted UUID)
> 4. When the user clicks on the link, use the UUID (or decrypted UUID) to look up the user
> 5. If the user exists with the given UUID and status as waiting for confirmation, then update the status and create an entry in another CF (EmailUUIDIndex) representing the email address to UUID mapping.
> 6. For authentication you can look up the index to get the UUID and proceed.
> 7. If a malicious user registers with someone else's email id, then he will never be able to confirm and will never have an entry in EmailUUIDIndex. As an additional check, if an entry for the email id already exists in EmailUUIDIndex, the registration request can be rejected right away.
> 
> Make sense?
> 
> -Naren
> 
> On Fri, Jan 6, 2012 at 4:00 PM, Drew Kutcharian <dr...@venarc.com> wrote:
> So what are the common RIGHT solutions/tools for this?
> 
> 
> On Jan 6, 2012, at 2:46 PM, Narendra Sharma wrote:
> 
>> >>>It's very surprising that no one seems to have solved such a common use case.
>> I would say people have solved it using RIGHT tools for the task.
>> 
>> 
>> 
>> On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian <dr...@venarc.com> wrote:
>> Thanks everyone for the replies. Seems like there is no easy way to handle this. It's very surprising that no one seems to have solved such a common use case.
>> 
>> -- Drew
>> 
>> On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:
>> 
>> > That's a good question, and I'm not sure - I'm fairly new to both ZK
>> > and Cassandra. I found this wiki page:
>> > http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
>> > and I think the lock recipe still works, even if a stale read happens.
>> > Assuming that wiki page is correct.
>> >
>> > There is still subtlety to locking with ZK though, see (Locks based
>> > on ephemeral nodes) from the zk mailing list in October:
>> > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
>> >
>> > -Bryce
>> >
>> > On Fri, 6 Jan 2012 13:36:52 -0800
>> > Drew Kutcharian <dr...@venarc.com> wrote:
>> >> Bryce,
>> >>
>> >> I'm not sure about ZooKeeper, but I know if you have a partition
>> >> between HazelCast nodes, then the nodes can acquire the same lock
>> >> independently in each divided partition. How does ZooKeeper handle
>> >> this situation?
>> >>
>> >> -- Drew
>> >>
>> >>
>> >> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
>> >>
>> >>> On Fri, 6 Jan 2012 10:03:38 -0800
>> >>> Drew Kutcharian <dr...@venarc.com> wrote:
>> >>>> I know that this can be done using a lock manager such as ZooKeeper
>> >>>> or HazelCast, but the issue with using either of them is that if
>> >>>> ZooKeeper or HazelCast is down, then you can't be sure about the
>> >>>> reliability of the lock. So this potentially, in the very rare
>> >>>> instance where the lock manager is down and two users are
>> >>>> registering with the same email, can cause major issues.
>> >>>
>> >>> For most applications, if the lock manager is down, you don't
>> >>> acquire the lock, so you don't enter the critical section. Rather
>> >>> than allowing inconsistency, you become unavailable (at least to
>> >>> writes that require a lock).
>> >>>
>> >>> -Bryce
>> >>
>> 
>> 
>> 
>> 
>> -- 
>> Narendra Sharma
>> Software Engineer
>> http://www.aeris.com
>> http://narendrasharma.blogspot.com/
>> 
>> 
> 
> 
> 
> 
> -- 
> Narendra Sharma
> Software Engineer
> http://www.aeris.com
> http://narendrasharma.blogspot.com/
> 
>

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Narendra Sharma <na...@gmail.com>.

Instead of trying to solve the generic problem of uniqueness, I would focus
on the specific problem.

For example, let's consider your use case of user registration with email
address as key. You can do the following:
1. Create CF (Users) where row key is UUID and has user info specific
columns.
2. Whenever a user registers, create a row in this CF with user status flag
as waiting for confirmation.
3. Send email to the user's email address with a link that contains the UUID
(or encrypted UUID)
4. When the user clicks on the link, use the UUID (or decrypted UUID) to look
up the user
5. If the user exists with the given UUID and status as waiting for
confirmation, then update the status and create an entry in another CF
(EmailUUIDIndex) representing the email address to UUID mapping.
6. For authentication you can look up the index to get the UUID and proceed.
7. If a malicious user registers with someone else's email id, then he will
never be able to confirm and will never have an entry in EmailUUIDIndex. As
an additional check, if an entry for the email id already exists in
EmailUUIDIndex, the registration request can be rejected right away.
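The confirmation flow above can be sketched in a few lines. This is a minimal sketch, assuming plain dicts as stand-ins for the Users and EmailUUIDIndex CFs and a random UUID as the link token; sending the email and encrypting the UUID are left out. The function names are hypothetical.

```python
import uuid


def register(users, email_index, email):
    """Steps 2 and 7: create a pending row keyed by a fresh UUID, unless the email is already confirmed."""
    if email in email_index:
        return None                                         # 7) already owned: reject right away
    user_id = uuid.uuid4()
    users[user_id] = {"email": email, "status": "pending"}  # 2) waiting for confirmation
    # 3) emailing a link that carries user_id is out of scope for this sketch
    return user_id


def confirm(users, email_index, user_id):
    """Steps 4 and 5: only the owner of the mailbox ever receives user_id."""
    user = users.get(user_id)
    if user is None or user["status"] != "pending":
        return False
    if user["email"] in email_index:                        # address was already confirmed
        return False
    user["status"] = "confirmed"
    email_index[user["email"]] = user_id                    # 5) email -> UUID mapping
    return True


users, email_index = {}, {}
attacker_row = register(users, email_index, "victim@example.org")  # stays pending forever
owner_row = register(users, email_index, "victim@example.org")
print(confirm(users, email_index, owner_row))               # True: the owner clicked the link
print(email_index["victim@example.org"] == owner_row)       # True: step 6 lookup
print(register(users, email_index, "victim@example.org"))  # None: rejected by the index check
```

Uniqueness here comes from mailbox ownership rather than from a lock: the attacker's pending row is harmless because it never reaches the index.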

Make sense?

-Naren

On Fri, Jan 6, 2012 at 4:00 PM, Drew Kutcharian <dr...@venarc.com> wrote:

> So what are the common RIGHT solutions/tools for this?
>
>
> On Jan 6, 2012, at 2:46 PM, Narendra Sharma wrote:
>
> >>>It's very surprising that no one seems to have solved such a common use case.
> I would say people have solved it using RIGHT tools for the task.
>
>
>
> On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian <dr...@venarc.com> wrote:
>
>> Thanks everyone for the replies. Seems like there is no easy way to
>> handle this. It's very surprising that no one seems to have solved such a
>> common use case.
>>
>> -- Drew
>>
>> On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:
>>
>> > That's a good question, and I'm not sure - I'm fairly new to both ZK
>> > and Cassandra. I found this wiki page:
>> > http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
>> > and I think the lock recipe still works, even if a stale read happens.
>> > Assuming that wiki page is correct.
>> >
>> > There is still subtlety to locking with ZK though, see (Locks based
>> > on ephemeral nodes) from the zk mailing list in October:
>> > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
>> >
>> > -Bryce
>> >
>> > On Fri, 6 Jan 2012 13:36:52 -0800
>> > Drew Kutcharian <dr...@venarc.com> wrote:
>> >> Bryce,
>> >>
>> >> I'm not sure about ZooKeeper, but I know if you have a partition
>> >> between HazelCast nodes, then the nodes can acquire the same lock
>> >> independently in each divided partition. How does ZooKeeper handle
>> >> this situation?
>> >>
>> >> -- Drew
>> >>
>> >>
>> >> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
>> >>
>> >>> On Fri, 6 Jan 2012 10:03:38 -0800
>> >>> Drew Kutcharian <dr...@venarc.com> wrote:
>> >>>> I know that this can be done using a lock manager such as ZooKeeper
>> >>>> or HazelCast, but the issue with using either of them is that if
>> >>>> ZooKeeper or HazelCast is down, then you can't be sure about the
>> >>>> reliability of the lock. So this potentially, in the very rare
>> >>>> instance where the lock manager is down and two users are
>> >>>> registering with the same email, can cause major issues.
>> >>>
>> >>> For most applications, if the lock manager is down, you don't
>> >>> acquire the lock, so you don't enter the critical section. Rather
>> >>> than allowing inconsistency, you become unavailable (at least to
>> >>> writes that require a lock).
>> >>>
>> >>> -Bryce
>> >>
>>
>>
>
>
> --
> Narendra Sharma
> Software Engineer
> http://www.aeris.com
> http://narendrasharma.blogspot.com/
>
>
>
>

-- 
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Drew Kutcharian <dr...@venarc.com>.

So what are the common RIGHT solutions/tools for this?


On Jan 6, 2012, at 2:46 PM, Narendra Sharma wrote:

> >>>It's very surprising that no one seems to have solved such a common use case.
> I would say people have solved it using RIGHT tools for the task.
> 
> 
> 
> On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian <dr...@venarc.com> wrote:
> Thanks everyone for the replies. Seems like there is no easy way to handle this. It's very surprising that no one seems to have solved such a common use case.
> 
> -- Drew
> 
> On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:
> 
> > That's a good question, and I'm not sure - I'm fairly new to both ZK
> > and Cassandra. I found this wiki page:
> > http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
> > and I think the lock recipe still works, even if a stale read happens.
> > Assuming that wiki page is correct.
> >
> > There is still subtlety to locking with ZK though, see (Locks based
> > on ephemeral nodes) from the zk mailing list in October:
> > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
> >
> > -Bryce
> >
> > On Fri, 6 Jan 2012 13:36:52 -0800
> > Drew Kutcharian <dr...@venarc.com> wrote:
> >> Bryce,
> >>
> >> I'm not sure about ZooKeeper, but I know if you have a partition
> >> between HazelCast nodes, then the nodes can acquire the same lock
> >> independently in each divided partition. How does ZooKeeper handle
> >> this situation?
> >>
> >> -- Drew
> >>
> >>
> >> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
> >>
> >>> On Fri, 6 Jan 2012 10:03:38 -0800
> >>> Drew Kutcharian <dr...@venarc.com> wrote:
> >>>> I know that this can be done using a lock manager such as ZooKeeper
> >>>> or HazelCast, but the issue with using either of them is that if
> >>>> ZooKeeper or HazelCast is down, then you can't be sure about the
> >>>> reliability of the lock. So this potentially, in the very rare
> >>>> instance where the lock manager is down and two users are
> >>>> registering with the same email, can cause major issues.
> >>>
> >>> For most applications, if the lock manager is down, you don't
> >>> acquire the lock, so you don't enter the critical section. Rather
> >>> than allowing inconsistency, you become unavailable (at least to
> >>> writes that require a lock).
> >>>
> >>> -Bryce
> >>
> 
> 
> 
> 
> -- 
> Narendra Sharma
> Software Engineer
> http://www.aeris.com
> http://narendrasharma.blogspot.com/
> 
>

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Narendra Sharma <na...@gmail.com>.

>>>It's very surprising that no one seems to have solved such a common use case.
I would say people have solved it using RIGHT tools for the task.



On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian <dr...@venarc.com> wrote:

> Thanks everyone for the replies. Seems like there is no easy way to handle
> this. It's very surprising that no one seems to have solved such a common
> use case.
>
> -- Drew
>
> On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:
>
> > That's a good question, and I'm not sure - I'm fairly new to both ZK
> > and Cassandra. I found this wiki page:
> > http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
> > and I think the lock recipe still works, even if a stale read happens.
> > Assuming that wiki page is correct.
> >
> > There is still subtlety to locking with ZK though, see (Locks based
> > on ephemeral nodes) from the zk mailing list in October:
> > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
> >
> > -Bryce
> >
> > On Fri, 6 Jan 2012 13:36:52 -0800
> > Drew Kutcharian <dr...@venarc.com> wrote:
> >> Bryce,
> >>
> >> I'm not sure about ZooKeeper, but I know if you have a partition
> >> between HazelCast nodes, then the nodes can acquire the same lock
> >> independently in each divided partition. How does ZooKeeper handle
> >> this situation?
> >>
> >> -- Drew
> >>
> >>
> >> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
> >>
> >>> On Fri, 6 Jan 2012 10:03:38 -0800
> >>> Drew Kutcharian <dr...@venarc.com> wrote:
> >>>> I know that this can be done using a lock manager such as ZooKeeper
> >>>> or HazelCast, but the issue with using either of them is that if
> >>>> ZooKeeper or HazelCast is down, then you can't be sure about the
> >>>> reliability of the lock. So this potentially, in the very rare
> >>>> instance where the lock manager is down and two users are
> >>>> registering with the same email, can cause major issues.
> >>>
> >>> For most applications, if the lock manager is down, you don't
> >>> acquire the lock, so you don't enter the critical section. Rather
> >>> than allowing inconsistency, you become unavailable (at least to
> >>> writes that require a lock).
> >>>
> >>> -Bryce
> >>
>
>


-- 
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Drew Kutcharian <dr...@venarc.com>.

Thanks everyone for the replies. Seems like there is no easy way to handle this. It's very surprising that no one seems to have solved such a common use case.

-- Drew

On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:

> That's a good question, and I'm not sure - I'm fairly new to both ZK
> and Cassandra. I found this wiki page:
> http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
> and I think the lock recipe still works, even if a stale read happens.
> Assuming that wiki page is correct.
> 
> There is still subtlety to locking with ZK though, see (Locks based
> on ephemeral nodes) from the zk mailing list in October:
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
> 
> -Bryce
> 
> On Fri, 6 Jan 2012 13:36:52 -0800
> Drew Kutcharian <dr...@venarc.com> wrote:
>> Bryce, 
>> 
>> I'm not sure about ZooKeeper, but I know if you have a partition
>> between HazelCast nodes, then the nodes can acquire the same lock
>> independently in each divided partition. How does ZooKeeper handle
>> this situation?
>> 
>> -- Drew
>> 
>> 
>> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
>> 
>>> On Fri, 6 Jan 2012 10:03:38 -0800
>>> Drew Kutcharian <dr...@venarc.com> wrote:
>>>> I know that this can be done using a lock manager such as ZooKeeper
>>>> or HazelCast, but the issue with using either of them is that if
>>>> ZooKeeper or HazelCast is down, then you can't be sure about the
>>>> reliability of the lock. So this potentially, in the very rare
>>>> instance where the lock manager is down and two users are
>>>> registering with the same email, can cause major issues.
>>> 
>>> For most applications, if the lock manager is down, you don't
>>> acquire the lock, so you don't enter the critical section. Rather
>>> than allowing inconsistency, you become unavailable (at least to
>>> writes that require a lock).
>>> 
>>> -Bryce
>>

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Bryce Allen <ba...@ci.uchicago.edu>.

That's a good question, and I'm not sure - I'm fairly new to both ZK
and Cassandra. I found this wiki page:
http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
and I think the lock recipe still works, even if a stale read happens.
Assuming that wiki page is correct.

There is still subtlety to locking with ZK though, see (Locks based
on ephemeral nodes) from the zk mailing list in October:
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0

-Bryce

On Fri, 6 Jan 2012 13:36:52 -0800
Drew Kutcharian <dr...@venarc.com> wrote:
> Bryce, 
> 
> I'm not sure about ZooKeeper, but I know if you have a partition
> between HazelCast nodes, then the nodes can acquire the same lock
> independently in each divided partition. How does ZooKeeper handle
> this situation?
> 
> -- Drew
> 
> 
> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
> 
> > On Fri, 6 Jan 2012 10:03:38 -0800
> > Drew Kutcharian <dr...@venarc.com> wrote:
> >> I know that this can be done using a lock manager such as ZooKeeper
> >> or HazelCast, but the issue with using either of them is that if
> >> ZooKeeper or HazelCast is down, then you can't be sure about the
> >> reliability of the lock. So this potentially, in the very rare
> >> instance where the lock manager is down and two users are
> >> registering with the same email, can cause major issues.
> > 
> > For most applications, if the lock manager is down, you don't
> > acquire the lock, so you don't enter the critical section. Rather
> > than allowing inconsistency, you become unavailable (at least to
> > writes that require a lock).
> > 
> > -Bryce
>

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Jeremiah Jordan <je...@morningstar.com>.

By using a quorum. At most one side of the partition (the side that still
holds a majority) can acquire locks; the other can't.
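The majority rule can be checked exhaustively. This is a simplified model of quorum-based lock granting, not ZooKeeper's actual implementation: a side of a partition may grant locks only if it reaches a strict majority of the ensemble, so two disjoint sides can never both proceed. With an even split of an even-sized ensemble, neither side proceeds, which is the "become unavailable" outcome rather than a split-brain.

```python
def can_acquire(reachable, ensemble_size):
    """A side of a partition may grant locks only if it reaches a strict majority."""
    return reachable > ensemble_size // 2


# For every ensemble size and every two-way split, at most one side can grant locks.
for n in range(1, 10):
    for side_a in range(n + 1):
        side_b = n - side_a
        assert not (can_acquire(side_a, n) and can_acquire(side_b, n))

print(can_acquire(3, 5), can_acquire(2, 5))  # True False: only the 3-node side proceeds
print(can_acquire(2, 4))                     # False: a 2/2 split leaves both sides unavailable
```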

On 01/06/2012 03:36 PM, Drew Kutcharian wrote:
> Bryce,
>
> I'm not sure about ZooKeeper, but I know if you have a partition between HazelCast nodes, then the nodes can acquire the same lock independently in each divided partition. How does ZooKeeper handle this situation?
>
> -- Drew
>
>
> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
>
>> On Fri, 6 Jan 2012 10:03:38 -0800
>> Drew Kutcharian <dr...@venarc.com> wrote:
>>> I know that this can be done using a lock manager such as ZooKeeper
>>> or HazelCast, but the issue with using either of them is that if
>>> ZooKeeper or HazelCast is down, then you can't be sure about the
>>> reliability of the lock. So this potentially, in the very rare
>>> instance where the lock manager is down and two users are registering
>>> with the same email, can cause major issues.
>> For most applications, if the lock manager is down, you don't acquire
>> the lock, so you don't enter the critical section. Rather than allowing
>> inconsistency, you become unavailable (at least to writes that require
>> a lock).
>>
>> -Bryce

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Drew Kutcharian <dr...@venarc.com>.

Bryce, 

I'm not sure about ZooKeeper, but I know if you have a partition between HazelCast nodes, then the nodes can acquire the same lock independently in each divided partition. How does ZooKeeper handle this situation?

-- Drew


On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:

> On Fri, 6 Jan 2012 10:03:38 -0800
> Drew Kutcharian <dr...@venarc.com> wrote:
>> I know that this can be done using a lock manager such as ZooKeeper
>> or HazelCast, but the issue with using either of them is that if
>> ZooKeeper or HazelCast is down, then you can't be sure about the
>> reliability of the lock. So this potentially, in the very rare
>> instance where the lock manager is down and two users are registering
>> with the same email, can cause major issues.
> 
> For most applications, if the lock manager is down, you don't acquire
> the lock, so you don't enter the critical section. Rather than allowing
> inconsistency, you become unavailable (at least to writes that require
> a lock).
> 
> -Bryce

Re: How to reliably achieve unique constraints with Cassandra?

Posted by Bryce Allen <ba...@ci.uchicago.edu>.

On Fri, 6 Jan 2012 10:03:38 -0800
Drew Kutcharian <dr...@venarc.com> wrote:
> I know that this can be done using a lock manager such as ZooKeeper
> or HazelCast, but the issue with using either of them is that if
> ZooKeeper or HazelCast is down, then you can't be sure about the
> reliability of the lock. So this potentially, in the very rare
> instance where the lock manager is down and two users are registering
> with the same email, can cause major issues.

For most applications, if the lock manager is down, you don't acquire
the lock, so you don't enter the critical section. Rather than allowing
inconsistency, you become unavailable (at least to writes that require
a lock).
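A minimal sketch of that fail-closed behaviour, assuming a hypothetical lock client with acquire(name) and release(name) methods and a hypothetical LockUnavailable exception (neither is the API of ZooKeeper, HazelCast, or any real library). If the lock service cannot be reached, the registration is refused rather than written without the lock.

```python
class LockUnavailable(Exception):
    """Raised by the hypothetical lock client when the lock service cannot be reached."""


def register_fail_closed(lock_client, users, email, user_id):
    """Write only while holding the lock; if the lock service is down, refuse the write."""
    name = f"user-email:{email}"
    try:
        acquired = lock_client.acquire(name)
    except LockUnavailable:
        return "unavailable"          # fail closed: surface an error, let the caller retry later
    if not acquired:
        return "busy"
    try:
        if email in users:
            return "taken"
        users[email] = user_id
        return "created"
    finally:
        lock_client.release(name)


class DownLockClient:
    """Simulates an unreachable lock service."""
    def acquire(self, name):
        raise LockUnavailable(name)

    def release(self, name):
        pass


class LocalLockClient:
    """Single-process stand-in that grants a lock only if nobody holds it."""
    def __init__(self):
        self.held = set()

    def acquire(self, name):
        if name in self.held:
            return False
        self.held.add(name)
        return True

    def release(self, name):
        self.held.discard(name)


users = {}
print(register_fail_closed(DownLockClient(), users, "user@example.org", "u1"))   # unavailable
print(register_fail_closed(LocalLockClient(), users, "user@example.org", "u1"))  # created
print(users)  # {'user@example.org': 'u1'}
```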

-Bryce