Posted to user@cassandra.apache.org by Marek Lewandowski <ma...@gmail.com> on 2015/09/07 22:38:38 UTC

Some love for multi-partition LWT?

Hello there,

would you be interested in having multi-partition support for lightweight
transactions, in other words, the ability to express something like:

INSERT INTO … IF NOT EXISTS AND
UPDATE … IF EXISTS AND
UPDATE … IF colX = ‘xyz’

where each statement refers to a row that potentially lives on a different
set of nodes.
To put it another way: imagine a batch with conditions, but one that allows
multiple conditional statements for rows in different partitions.
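
For illustration, in today's CQL this amounts to a conditional batch whose
statements hit different partitions, which Cassandra currently rejects. A
minimal sketch, with hypothetical table and column names:

BEGIN BATCH
  INSERT INTO accounts (account_id, owner) VALUES ('A', 'marek') IF NOT EXISTS;
  UPDATE accounts SET state = 'active' WHERE account_id = 'B' IF state = 'pending';
APPLY BATCH;
-- rejected today: a batch with conditions cannot span multiple partitions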

Would you find this very useful, moderately useful, or not needed at all,
for example because your business logic already handles such cases?

Let me know.
Regards,
Marek

Re: Some love for multi-partition LWT?

Posted by DuyHai Doan <do...@gmail.com>.
"Do you think it could work?"

At first glance, maybe, but it would involve a huge number of round trips
and a lot of contention. You would risk serious deadlocks.

Second, to prove that a solution works, you need to prove that it works
for ALL situations, not just a few. Proving something wrong is easy: you
only need one counter-example. Proving that something is true in all cases
is much, much harder. The fact that I cannot find a counter-example to your
proposal right now does not mean it is correct and will work; I may be
forgetting some corner cases.




Re: Some love for multi-partition LWT?

Posted by Peter Lin <wo...@gmail.com>.
I would caution against using Paxos for distributed transactions in an
inappropriate way. The model has to be logically and mathematically correct,
otherwise you end up with corrupt data. In the worst case, it could cause a
cascading failure that brings down the cluster. I've seen distributed systems
come to a grinding halt due to disagreement between nodes.

As far as I know, based on the existing literature and first-hand experience,
there are only two ways: a central transaction manager or a distributed lock.

Distributed locks are expensive and can result in cluster-wide deadlock.
Look at the data grid space if you want to see where and when distributed
locks cause major headaches. Just ask anyone who has tried to use a
distributed cache as a distributed in-memory database and they'll tell you
how poorly that scales.

I recommend spending a couple of months reading up on the topic; there's a
wealth of literature on this. These are very old problems, and lots of smart
people have spent time figuring out what works and what doesn't.


Re: Some love for multi-partition LWT?

Posted by Marek Lewandowski <ma...@gmail.com>.
"This simple example shows how hard it is to implement multi-partition
Paxos rounds. The fact that you have multiple Paxos rounds that are
dependent on each other break the safety guarantees of the original Paxos
paper. "
What if this dependency is explicitly specified in the proposal?
Assume that each proposal consists of all the mutations, in this case {M1, M2}.
So N1, N2, N3 receive proposal {M1, M2} and nodes N4, N5, N6 receive proposal
{M1, M2}. (Yes, in this case I assume that nodes will get more data than
they need, but that's the cost.)

If C2 wins with a higher ballot at nodes N4, N5, N6 after N1, N2, N3 have
accepted {M1, M2} from C1, then C2 sees the in-progress proposal {M1, M2}
which it has to complete, so it will try nodes N1, N2, N3 and get agreement,
as well as nodes N4, N5, N6, and eventually commit, unless yet another
coordinator jumps in and trumps the Paxos rounds.

Do you think it could work?

"To guarantee you need a distributed lock or a different design like
datomic. Look at what rich hickey has done with datomic "
I'll look into that, thanks.



-- 
Marek Lewandowski

Re: Some love for multi-partition LWT?

Posted by wo...@gmail.com.
There's quite a bit of literature on the topic. Look at what is in ACM Queue
and you'll see that what the others are saying is accurate.

To guarantee this you need a distributed lock or a different design like
Datomic. Look at what Rich Hickey has done with Datomic.





Re: Some love for multi-partition LWT?

Posted by DuyHai Doan <do...@gmail.com>.
"I could imagine that multiple paxos rounds could be played for different
partitions and these rounds would be dependent on each other"

Example: a cluster of 10 nodes (N1 ... N10) and RF=3.

Suppose an LWT with 2 partitions and 2 mutations M1 & M2, and coordinator C1.
It implies 2 Paxos rounds with the same ballot B1: one round affecting N1,
N2, N3 for mutation M1 and the other round affecting N4, N5 and N6 for
mutation M2.
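
As a concrete illustration (hypothetical table and column names), M1 and M2
could be conditional updates such as:

UPDATE accounts SET balance = 90 WHERE account_id = 'A' IF balance = 100; -- M1: Paxos round on replicas N1, N2, N3
UPDATE accounts SET balance = 10 WHERE account_id = 'B' IF balance = 0;   -- M2: Paxos round on replicas N4, N5, N6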

Suppose that the prepare/promise and read/results phases are successful for
both Paxos rounds (see here for the different LWT phases:
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0)

The coordinator C1 then sends a propose() message to N1 ... N6 with the
respective mutations M1 and M2. N1, N2 and N3 will reply accept(), but
imagine another coordinator C2 has just proposed a higher ballot B2 (B2 >
B1) for nodes N4, N5 & N6 only. Those nodes will reply NoAck() (or won't
reply at all) to C1.

Then the whole multi-partition LWT needs to be aborted because we cannot
proceed with mutation M2. But N1, N2 and N3 have already accepted mutation
M1, and it could be committed by any subsequent Paxos round on N1, N2 and
N3.

We certainly don't want that.

So how do we safely abort mutation M1 on N1, N2 and N3 in this case? Of
course the coordinator C1 could itself send another prepare() message with a
ballot B3 higher than B1 to abort the accepted value on N1, N2 and N3, but
we have NO GUARANTEE that in the meantime another Paxos round impacting N1,
N2 and N3, with a ballot higher than B1, won't commit the undesirable
mutation M1 first...

This simple example shows how hard it is to implement multi-partition Paxos
rounds. The fact that you have multiple Paxos rounds that are dependent on
each other breaks the safety guarantees of the original Paxos paper.

The only way I can see to ensure safety in this case is to forbid any Paxos
round on N1 ... N6 as long as the current rounds are not finished, and this
is exactly what a distributed lock does.


Re: Some love for multi-partition LWT?

Posted by Marek Lewandowski <ma...@gmail.com>.
Are you absolutely sure that a lock is required? I could imagine that
multiple Paxos rounds could be played for different partitions and these
rounds would be dependent on each other.

Performance aside, can you please elaborate on where you see the need for a
lock?

Re: Some love for multi-partition LWT?

Posted by DuyHai Doan <do...@gmail.com>.
Multi-partition LWT is currently not supported on purpose. To support it,
we would have to emulate a distributed lock, which is pretty bad for
performance.
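
By contrast, what is supported today is a conditional batch confined to a
single partition, so that a single Paxos round covers all of the conditions.
A minimal sketch, with hypothetical table and column names:

BEGIN BATCH
  INSERT INTO user_emails (user_id, email) VALUES (1, 'a@example.com') IF NOT EXISTS;
  UPDATE user_emails SET verified = true WHERE user_id = 1 AND email = 'b@example.com' IF verified = false;
APPLY BATCH;
-- both rows live in partition user_id = 1, so the batch is accepted;
-- spreading the conditions over different partitions is rejected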
