
Posted to user@cassandra.apache.org by Ralph Boehme <sl...@samba.org> on 2023/04/11 17:18:14 UTC

CAS operation result is unknown - proposal accepted by 1 but not a quorum

Hi folks!

Ralph here from the Samba team.

I'm currently researching open source distributed NoSQL key/value 
stores to be used by Samba as a more scalable alternative to Samba's 
own homegrown distributed key/value store called "ctdb" [1].

As an open source implementation of Microsoft's SMB file-sharing 
protocol, we have some specific requirements wrt database behaviour:

- fast
- fast
- fast
- strongly consistent, iow linearizable

So far we have got away without a linearizable database because, 
historically, the SMB protocol and SMB client implementations were 
built around the assumption that handle and session state on the 
server could be lost due to events like process or server crashes, 
and clients would implement a best-effort strategy to recover their 
state.

Modern SMB3 offers stronger guarantees, which require a strongly 
consistent, i.e. linearizable, database.

While prototyping a Python module for our pluggable database client in 
Samba I ran into the following issue with Cassandra:

   File "cassandra/cluster.py", line 2618, in cassandra.cluster.Session.execute
   File "cassandra/cluster.py", line 4901, in cassandra.cluster.ResponseFuture.result
cassandra.protocol.ErrorMessageSub: <Error from server: code=1700 [Unknown] message="CAS operation result is unknown - proposal accepted by 1 but not a quorum.">

This happens when executing the following LWT:

         f'''
         INSERT INTO {dbname} (key, guid, owner, refcount)
         VALUES (?, ?, ?, ?)
         IF NOT EXISTS
         '''
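A self-contained version of that statement might look like the sketch below. It assumes the DataStax Python driver (cassandra-driver) and a session object that behaves like cassandra.cluster.Session; the helper names are hypothetical. With this driver the ? bind markers are only accepted by prepared statements, and the table name has to be interpolated because bind markers can stand in for values but not for identifiers.

```python
def build_insert_cql(table: str) -> str:
    # The table name is interpolated because bind markers (?) can only
    # stand in for values, never for identifiers such as table names.
    return (
        f"INSERT INTO {table} (key, guid, owner, refcount) "
        "VALUES (?, ?, ?, ?) IF NOT EXISTS"
    )


def insert_if_not_exists(session, table, key, guid, owner, refcount):
    """Run the conditional INSERT and return True if it was applied.

    Assumes `session` behaves like cassandra.cluster.Session: prepare()
    returns a PreparedStatement and execute() returns a ResultSet.
    """
    # With the DataStax Python driver, ? markers only work in prepared
    # statements (simple statements use %s placeholders instead).
    stmt = session.prepare(build_insert_cql(table))
    result = session.execute(stmt, (key, guid, owner, refcount))
    # For LWTs the server returns an [applied] column, which the driver
    # exposes as ResultSet.was_applied.
    return result.was_applied


# Hypothetical usage against a real cluster:
#   from cassandra.cluster import Cluster
#   session = Cluster(["172.18.200.21"]).connect()
#   applied = insert_if_not_exists(session, "ks.tbl", key, guid, owner, 1)
```

In real code you would prepare the statement once and reuse it rather than calling prepare() on every insert.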

This is the first time I'm running Cassandra. I've just set up a 
three-node test cluster and everything looks OK:

# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.18.200.21  360,09 KiB  16      100,0%            4590f3a6-4ca5-466f-a24d-edc54afa36f0  rack1
UN  172.18.200.23  326,92 KiB  16      100,0%            9175fd4e-4d84-4899-878a-dd5266132ff8  rack1
UN  172.18.200.22  335,32 KiB  16      100,0%            35e05369-cc8a-4642-b98d-a5fcc326502f  rack1
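Reading the numbers in the error: within a single datacenter, effective ownership summed over all nodes equals RF x 100%, so three nodes each owning 100% suggests a replication factor of 3. Each Paxos phase at SERIAL consistency needs a quorum of replicas, floor(RF/2) + 1 = 2 here, so "accepted by 1 but not a quorum" means one replica accepted the proposal but a second acceptance never arrived, leaving the outcome undetermined. A small arithmetic sketch (function names are illustrative):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM, and for each Paxos phase
    at SERIAL consistency in a single datacenter: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def rf_from_ownership(nodes: int, owns_effective_pct: float) -> int:
    """Estimate RF in one datacenter from `nodetool status`, assuming every
    node reports the same effective ownership (effective ownership summed
    over all nodes equals RF * 100%)."""
    return round(nodes * owns_effective_pct / 100)


rf = rf_from_ownership(nodes=3, owns_effective_pct=100.0)
needed = quorum(rf)
accepted = 1  # "proposal accepted by 1 but not a quorum"
print(rf, needed, accepted >= needed)  # prints: 3 2 False
```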

Can anyone shed some light on what I might be doing wrong?

Thanks!
-slow

[1] <https://wiki.samba.org/index.php/CTDB_and_Clustered_Samba>

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

Posted by Ralph Boehme <sl...@samba.org>.

On 4/12/23 15:30, Jeff Jirsa wrote:
> Are you always inserting into the same partition (with contention) or
> different ?

I'm actually updating the very same row. :)

> Which version are you using ?

# nodetool version
ReleaseVersion: 4.1.1

> The short TL;DR is that the failure modes of the existing Paxos
> implementation (under contention, under latency, under cluster
> strain) can cause undefined states. I believe that a subsequent
> serial read will deterministically resolve the state (look at
> CASSANDRA-12126), but that has a cost (both the extra operation and
> the code complexity).

I'm definitely driving contention here in my workload. I'm basically 
implementing locks using LWTs on a row and I'm running lock/unlock in a 
tight loop *from multiple clients*. As I said, this already comes to a 
grinding halt with just 2 clients.
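The lock/unlock pattern described here can be modeled roughly as below. The table name, columns and CQL are hypothetical (not taken from Samba's code), and the in-memory class only mirrors what an applied or rejected LWT returns; it does not model Paxos, contention, or the "result is unknown" case.

```python
# Hypothetical CQL for an LWT-based lock on one row; the table `locks` and
# its columns are illustrative only. The class below mirrors their effect.
LOCK_CQL = "INSERT INTO locks (key, owner) VALUES (?, ?) IF NOT EXISTS"
UNLOCK_CQL = "DELETE FROM locks WHERE key = ? IF owner = ?"


class InMemoryLockTable:
    """Single-process model of the compare-and-set semantics of the two
    statements above. Each method returns (applied, current_owner), loosely
    mirroring the [applied] column plus existing values an LWT returns."""

    def __init__(self):
        self._rows = {}

    def lock(self, key, owner):
        # IF NOT EXISTS: apply only when no row exists for this key.
        if key in self._rows:
            return False, self._rows[key]
        self._rows[key] = owner
        return True, None

    def unlock(self, key, owner):
        # IF owner = ?: only the current holder may delete the row.
        current = self._rows.get(key)
        if current is None or current != owner:
            return False, current
        del self._rows[key]
        return True, None


t = InMemoryLockTable()
print(t.lock("k", "client-a"))    # (True, None)
print(t.lock("k", "client-b"))    # (False, 'client-a')
print(t.unlock("k", "client-b"))  # (False, 'client-a')
print(t.unlock("k", "client-a"))  # (True, None)
```

Against a real cluster, every lock() and unlock() is a full Paxos round on the same partition, so two clients running this in a tight loop compete on every iteration, which is the contention scenario asked about above.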

> The upcoming transactional rewrite will likely change this, but it’s
> still WIP (CEP-15)

Thanks. I'm aware of Accord and can't wait to get my hands on 
Cassandra 5.0. :) In the meantime I was hoping I could use Cassandra's 
LWTs to implement locking.

Thanks!
-slow

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

Posted by Jeff Jirsa <jj...@gmail.com>.

Are you always inserting into the same partition (with contention) or into different ones?

Which version are you using?

The short TL;DR is that the failure modes of the existing Paxos implementation (under contention, under latency, under cluster strain) can cause undefined states. I believe that a subsequent serial read will deterministically resolve the state (see CASSANDRA-12126), but that has a cost (both the extra operation and the code complexity).
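The serial-read approach can be sketched as a wrapper around the lock attempt. Everything here is an assumption: the exception class and the two callables are hypothetical placeholders for driver calls, and the wrapper relies on the serial read behaving as described in this reply. Note the cost mentioned: the extra read is itself a Paxos operation.

```python
class CasOutcomeUnknown(Exception):
    """Hypothetical stand-in for the "CAS operation result is unknown"
    server error (code=1700 in the traceback in the original post).
    Real code would translate the driver's error into this type."""


def lock_resolving_unknown(try_lock, serial_read_owner, key, owner):
    """Try an LWT lock; if its outcome is unknown, settle it with a read of
    the same row at SERIAL consistency.

    try_lock(key, owner) -> bool: runs the conditional write (hypothetical).
    serial_read_owner(key) -> owner or None: SELECT owner ... at SERIAL
        consistency (hypothetical).
    `owner` must be unique per client, otherwise the comparison below cannot
    tell our write apart from someone else's.
    """
    try:
        return try_lock(key, owner)
    except CasOutcomeUnknown:
        # Per the reply above (see CASSANDRA-12126), a SERIAL read should
        # deterministically resolve a half-finished Paxos round, so afterwards
        # the row shows whether our proposal took effect.
        return serial_read_owner(key) == owner
```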

The upcoming transactional rewrite will likely change this, but it’s still WIP (CEP-15)

> On Apr 12, 2023, at 6:11 AM, Ralph Boehme <sl...@samba.org> wrote:
> 
> On 4/11/23 21:14, Ralph Boehme wrote:
>>> On 4/11/23 19:53, Bowen Song via user wrote:
>>> That error message sounds like one of the nodes timed out in the paxos propose stage.  You can check the system.log and gc.log and see if you can find anything unusual in them, such as network errors, out of sync clocks or long stop-the-world GC pauses.
>> hm, I'll check the logs, but I can reproduce this 100% on an idle test cluster just by running a simple test client that generates a smallish workload where just 2 processes on a single host hammer the Cassandra cluster with LWTs.
> 
> nothing in the logs really.
> 
>> Maybe LWTs are not meant to be used this way?
> 
> fwiw, this happens 100% of the time within a few seconds with a workload where two clients hammer a single row with LWTs.
> 
> Thanks!
> -slow
>

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

Posted by Ralph Boehme <sl...@samba.org>.

On 4/11/23 21:14, Ralph Boehme wrote:
> On 4/11/23 19:53, Bowen Song via user wrote:
>> That error message sounds like one of the nodes timed out in the paxos 
>> propose stage.  You can check the system.log and gc.log and see if you 
>> can find anything unusual in them, such as network errors, out of sync 
>> clocks or long stop-the-world GC pauses.
> 
> hm, I'll check the logs, but I can reproduce this 100% on an idle test 
> cluster just by running a simple test client that generates a smallish 
> workload where just 2 processes on a single host hammer the Cassandra 
> cluster with LWTs.

nothing in the logs really.

> Maybe LWTs are not meant to be used this way?

fwiw, this happens 100% of the time within a few seconds with a 
workload where two clients hammer a single row with LWTs.

Thanks!
-slow

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

Posted by Ralph Boehme <sl...@samba.org>.

On 4/11/23 19:53, Bowen Song via user wrote:
> That error message sounds like one of the nodes timed out in the paxos 
> propose stage.  You can check the system.log and gc.log and see if you 
> can find anything unusual in them, such as network errors, out of sync 
> clocks or long stop-the-world GC pauses.

hm, I'll check the logs, but I can reproduce this 100% on an idle test 
cluster just by running a simple test client that generates a smallish 
workload where just 2 processes on a single host hammer the Cassandra 
cluster with LWTs.

Maybe LWTs are not meant to be used this way?

> BTW, since you said you want it to be fast, I think it's worth 
> mentioning that LWT comes with additional cost and is much slower than a 
> straightforward INSERT/UPDATE. 

Sure, but we have to swallow that pill as we need linearizability.

> You should avoid using it if possible. 
> For example, if all of the Cassandra clients (samba servers) are running 
> on the same machine, it may be far more efficient to use a lock than LWT.

No, the goal is to design a huge scale-out SMB cluster spanning 
hundreds of nodes, used as a multitenant cloud SMB frontend, much like 
Microsoft Azure SMB.

Thanks!
-slow

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

Posted by Bowen Song via user <us...@cassandra.apache.org>.

That error message sounds like one of the nodes timed out in the Paxos 
propose stage. You can check the system.log and gc.log and see if you 
can find anything unusual in them, such as network errors, out-of-sync 
clocks or long stop-the-world GC pauses.


BTW, since you said you want it to be fast, I think it's worth 
mentioning that LWT comes with additional cost and is much slower than a 
straightforward INSERT/UPDATE. You should avoid using it if possible. 
For example, if all of the Cassandra clients (Samba servers) are running 
on the same machine, it may be far more efficient to use a local lock 
than an LWT.
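As a rough illustration of that alternative, and assuming all Samba processes share one host and a local filesystem, a per-key POSIX advisory lock could look like the sketch below (Unix only; the function names and the one-lock-file-per-key layout are hypothetical). It only coordinates processes on a single machine, which is why it does not fit a design spanning many nodes.

```python
import fcntl
import os


def try_lock_file(path: str):
    """Take an exclusive flock() on `path` without blocking.

    Returns the file descriptor (keep it open to hold the lock), or None if
    another open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Held by another process (or by another open() of the same file).
        os.close(fd)
        return None
    return fd


def unlock_file(fd: int) -> None:
    # Release the lock, then close the descriptor.
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Unlike an LWT, acquiring this lock involves no network round trips, but it offers nothing once the clients live on different machines.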

