Posted to dev@couchdb.apache.org by Mike Rhodes <co...@dx13.co.uk> on 2019/10/07 09:34:37 UTC

Re: [DISCUSS] FoundationDB read versions and CouchDB requests

All,

I think my email wasn't my clearest missive ever, so likely pretty easy to get lost in it :) 

I think my idea to include the read version in the document rev ID is likely a bad one. But, if we are already including it in the database seq value, and we've done the work to make that number transfer cleanly across FDB instances, there's probably some interesting API directions post 4.0 where we make more use of that value towards more efficient RYW at a database level.
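
As a rough illustration of a seq value that "transfers cleanly across FDB instances", here is a minimal sketch. It assumes the seq is a small cluster prefix followed by the FDB version, as described in the quoted thread below. The `Seq` type, the `epoch` field, the hex encoding and `satisfies_ryw` are all hypothetical names invented for this example, not CouchDB's actual sequence format.

```python
from typing import NamedTuple


class Seq(NamedTuple):
    """Hypothetical decoded database seq: (cluster prefix, FDB version)."""
    epoch: int        # assumed to be bumped when the database moves to a new FDB cluster
    fdb_version: int  # version within that cluster


def encode(seq: Seq) -> str:
    # Fixed-width hex so that string order matches numeric order.
    return f"{seq.epoch:04x}-{seq.fdb_version:016x}"


def decode(token: str) -> Seq:
    epoch, version = token.split("-")
    return Seq(int(epoch, 16), int(version, 16))


def satisfies_ryw(read_seq: Seq, last_write_seq: Seq) -> bool:
    """True if a read at read_seq is no older than the client's last write."""
    return read_seq >= last_write_seq  # tuples compare epoch first, then version


before_move = Seq(epoch=0, fdb_version=9_000_000)
after_move = Seq(epoch=1, fdb_version=120)  # a new cluster may start at a low version
print(after_move > before_move)             # ordering survives the move
```

Because the prefix is compared first, the seq keeps increasing even when the raw FDB version on the new cluster is lower than on the old one.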

I'm beginning to get a feel for what this API might look like without it being painful or confusing. For example, given the more advanced nature of these APIs, I'd see them operating at the HTTP header level, where we can provide request and response headers with the same name to support sending/receiving database seq values across ~all read/write requests.
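
A minimal client-side sketch of that header round trip, under stated assumptions: the header name `X-CouchDB-Seq` and the `SeqSession` helper are hypothetical and do not exist in CouchDB today. No network calls are made; plain dicts stand in for request and response headers, and header-name case handling is ignored for brevity.

```python
from typing import Dict, Optional

SEQ_HEADER = "X-CouchDB-Seq"  # hypothetical header name, not an existing CouchDB header


class SeqSession:
    """Remembers the most recently received database seq and echoes it on requests."""

    def __init__(self) -> None:
        self.last_seq: Optional[str] = None

    def request_headers(self) -> Dict[str, str]:
        # Send the last seen seq so a server could choose to read at or after it.
        return {SEQ_HEADER: self.last_seq} if self.last_seq else {}

    def observe_response(self, headers: Dict[str, str]) -> None:
        # Treat the token as opaque: store it as-is, never parse or modify it.
        seq = headers.get(SEQ_HEADER)
        if seq is not None:
            self.last_seq = seq


session = SeqSession()
session.observe_response({SEQ_HEADER: "example-opaque-seq"})  # e.g. after a write
print(session.request_headers())  # headers to attach to the next read
```

Keeping the value opaque on the client side matches the concern raised in the quoted thread about users crafting their own sequence values.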

Taking Joan and Adam's points on board, my view is that we semi-shelve this discussion to enable focus on getting 4.0 out the door. But, I think we can start to introduce useful functionality based on this during the 4.x series if we're careful (i.e., avoiding breaking changes). Of course, we should probably step back to user pain points first as Joan implies, otherwise we're building for the sake of building and the opportunity cost is not negligible.

I think we might also want to hash out the "interchangeable backend" question a bit more. As Adam says, FDB gives us a number of features that would be hard to replicate on different backends -- at least in the clustered case -- so nailing down a position there sounds important.

-- 
Mike.

On Mon, 30 Sep 2019, at 18:12, Adam Kocoloski wrote:
> Hi Joan,
> 
> Allowing clients to choose the DB sequence at which a read is performed 
> won’t have any effect on replication.
> 
> If we end up enhancing _bulk_docs so that it can use a single 
> transaction for all the documents in the batch then that’s where the 
> replicator might need to get smarter, e.g. by inferring that a range of 
> the _changes feed corresponds to a single transaction (based on 
> knowledge of the Sequence encoding) and ensuring that the transaction 
> is also written to the target atomically.
> 
> I hadn’t gotten as far as thinking about release numbers here, just 
> thinking about what’s possible. You’re right about the positioning of 
> 4.0, although that was largely an attempt to head off anyone thinking 
> about a wholesale replacement of the current API as part of the 
> FoundationDB work rather than a “no new features” ban. Cheers,
> 
> Adam
> 
> > On Sep 27, 2019, at 8:16 AM, Joan Touzet <wo...@apache.org> wrote:
> > 
> > 
> > 
> > On 2019-09-26 17:04, Adam Kocoloski wrote:
> >> 
> >>> On Sep 26, 2019, at 1:38 PM, Joan Touzet <wo...@apache.org> wrote:
> >>> 
> >>> On 2019-09-26 13:14, Adam Kocoloski wrote:
> >>>> Hi Joan, no need for apologies! Snipping out a few bits:
> >>>> 
> >>>>> One alternative is to always keep just one around, and constantly update
> >>>>> it every 5s, whether it's used or not (idle server).
> >>>> 
> >>>> Agreed, I see no reason to keep multiple old read versions around on a given CouchDB node. Updating it every second or two could be a nice thing to do (I wouldn’t wait 5 seconds because handing out a read version 4.95 seconds old isn’t very useful to anyone ;).
> >>>> 
> >>>>> This second option seems better, but as mentioned later we don't want it
> >>>>> to be a transparent FDB token (or convertible into one). This parallels
> >>>>> the nonce approach we use in _changes feeds to ensure a stable feed, yeah?
> >>>> 
> >>>> In our current design we _do_ expose FDB versions pretty directly as database update sequences (there’s a small prefix to allow for _changes to stay monotonically increasing when relocating a database to a new FDB cluster). I believe it’s worth us thinking about expanding the use of sequences to other places in the API as those are a concept that’s already pretty familiar to our users.
> >>> 
> >>> Did users ever craft their own 2.x db update sequence tokens to abuse
> >>> the system? Probably not, because our clustering code was hard to
> >>> understand. Did users ever craft their own 1.x db update sequence
> >>> values? Yes, and it caused lots of problems.
> >> 
> >> I don't remember the problems that this caused in 1.x, but I can certainly imagine a too-clever user generating a sequence that doesn’t correspond to any consistent FDB version and supplying it. FoundationDB allows for this sort of thing with the ominous caveat: “The database cannot guarantee causal consistency if this method is used (the transaction’s reads will be causally consistent only if the provided read version has that property).” So … yeah.
> > 
> > Eek. I don't like introducing new sharp edges.
> > 
> >> 
> >>> Does this prevent implementing the CouchDB API on any other backend? In
> >>> which case, I'd be -1.... In other words, at the very least we need to
> >>> reinforce that the token is opaque and that manipulating it can produce
> >>> both undefined errors as well as potentially lead to (perceived?) data loss.
> >> 
> >> I mean, we’re already down the path where we are using various specific features of FoundationDB (versionstamps, atomic operations, and of course transactions) that would not necessarily be in an arbitrary key-value store. I suppose adding this enhancement would add to the list of requirements on an underlying storage engine, but if a storage engine couldn’t support transactions with snapshot isolation I’m not sure it’d be a good choice for us anyway. Even something as basic as atomic maintenance of the _all_docs and _changes indexes becomes a heroic effort without that.
> > 
> > Two questions:
> > 
> > I thought 4.0 was supposed to be "no new functionality, just 2.x/3.x
> > semantics on top of FDB?" Is this something you're looking at for a
> > later release?
> > 
> > And how do you foresee e.g. PouchDB keeping up if they're not going to
> > put FDB on a mobile device - will we necessarily be implementing API
> > endpoints that demand an FDB backend that can't ever exist on a
> > different implementation? How will this affect replication, for
> > instance, if at all?
> > 
> >>>>> If we eschew API changes for 4.0 then we need to decide on the
> >>>>> default. And if we're voting, I'd say making RYWs the default (never
> >>>>> hanging onto a handle) and then (ab-)using stale=ok or whatever state
> >>>>> we have lying around might be sufficient.
> >>>> 
> >>>> I definitely agree. We should not be using old read versions without the client’s knowledge unless it's for some internal process where we know all the tradeoffs.
> >>>> 
> >>>>> This is the really important data point here for me. While Cloudant
> >>>>> cares about 2-3 extra ms on the server side, many many MANY CouchDB
> >>>>> users don't. Can we benchmark what this looks like when running
> >>>>> FDB+CouchDB on a teeny platform like a RasPi? Is it still 2-3ms? What
> >>>>> about the average laptop/desktop? Or is it only 2-3ms on a beefy
> >>>>> Cloudant-sized server?
> >>>> 
> >>>> I don’t have hard performance numbers, but I expect that acquiring a read version in a small-scale deployment is faster than the same operation against a big FoundationDB deployment spanning zones in a cloud region. When you scale down, e.g. to a single FDB process, that process ends up playing all the roles that need to collaborate to decide on a read version, and so the network latency gets taken out of the picture.
> >>> 
> >>> Then I'm concerned this is premature optimization.
> >> 
> >> A fair concern. What I really like about this is that the way to more efficient operations is exposing richer transactional semantics to users. How often do you get a deal like that!
> > 
> > Neat...but I'm always nervous when people spin new technology as a
> > win-win, there's always a tradeoff, so I reserve the right to be
> > skeptical until proven otherwise ;)
> > 
> >> 
> >> Cheers, Adam
> >> 
> > 
> > -Joan
> 
>
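
The quoted discussion above settles on keeping a single read version per CouchDB node, refreshing it every second or two, and never handing out an old one unless the client opts in. A minimal sketch of that policy follows, under stated assumptions: `ReadVersionCache`, `fetch`, `max_age` and `allow_stale` are hypothetical names, `fetch` merely stands in for asking FoundationDB for a fresh read version, and the cache refreshes lazily on access rather than on a background timer.

```python
import time
from typing import Callable, Optional


class ReadVersionCache:
    """Keeps one read version per node; reuses it only when the caller allows staleness."""

    def __init__(self, fetch: Callable[[], int], max_age: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # stand-in for obtaining a fresh read version from FDB
        self._max_age = max_age  # seconds a cached version may be reused
        self._clock = clock
        self._version: Optional[int] = None
        self._fetched_at = 0.0

    def get(self, allow_stale: bool = False) -> int:
        now = self._clock()
        expired = self._version is None or now - self._fetched_at > self._max_age
        # Default is a fresh version (read-your-writes); reuse only on explicit opt-in.
        if not allow_stale or expired:
            self._version = self._fetch()
            self._fetched_at = now
        return self._version


# Usage with a fake clock and a counter in place of FoundationDB.
fake_now = [0.0]
counter = iter(range(100, 200))
cache = ReadVersionCache(fetch=lambda: next(counter), max_age=1.0,
                         clock=lambda: fake_now[0])
first = cache.get()                    # always fetches
reused = cache.get(allow_stale=True)   # within max_age, so the cached value is reused
```

The `allow_stale` flag plays the role that `stale=ok` is suggested for in the thread: the older read version is used only with the client's knowledge.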

Re: [DISCUSS] FoundationDB read versions and CouchDB requests

Posted by Joan Touzet <wo...@apache.org>.

Thanks, Mike.

I know Jan has something to say on this topic, but please note he is on 
vacation through this week and may need to be nudged to post here once 
he returns ;)

-Joan

Re: [DISCUSS] FoundationDB read versions and CouchDB requests

Posted by Jan Lehnardt <ja...@apache.org>.


> On 7. Oct 2019, at 11:34, Mike Rhodes <co...@dx13.co.uk> wrote:
> 
> All,
> 
> I think my email wasn't my clearest missive ever, so likely pretty easy to get lost in it :) 
> 
> I think my idea to include the read version in the document rev ID is likely a bad one. But, if we are already including it in the database seq value, and we've done the work to make that number transfer cleanly across FDB instances, there's probably some interesting API directions post 4.0 where we make more use of that value towards more efficient RYW at a database level.
> 
> I'm beginning to get the feeling of what this API might look like and avoid being painful/confusing. For example, given the more advanced nature of these APIs, I'd see them operating at the HTTP header level, where we can provide request and response headers with the same name to support sending/receiving database seq values across ~all read/write requests.
> 
> Taking Joan and Adam's points on board, my view is that we semi-shelve this discussion to enable focus on getting 4.0 out the door. But, I think we can start to introduce useful functionality based on this during the 4.x series if we're careful (i.e., avoiding breaking changes). Of course, we should probably step back to user pain points first as Joan implies, otherwise we're building for the sake of building and the opportunity cost is not negligible.
> 
> I think we might also want to hash out the "interchangeable backend" question a bit more. As Adam says, FDB gives us a number of features that would be hard to replicate on different backends -- at least in the clustered case -- so nailing down a position there sounds important.

I think for now there is only a need for one other backend, for lower-resource systems, and I’d be fine with requiring those to be non-clustered.

I’m thinking of RasPi and desktop software scenarios.

Best
Jan
—


-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/