Posted to dev@couchdb.apache.org by Adam Kocoloski <ko...@apache.org> on 2019/04/16 00:40:57 UTC

[DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Hi all,

For once, I’m coming to you with a topic that is not strictly about FoundationDB :)

CouchDB offers a few config settings (some of them undocumented) to put a limit on how long the server is allowed to take to generate a response. The trouble with many of these timeouts is that, when they fire, they do not actually clean up all of the work that they initiated. A couple of examples:

- Each HTTP response coordinated by the “fabric” application spawns several ephemeral processes via “rexi” on different nodes in the cluster to retrieve data and send it back to the process coordinating the response. If the request timeout fires, the coordinating process will be killed off, but the ephemeral workers might not be. In a healthy cluster they’ll exit on their own when they finish their jobs, but there are conditions under which they can sit around for extended periods of time waiting for an overloaded gen_server (e.g. couch_server) to respond.

- Those named gen_servers (like couch_server) responsible for serializing access to important data structures will dutifully process messages received from old requests without any regard for (or even knowledge of) the fact that the client that sent the message timed out long ago. This can lead to a sort of death spiral in which the gen_server is ultimately spending ~all of its time serving dead clients and every client is timing out.

I’d like to see us introduce a documented maximum request duration for all requests except the _changes feed, and then use that information to aid in load shedding throughout the stack. We can audit the codebase for gen_server calls with long timeouts (I know of a few on the critical path that set their timeouts to `infinity`) and we can design servers that efficiently drop old requests, knowing that the client who made the request must have timed out. A couple of topics for discussion:

- the “gen_server that sheds old requests” is a very generic pattern, one that seems like it could be well-suited to its own behaviour. A cursory search of the internet didn’t turn up any prior art here, which surprises me a bit. I’m wondering if this is worth bringing up with the broader Erlang community.

- setting and enforcing timeouts is a healthy pattern for read-only requests as it gives a lot more feedback to clients about the health of the server. When it comes to updates things are a little bit more muddy, just because there remains a chance that an update can be committed, but the caller times out before learning of the successful commit. We should try to minimize the likelihood of that occurring.
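
To make the audit of `infinity` timeouts concrete, here is a rough caller-side sketch (hypothetical module and request term, not actual CouchDB code) of replacing an `infinity` timeout with a bounded one and letting the timeout bubble up instead of hanging on an overloaded server:

    %% Assumed project-wide maximum request duration.
    -define(MAX_REQUEST_MSEC, 5000).

    open_db(Name) ->
        try
            %% was: gen_server:call(couch_server, {open, Name}, infinity)
            gen_server:call(couch_server, {open, Name}, ?MAX_REQUEST_MSEC)
        catch
            exit:{timeout, _} ->
                %% Surface the timeout so the layers above can shed the
                %% request rather than wait forever.
                {error, timeout}
        end.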

Cheers, Adam

P.S. I did say that this wasn’t _strictly_ about FoundationDB, but of course FDB has a hard 5 second limit on all transactions, so it is a bit of a forcing function :). Even putting FoundationDB aside, I would still argue to pursue this path based on our Ops experience with the current codebase.

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Joan, great topic.

We don’t have enough realistic benchmarking data to be really specific yet, but my expectation is that the aggregate size of the underlying KV pairs is at least as important as number of documents in the batch. I have no doubt we’ll be able to ingest 1,000 1KB documents within a single operation, but bump those up to 1MB each and it’ll be a different story.

I do expect our write throughput with FoundationDB to be quite healthy and scalable. The use of a separate transaction log allows the system to nicely accommodate short bursts of heavy data ingestion.

On this topic, it’s also worth mentioning that FoundationDB allows us (if we want) to recover the notion of a truly atomic batch write. FDB uses optimistic concurrency control underneath so it’s not a situation where other writers get blocked like they do in CouchDB 1.x. If we do use this feature then the atomic batch has to be under 10MB (including revision metadata and any indexes). Batch operations that are not flagged as atomic can be split into separate transactions, or grouped together for performance (I believe Paul’s current prototype defaults to a separate transaction per doc in the batch, and submits them all in parallel).
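
To make the two modes concrete, here is a rough sketch (not Paul's prototype) of an atomic batch versus per-document transactions submitted in parallel, assuming an erlfdb-style `transactional/2` API and a stand-in `write_doc/2` that encodes one document's KV pairs:

    %% Atomic: all documents commit in one FDB transaction (subject to the
    %% ~10MB transaction size limit), so readers observe all or nothing.
    update_docs_atomic(Db, Docs) ->
        erlfdb:transactional(Db, fun(Tx) ->
            lists:foreach(fun(Doc) -> write_doc(Tx, Doc) end, Docs)
        end).

    %% Non-atomic: one transaction per document, submitted in parallel and
    %% then awaited, mirroring the "separate transaction per doc" default.
    update_docs_parallel(Db, Docs) ->
        Parent = self(),
        Refs = [begin
                    Ref = make_ref(),
                    spawn_link(fun() ->
                        R = erlfdb:transactional(Db, fun(Tx) -> write_doc(Tx, Doc) end),
                        Parent ! {Ref, R}
                    end),
                    Ref
                end || Doc <- Docs],
        [receive {Ref, Result} -> Result end || Ref <- Refs].

    %% Stand-in for encoding a document's KV pairs, only to keep the
    %% sketch self-contained.
    write_doc(Tx, {Key, Value}) ->
        erlfdb:set(Tx, Key, Value).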

Cheers, Adam

> On Apr 26, 2019, at 4:19 PM, Joan Touzet <wo...@apache.org> wrote:
> 
> Hi Adam,
> 
> I'll bring up a concern from a recent client with whom I engaged.
> 
> They're on 1.x. On 1.x they have been doing 50k bulk update operations in a single request. 1.x doesn't time out. The updates are such that they guarantee that none will result in a conflict or be rejected, so all 50k are accepted. They do this so it appears atomic to the next reader - a read from another client can't occur in the middle of the big update, because we have a single couch_file in 1.x.
> 
> Obviously, in 2.x this doesn't work on two levels. First, there's multiple readers and writers across a cluster, so the big bulk operation doesn't act as a blocker until it's finished for any interposed reads. Second, you can't reliably finish 50k updates in a single batch in a cluster anyway, because you'll probably hit the fabric timeout, if not other cluster timeouts.
> 
> As a general rule of thumb, I advise people to keep bulk document updates to no more than batches of 1k at a time, with the understanding that in 2.x these are not treated as an atomic transaction (and they weren't strictly that way in 1.x, either, but never mind that...)
> 
> If we decide as a project that all operations must take less than 5 seconds, we're probably going to have to reduce the bulk update batch size even further. I'm betting 100 would be the upper bound on bulk updates.
> 
> Is this going to impose a significant performance penalty on bulk ops?
> 
> -Joan
> 
>> On 2019-04-26 3:30 p.m., Adam Kocoloski wrote:
>> Hi all,
>> The point I’m on is that we should take advantage of this extra bit of information that we acquire out-of-band (e.g. we just decide as a project that all operations take less than 5 seconds) and come up with smarter / cheaper / faster ways of doing load shedding based on that information.
>> For example, yes it could be interesting to use is_process_alive/1 to see if a client is still hanging around, and have the gen_server discard the work otherwise. It might also be too expensive to matter; I’m not sure anyone here has a good a priori sense of the cost of that call. But I’d certainly wager it’s more expensive than calling timer:now_diff/2 in the server and discarding any requests that were submitted more than 5 seconds ago.
>> Most of our timeout / cleanup solutions to date have been focused top-down, without making any assumptions about the behavior of the workers or servers underneath. I think we should try to approach this problem bottoms-up, forcing every call to complete within 5 seconds and handling timeouts correctly as they bubble up.
>> Adam
>>> On Apr 23, 2019, at 2:48 PM, Nick Vatamaniuc <va...@gmail.com> wrote:
>>> 
>>> We don't spawn (/link) or monitor remote processes, just monitor the local
>>> coordinator process. That should cheaper performance-wise. It's also for
>>> relatively long running streaming fabric requests (changes, all_docs). But
>>> you're right, perhaps doing these for shorter requests (doc updates, doc
>>> GETs) might become noticeable. Perhaps a pool of reusable monitoring
>>> processes work there...
>>> 
>>> For couch_server timeouts. I wonder if we can do a simpler thing and
>>> inspect the `From` part of each call and if the Pid is not alive drop the
>>> requestor at least avoid doing any expensive processing. For casts it might
>>> involve sending a sender Pid in the message. That doesn't address timeouts,
>>> just the case where the coordinating process went away while the message
>>> was stuck in the long message queue.
>>> 
>>>> On Mon, Apr 22, 2019 at 4:32 PM Robert Newson <rn...@apache.org> wrote:
>>>> 
>>>> My memory is fuzzy, but those items sound a lot like what happens with
>>>> rex, that motivated us (i.e, Adam) to build rexi, which deliberately does
>>>> less than the stock approach.
>>>> 
>>>> --
>>>>  Robert Samuel Newson
>>>>  rnewson@apache.org
>>>> 
>>>>> On Mon, 22 Apr 2019, at 18:33, Nick Vatamaniuc wrote:
>>>>> Hi everyone,
>>>>> 
>>>>> We partially implement the first part (cleaning rexi workers) for all
>>>>> the
>>>>> fabric streaming requests. Which should be all_docs, changes, view map,
>>>>> view reduce:
>>>>> 
>>>> https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d
>>>>> 
>>>>> The pattern there is the following:
>>>>> 
>>>>> - With every request spawn a monitoring process that is in charge of
>>>>> keeping track of all the workers as they are spawned.
>>>>> - If regular cleanup takes place, then this monitoring process is
>>>> killed,
>>>>> to avoid sending double the number of kill messages to workers.
>>>>> - If the coordinating process doesn't run cleanup and just dies, the
>>>>> monitoring process will performs cleanup on its behalf.
>>>>> 
>>>>> Cheers,
>>>>> -Nick
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Apr 18, 2019 at 5:16 PM Robert Samuel Newson <rnewson@apache.org
>>>>> 
>>>>> wrote:
>>>>> 
>>>>>> My view is a) the server was unavailable for this request due to all
>>>> the
>>>>>> other requests it’s currently dealing with b) the connection was not
>>>> idle,
>>>>>> the client is not at fault.
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>>> On 18 Apr 2019, at 22:03, Done Collectively <sa...@inator.biz>
>>>> wrote:
>>>>>>> 
>>>>>>> Any reason 408 would be undesirable?
>>>>>>> 
>>>>>>> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Apr 18, 2019 at 10:37 AM Robert Newson <rn...@apache.org>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> 503 imo.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Robert Samuel Newson
>>>>>>>> rnewson@apache.org
>>>>>>>> 
>>>>>>>>> On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
>>>>>>>>> Yes, we should. Currently it’s a 500, maybe there’s something more
>>>>>>>> appropriate:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
>>>>>>>>> 
>>>>>>>>> Adam
>>>>>>>>> 
>>>>>>>>>> On Apr 18, 2019, at 12:50 PM, Joan Touzet <wo...@apache.org>
>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> What happens when it turns out the client *hasn't* timed out and
>>>> we
>>>>>>>>>> just...hang up on them? Should we consider at least trying to send
>>>>>> back
>>>>>>>>>> some sort of HTTP status code?
>>>>>>>>>> 
>>>>>>>>>> -Joan
>>>>>>>>>> 
>>>>>>>>>>> On 2019-04-18 10:58, Garren Smith wrote:
>>>>>>>>>>> I'm +1 on this. With partition queries, we added a few more
>>>> timeouts
>>>>>>>> that
>>>>>>>>>>> can be enabled which Cloudant enable. So having the ability to
>>>> shed
>>>>>>>> old
>>>>>>>>>>> requests when these timeouts get hit would be great.
>>>>>>>>>>> 
>>>>>>>>>>> Cheers
>>>>>>>>>>> Garren
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski <
>>>> kocolosk@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> For once, I’m coming to you with a topic that is not strictly
>>>> about
>>>>>>>>>>>> FoundationDB :)
>>>>>>>>>>>> 
>>>>>>>>>>>> CouchDB offers a few config settings (some of them
>>>> undocumented) to
>>>>>>>> put a
>>>>>>>>>>>> limit on how long the server is allowed to take to generate a
>>>>>>>> response. The
>>>>>>>>>>>> trouble with many of these timeouts is that, when they fire,
>>>> they do
>>>>>>>> not
>>>>>>>>>>>> actually clean up all of the work that they initiated. A couple
>>>> of
>>>>>>>> examples:
>>>>>>>>>>>> 
>>>>>>>>>>>> - Each HTTP response coordinated by the “fabric” application
>>>> spawns
>>>>>>>>>>>> several ephemeral processes via “rexi" on different nodes in the
>>>>>>>> cluster to
>>>>>>>>>>>> retrieve data and send it back to the process coordinating the
>>>>>>>> response. If
>>>>>>>>>>>> the request timeout fires, the coordinating process will be
>>>> killed
>>>>>>>> off, but
>>>>>>>>>>>> the ephemeral workers might not be. In a healthy cluster they’ll
>>>>>>>> exit on
>>>>>>>>>>>> their own when they finish their jobs, but there are conditions
>>>>>>>> under which
>>>>>>>>>>>> they can sit around for extended periods of time waiting for an
>>>>>>>> overloaded
>>>>>>>>>>>> gen_server (e.g. couch_server) to respond.
>>>>>>>>>>>> 
>>>>>>>>>>>> - Those named gen_servers (like couch_server) responsible for
>>>>>>>> serializing
>>>>>>>>>>>> access to important data structures will dutifully process
>>>> messages
>>>>>>>>>>>> received from old requests without any regard for (of even
>>>> knowledge
>>>>>>>> of)
>>>>>>>>>>>> the fact that the client that sent the message timed out long
>>>> ago.
>>>>>>>> This can
>>>>>>>>>>>> lead to a sort of death spiral in which the gen_server is
>>>> ultimately
>>>>>>>>>>>> spending ~all of its time serving dead clients and every client
>>>> is
>>>>>>>> timing
>>>>>>>>>>>> out.
>>>>>>>>>>>> 
>>>>>>>>>>>> I’d like to see us introduce a documented maximum request
>>>> duration
>>>>>>>> for all
>>>>>>>>>>>> requests except the _changes feed, and then use that
>>>> information to
>>>>>>>> aid in
>>>>>>>>>>>> load shedding throughout the stack. We can audit the codebase
>>>> for
>>>>>>>>>>>> gen_server calls with long timeouts (I know of a few on the
>>>> critical
>>>>>>>> path
>>>>>>>>>>>> that set their timeouts to `infinity`) and we can design servers
>>>>>> that
>>>>>>>>>>>> efficiently drop old requests, knowing that the client who made
>>>> the
>>>>>>>> request
>>>>>>>>>>>> must have timed out. A couple of topics for discussion:
>>>>>>>>>>>> 
>>>>>>>>>>>> - the “gen_server that sheds old requests” is a very generic
>>>>>>>> pattern, one
>>>>>>>>>>>> that seems like it could be well-suited to its own behaviour. A
>>>>>>>> cursory
>>>>>>>>>>>> search of the internet didn’t turn up any prior art here, which
>>>>>>>> surprises
>>>>>>>>>>>> me a bit. I’m wondering if this is worth bringing up with the
>>>>>> broader
>>>>>>>>>>>> Erlang community.
>>>>>>>>>>>> 
>>>>>>>>>>>> - setting and enforcing timeouts is a healthy pattern for
>>>> read-only
>>>>>>>>>>>> requests as it gives a lot more feedback to clients about the
>>>> health
>>>>>>>> of the
>>>>>>>>>>>> server. When it comes to updates things are a little bit more
>>>> muddy,
>>>>>>>> just
>>>>>>>>>>>> because there remains a chance that an update can be committed,
>>>> but
>>>>>>>> the
>>>>>>>>>>>> caller times out before learning of the successful commit. We
>>>> should
>>>>>>>> try to
>>>>>>>>>>>> minimize the likelihood of that occurring.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers, Adam
>>>>>>>>>>>> 
>>>>>>>>>>>> P.S. I did say that this wasn’t _strictly_ about FoundationDB,
>>>> but
>>>>>> of
>>>>>>>>>>>> course FDB has a hard 5 second limit on all transactions, so it
>>>> is a
>>>>>>>> bit of
>>>>>>>>>>>> a forcing function :).Even putting FoundationDB aside, I would
>>>> still
>>>>>>>> argue
>>>>>>>>>>>> to pursue this path based on our Ops experience with the current
>>>>>>>> codebase.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 


Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Joan Touzet <wo...@apache.org>.
Hi Adam,

I'll bring up a concern from a recent client with whom I engaged.

They're on 1.x. On 1.x they have been doing 50k bulk update operations 
in a single request. 1.x doesn't time out. The updates are such that 
they guarantee that none will result in a conflict or be rejected, so 
all 50k are accepted. They do this so it appears atomic to the next 
reader - a read from another client can't occur in the middle of the big 
update, because we have a single couch_file in 1.x.

Obviously, in 2.x this doesn't work on two levels. First, there are
multiple readers and writers across a cluster, so the big bulk operation
doesn't block interposed reads until it's finished.
Second, you can't reliably finish 50k updates in a single batch in a 
cluster anyway, because you'll probably hit the fabric timeout, if not 
other cluster timeouts.

As a general rule of thumb, I advise people to keep bulk document 
updates to no more than batches of 1k at a time, with the understanding 
that in 2.x these are not treated as an atomic transaction (and they 
weren't strictly that way in 1.x, either, but never mind that...)

If we decide as a project that all operations must take less than 5 
seconds, we're probably going to have to reduce the bulk update batch 
size even further. I'm betting 100 would be the upper bound on bulk updates.
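
For what it's worth, the client-side workaround is mechanical: split the update into small batches and issue one _bulk_docs request per batch. A rough sketch, with the actual HTTP POST stubbed out since the client library will vary:

    %% Assumed per-request cap from the discussion above.
    -define(BATCH_SIZE, 100).

    bulk_update(DbUrl, Docs) ->
        lists:foreach(
            fun(Batch) -> ok = post_bulk_docs(DbUrl, Batch) end,
            chunk(Docs, ?BATCH_SIZE)).

    chunk([], _N) ->
        [];
    chunk(Docs, N) when length(Docs) =< N ->
        [Docs];
    chunk(Docs, N) ->
        {Batch, Rest} = lists:split(N, Docs),
        [Batch | chunk(Rest, N)].

    %% Placeholder: a real client would POST {"docs": Batch} as JSON to
    %% DbUrl ++ "/_bulk_docs"; stubbed so the chunking sketch stands alone.
    post_bulk_docs(_DbUrl, _Batch) ->
        ok.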

Is this going to impose a significant performance penalty on bulk ops?

-Joan

On 2019-04-26 3:30 p.m., Adam Kocoloski wrote:
> Hi all,
> 
> The point I’m on is that we should take advantage of this extra bit of information that we acquire out-of-band (e.g. we just decide as a project that all operations take less than 5 seconds) and come up with smarter / cheaper / faster ways of doing load shedding based on that information.
> 
> For example, yes it could be interesting to use is_process_alive/1 to see if a client is still hanging around, and have the gen_server discard the work otherwise. It might also be too expensive to matter; I’m not sure anyone here has a good a priori sense of the cost of that call. But I’d certainly wager it’s more expensive than calling timer:now_diff/2 in the server and discarding any requests that were submitted more than 5 seconds ago.
> 
> Most of our timeout / cleanup solutions to date have been focused top-down, without making any assumptions about the behavior of the workers or servers underneath. I think we should try to approach this problem bottoms-up, forcing every call to complete within 5 seconds and handling timeouts correctly as they bubble up.
> 
> Adam
> 
>> On Apr 23, 2019, at 2:48 PM, Nick Vatamaniuc <va...@gmail.com> wrote:
>>
>> We don't spawn (/link) or monitor remote processes, just monitor the local
>> coordinator process. That should cheaper performance-wise. It's also for
>> relatively long running streaming fabric requests (changes, all_docs). But
>> you're right, perhaps doing these for shorter requests (doc updates, doc
>> GETs) might become noticeable. Perhaps a pool of reusable monitoring
>> processes work there...
>>
>> For couch_server timeouts. I wonder if we can do a simpler thing and
>> inspect the `From` part of each call and if the Pid is not alive drop the
>> requestor at least avoid doing any expensive processing. For casts it might
>> involve sending a sender Pid in the message. That doesn't address timeouts,
>> just the case where the coordinating process went away while the message
>> was stuck in the long message queue.
>>
>> On Mon, Apr 22, 2019 at 4:32 PM Robert Newson <rn...@apache.org> wrote:
>>
>>> My memory is fuzzy, but those items sound a lot like what happens with
>>> rex, that motivated us (i.e, Adam) to build rexi, which deliberately does
>>> less than the stock approach.
>>>
>>> --
>>>   Robert Samuel Newson
>>>   rnewson@apache.org
>>>
>>> On Mon, 22 Apr 2019, at 18:33, Nick Vatamaniuc wrote:
>>>> Hi everyone,
>>>>
>>>> We partially implement the first part (cleaning rexi workers) for all
>>>> the
>>>> fabric streaming requests. Which should be all_docs, changes, view map,
>>>> view reduce:
>>>>
>>> https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d
>>>>
>>>> The pattern there is the following:
>>>>
>>>> - With every request spawn a monitoring process that is in charge of
>>>> keeping track of all the workers as they are spawned.
>>>> - If regular cleanup takes place, then this monitoring process is
>>> killed,
>>>> to avoid sending double the number of kill messages to workers.
>>>> - If the coordinating process doesn't run cleanup and just dies, the
>>>> monitoring process will performs cleanup on its behalf.
>>>>
>>>> Cheers,
>>>> -Nick
>>>>
>>>>
>>>>
>>>> On Thu, Apr 18, 2019 at 5:16 PM Robert Samuel Newson <rnewson@apache.org
>>>>
>>>> wrote:
>>>>
>>>>> My view is a) the server was unavailable for this request due to all
>>> the
>>>>> other requests it’s currently dealing with b) the connection was not
>>> idle,
>>>>> the client is not at fault.
>>>>>
>>>>> B.
>>>>>
>>>>>> On 18 Apr 2019, at 22:03, Done Collectively <sa...@inator.biz>
>>> wrote:
>>>>>>
>>>>>> Any reason 408 would be undesirable?
>>>>>>
>>>>>> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 18, 2019 at 10:37 AM Robert Newson <rn...@apache.org>
>>>>> wrote:
>>>>>>
>>>>>>> 503 imo.
>>>>>>>
>>>>>>> --
>>>>>>> Robert Samuel Newson
>>>>>>> rnewson@apache.org
>>>>>>>
>>>>>>> On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
>>>>>>>> Yes, we should. Currently it’s a 500, maybe there’s something more
>>>>>>> appropriate:
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>> https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
>>>>>>>>
>>>>>>>> Adam
>>>>>>>>
>>>>>>>>> On Apr 18, 2019, at 12:50 PM, Joan Touzet <wo...@apache.org>
>>> wrote:
>>>>>>>>>
>>>>>>>>> What happens when it turns out the client *hasn't* timed out and
>>> we
>>>>>>>>> just...hang up on them? Should we consider at least trying to send
>>>>> back
>>>>>>>>> some sort of HTTP status code?
>>>>>>>>>
>>>>>>>>> -Joan
>>>>>>>>>
>>>>>>>>> On 2019-04-18 10:58, Garren Smith wrote:
>>>>>>>>>> I'm +1 on this. With partition queries, we added a few more
>>> timeouts
>>>>>>> that
>>>>>>>>>> can be enabled which Cloudant enable. So having the ability to
>>> shed
>>>>>>> old
>>>>>>>>>> requests when these timeouts get hit would be great.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Garren
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski <
>>> kocolosk@apache.org>
>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> For once, I’m coming to you with a topic that is not strictly
>>> about
>>>>>>>>>>> FoundationDB :)
>>>>>>>>>>>
>>>>>>>>>>> CouchDB offers a few config settings (some of them
>>> undocumented) to
>>>>>>> put a
>>>>>>>>>>> limit on how long the server is allowed to take to generate a
>>>>>>> response. The
>>>>>>>>>>> trouble with many of these timeouts is that, when they fire,
>>> they do
>>>>>>> not
>>>>>>>>>>> actually clean up all of the work that they initiated. A couple
>>> of
>>>>>>> examples:
>>>>>>>>>>>
>>>>>>>>>>> - Each HTTP response coordinated by the “fabric” application
>>> spawns
>>>>>>>>>>> several ephemeral processes via “rexi" on different nodes in the
>>>>>>> cluster to
>>>>>>>>>>> retrieve data and send it back to the process coordinating the
>>>>>>> response. If
>>>>>>>>>>> the request timeout fires, the coordinating process will be
>>> killed
>>>>>>> off, but
>>>>>>>>>>> the ephemeral workers might not be. In a healthy cluster they’ll
>>>>>>> exit on
>>>>>>>>>>> their own when they finish their jobs, but there are conditions
>>>>>>> under which
>>>>>>>>>>> they can sit around for extended periods of time waiting for an
>>>>>>> overloaded
>>>>>>>>>>> gen_server (e.g. couch_server) to respond.
>>>>>>>>>>>
>>>>>>>>>>> - Those named gen_servers (like couch_server) responsible for
>>>>>>> serializing
>>>>>>>>>>> access to important data structures will dutifully process
>>> messages
>>>>>>>>>>> received from old requests without any regard for (of even
>>> knowledge
>>>>>>> of)
>>>>>>>>>>> the fact that the client that sent the message timed out long
>>> ago.
>>>>>>> This can
>>>>>>>>>>> lead to a sort of death spiral in which the gen_server is
>>> ultimately
>>>>>>>>>>> spending ~all of its time serving dead clients and every client
>>> is
>>>>>>> timing
>>>>>>>>>>> out.
>>>>>>>>>>>
>>>>>>>>>>> I’d like to see us introduce a documented maximum request
>>> duration
>>>>>>> for all
>>>>>>>>>>> requests except the _changes feed, and then use that
>>> information to
>>>>>>> aid in
>>>>>>>>>>> load shedding throughout the stack. We can audit the codebase
>>> for
>>>>>>>>>>> gen_server calls with long timeouts (I know of a few on the
>>> critical
>>>>>>> path
>>>>>>>>>>> that set their timeouts to `infinity`) and we can design servers
>>>>> that
>>>>>>>>>>> efficiently drop old requests, knowing that the client who made
>>> the
>>>>>>> request
>>>>>>>>>>> must have timed out. A couple of topics for discussion:
>>>>>>>>>>>
>>>>>>>>>>> - the “gen_server that sheds old requests” is a very generic
>>>>>>> pattern, one
>>>>>>>>>>> that seems like it could be well-suited to its own behaviour. A
>>>>>>> cursory
>>>>>>>>>>> search of the internet didn’t turn up any prior art here, which
>>>>>>> surprises
>>>>>>>>>>> me a bit. I’m wondering if this is worth bringing up with the
>>>>> broader
>>>>>>>>>>> Erlang community.
>>>>>>>>>>>
>>>>>>>>>>> - setting and enforcing timeouts is a healthy pattern for
>>> read-only
>>>>>>>>>>> requests as it gives a lot more feedback to clients about the
>>> health
>>>>>>> of the
>>>>>>>>>>> server. When it comes to updates things are a little bit more
>>> muddy,
>>>>>>> just
>>>>>>>>>>> because there remains a chance that an update can be committed,
>>> but
>>>>>>> the
>>>>>>>>>>> caller times out before learning of the successful commit. We
>>> should
>>>>>>> try to
>>>>>>>>>>> minimize the likelihood of that occurring.
>>>>>>>>>>>
>>>>>>>>>>> Cheers, Adam
>>>>>>>>>>>
>>>>>>>>>>> P.S. I did say that this wasn’t _strictly_ about FoundationDB,
>>> but
>>>>> of
>>>>>>>>>>> course FDB has a hard 5 second limit on all transactions, so it
>>> is a
>>>>>>> bit of
>>>>>>>>>>> a forcing function :).Even putting FoundationDB aside, I would
>>> still
>>>>>>> argue
>>>>>>>>>>> to pursue this path based on our Ops experience with the current
>>>>>>> codebase.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
> 

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Adam Kocoloski <ko...@apache.org>.
Hi all,

The point I’m on is that we should take advantage of this extra bit of information that we acquire out-of-band (e.g. we just decide as a project that all operations take less than 5 seconds) and come up with smarter / cheaper / faster ways of doing load shedding based on that information.

For example, yes it could be interesting to use is_process_alive/1 to see if a client is still hanging around, and have the gen_server discard the work otherwise. It might also be too expensive to matter; I’m not sure anyone here has a good a priori sense of the cost of that call. But I’d certainly wager it’s more expensive than calling timer:now_diff/2 in the server and discarding any requests that were submitted more than 5 seconds ago.

Most of our timeout / cleanup solutions to date have been focused top-down, without making any assumptions about the behavior of the workers or servers underneath. I think we should try to approach this problem bottom-up, forcing every call to complete within 5 seconds and handling timeouts correctly as they bubble up.
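
As a hypothetical illustration of that cheaper check: the caller stamps each request with os:timestamp() and the server discards anything older than the agreed limit before doing any real work (names and shapes are assumptions, not existing CouchDB code):

    %% Assumed project-wide 5-second limit, in microseconds for timer:now_diff/2.
    -define(MAX_AGE_USEC, 5 * 1000 * 1000).

    %% Caller side: include the submission time in the request.
    call(Server, Req) ->
        gen_server:call(Server, {Req, os:timestamp()}, 5000).

    %% Server side: shed work whose caller must already have timed out.
    handle_call({Req, SubmittedAt}, _From, State) ->
        case timer:now_diff(os:timestamp(), SubmittedAt) > ?MAX_AGE_USEC of
            true ->
                %% No point in replying; the caller gave up long ago.
                {noreply, State};
            false ->
                {reply, do_handle(Req, State), State}
        end.

    %% Stand-in for the real work.
    do_handle(Req, _State) ->
        {ok, Req}.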

Adam

> On Apr 23, 2019, at 2:48 PM, Nick Vatamaniuc <va...@gmail.com> wrote:
> 
> We don't spawn (/link) or monitor remote processes, just monitor the local
> coordinator process. That should cheaper performance-wise. It's also for
> relatively long running streaming fabric requests (changes, all_docs). But
> you're right, perhaps doing these for shorter requests (doc updates, doc
> GETs) might become noticeable. Perhaps a pool of reusable monitoring
> processes work there...
> 
> For couch_server timeouts. I wonder if we can do a simpler thing and
> inspect the `From` part of each call and if the Pid is not alive drop the
> requestor at least avoid doing any expensive processing. For casts it might
> involve sending a sender Pid in the message. That doesn't address timeouts,
> just the case where the coordinating process went away while the message
> was stuck in the long message queue.
> 
> On Mon, Apr 22, 2019 at 4:32 PM Robert Newson <rn...@apache.org> wrote:
> 
>> My memory is fuzzy, but those items sound a lot like what happens with
>> rex, that motivated us (i.e, Adam) to build rexi, which deliberately does
>> less than the stock approach.
>> 
>> --
>>  Robert Samuel Newson
>>  rnewson@apache.org
>> 
>> On Mon, 22 Apr 2019, at 18:33, Nick Vatamaniuc wrote:
>>> Hi everyone,
>>> 
>>> We partially implement the first part (cleaning rexi workers) for all
>>> the
>>> fabric streaming requests. Which should be all_docs, changes, view map,
>>> view reduce:
>>> 
>> https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d
>>> 
>>> The pattern there is the following:
>>> 
>>> - With every request spawn a monitoring process that is in charge of
>>> keeping track of all the workers as they are spawned.
>>> - If regular cleanup takes place, then this monitoring process is
>> killed,
>>> to avoid sending double the number of kill messages to workers.
>>> - If the coordinating process doesn't run cleanup and just dies, the
>>> monitoring process will performs cleanup on its behalf.
>>> 
>>> Cheers,
>>> -Nick
>>> 
>>> 
>>> 
>>> On Thu, Apr 18, 2019 at 5:16 PM Robert Samuel Newson <rnewson@apache.org
>>> 
>>> wrote:
>>> 
>>>> My view is a) the server was unavailable for this request due to all
>> the
>>>> other requests it’s currently dealing with b) the connection was not
>> idle,
>>>> the client is not at fault.
>>>> 
>>>> B.
>>>> 
>>>>> On 18 Apr 2019, at 22:03, Done Collectively <sa...@inator.biz>
>> wrote:
>>>>> 
>>>>> Any reason 408 would be undesirable?
>>>>> 
>>>>> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408
>>>>> 
>>>>> 
>>>>> On Thu, Apr 18, 2019 at 10:37 AM Robert Newson <rn...@apache.org>
>>>> wrote:
>>>>> 
>>>>>> 503 imo.
>>>>>> 
>>>>>> --
>>>>>> Robert Samuel Newson
>>>>>> rnewson@apache.org
>>>>>> 
>>>>>> On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
>>>>>>> Yes, we should. Currently it’s a 500, maybe there’s something more
>>>>>> appropriate:
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
>>>>>>> 
>>>>>>> Adam
>>>>>>> 
>>>>>>>> On Apr 18, 2019, at 12:50 PM, Joan Touzet <wo...@apache.org>
>> wrote:
>>>>>>>> 
>>>>>>>> What happens when it turns out the client *hasn't* timed out and
>> we
>>>>>>>> just...hang up on them? Should we consider at least trying to send
>>>> back
>>>>>>>> some sort of HTTP status code?
>>>>>>>> 
>>>>>>>> -Joan
>>>>>>>> 
>>>>>>>> On 2019-04-18 10:58, Garren Smith wrote:
>>>>>>>>> I'm +1 on this. With partition queries, we added a few more
>> timeouts
>>>>>> that
>>>>>>>>> can be enabled which Cloudant enable. So having the ability to
>> shed
>>>>>> old
>>>>>>>>> requests when these timeouts get hit would be great.
>>>>>>>>> 
>>>>>>>>> Cheers
>>>>>>>>> Garren
>>>>>>>>> 
>>>>>>>>> On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski <
>> kocolosk@apache.org>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> For once, I’m coming to you with a topic that is not strictly
>> about
>>>>>>>>>> FoundationDB :)
>>>>>>>>>> 
>>>>>>>>>> CouchDB offers a few config settings (some of them
>> undocumented) to
>>>>>> put a
>>>>>>>>>> limit on how long the server is allowed to take to generate a
>>>>>> response. The
>>>>>>>>>> trouble with many of these timeouts is that, when they fire,
>> they do
>>>>>> not
>>>>>>>>>> actually clean up all of the work that they initiated. A couple
>> of
>>>>>> examples:
>>>>>>>>>> 
>>>>>>>>>> - Each HTTP response coordinated by the “fabric” application
>> spawns
>>>>>>>>>> several ephemeral processes via “rexi" on different nodes in the
>>>>>> cluster to
>>>>>>>>>> retrieve data and send it back to the process coordinating the
>>>>>> response. If
>>>>>>>>>> the request timeout fires, the coordinating process will be
>> killed
>>>>>> off, but
>>>>>>>>>> the ephemeral workers might not be. In a healthy cluster they’ll
>>>>>> exit on
>>>>>>>>>> their own when they finish their jobs, but there are conditions
>>>>>> under which
>>>>>>>>>> they can sit around for extended periods of time waiting for an
>>>>>> overloaded
>>>>>>>>>> gen_server (e.g. couch_server) to respond.
>>>>>>>>>> 
>>>>>>>>>> - Those named gen_servers (like couch_server) responsible for
>>>>>> serializing
>>>>>>>>>> access to important data structures will dutifully process
>> messages
>>>>>>>>>> received from old requests without any regard for (of even
>> knowledge
>>>>>> of)
>>>>>>>>>> the fact that the client that sent the message timed out long
>> ago.
>>>>>> This can
>>>>>>>>>> lead to a sort of death spiral in which the gen_server is
>> ultimately
>>>>>>>>>> spending ~all of its time serving dead clients and every client
>> is
>>>>>> timing
>>>>>>>>>> out.
>>>>>>>>>> 
>>>>>>>>>> I’d like to see us introduce a documented maximum request
>> duration
>>>>>> for all
>>>>>>>>>> requests except the _changes feed, and then use that
>> information to
>>>>>> aid in
>>>>>>>>>> load shedding throughout the stack. We can audit the codebase
>> for
>>>>>>>>>> gen_server calls with long timeouts (I know of a few on the
>> critical
>>>>>> path
>>>>>>>>>> that set their timeouts to `infinity`) and we can design servers
>>>> that
>>>>>>>>>> efficiently drop old requests, knowing that the client who made
>> the
>>>>>> request
>>>>>>>>>> must have timed out. A couple of topics for discussion:
>>>>>>>>>> 
>>>>>>>>>> - the “gen_server that sheds old requests” is a very generic
>>>>>> pattern, one
>>>>>>>>>> that seems like it could be well-suited to its own behaviour. A
>>>>>> cursory
>>>>>>>>>> search of the internet didn’t turn up any prior art here, which
>>>>>> surprises
>>>>>>>>>> me a bit. I’m wondering if this is worth bringing up with the
>>>> broader
>>>>>>>>>> Erlang community.
>>>>>>>>>> 
>>>>>>>>>> - setting and enforcing timeouts is a healthy pattern for
>> read-only
>>>>>>>>>> requests as it gives a lot more feedback to clients about the
>> health
>>>>>> of the
>>>>>>>>>> server. When it comes to updates things are a little bit more
>> muddy,
>>>>>> just
>>>>>>>>>> because there remains a chance that an update can be committed,
>> but
>>>>>> the
>>>>>>>>>> caller times out before learning of the successful commit. We
>> should
>>>>>> try to
>>>>>>>>>> minimize the likelihood of that occurring.
>>>>>>>>>> 
>>>>>>>>>> Cheers, Adam
>>>>>>>>>> 
>>>>>>>>>> P.S. I did say that this wasn’t _strictly_ about FoundationDB,
>> but
>>>> of
>>>>>>>>>> course FDB has a hard 5 second limit on all transactions, so it
>> is a
>>>>>> bit of
>>>>>>>>>> a forcing function :).Even putting FoundationDB aside, I would
>> still
>>>>>> argue
>>>>>>>>>> to pursue this path based on our Ops experience with the current
>>>>>> codebase.
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>> 


Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Nick Vatamaniuc <va...@gmail.com>.
We don't spawn (/link) or monitor remote processes, just monitor the local
coordinator process. That should be cheaper performance-wise. It's also for
relatively long-running streaming fabric requests (changes, all_docs). But
you're right, perhaps doing this for shorter requests (doc updates, doc
GETs) might become noticeable. Perhaps a pool of reusable monitoring
processes would work there...

For couch_server timeouts, I wonder if we can do a simpler thing and
inspect the `From` part of each call, and if the Pid is not alive, drop the
request or at least avoid doing any expensive processing. For casts it might
involve sending the sender's Pid in the message. That doesn't address timeouts,
just the case where the coordinating process went away while the message
was stuck in a long message queue.
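
A minimal sketch of that `From` check (hypothetical handler, not the real couch_server code); note that is_process_alive/1 only works for local Pids, which is fine here since the processes calling couch_server live on the same node:

    handle_call(Msg, {CallerPid, _Tag} = From, State) ->
        case is_process_alive(CallerPid) of
            false ->
                %% The caller died while this message sat in the queue;
                %% skip the expensive work and send no reply.
                {noreply, State};
            true ->
                handle_live_call(Msg, From, State)
        end.

    %% Stand-in for the existing couch_server logic.
    handle_live_call(Msg, _From, State) ->
        {reply, {ok, Msg}, State}.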

On Mon, Apr 22, 2019 at 4:32 PM Robert Newson <rn...@apache.org> wrote:

> My memory is fuzzy, but those items sound a lot like what happens with
> rex, that motivated us (i.e, Adam) to build rexi, which deliberately does
> less than the stock approach.
>
> --
>   Robert Samuel Newson
>   rnewson@apache.org
>
> On Mon, 22 Apr 2019, at 18:33, Nick Vatamaniuc wrote:
> > Hi everyone,
> >
> > We partially implement the first part (cleaning rexi workers) for all
> > the
> > fabric streaming requests. Which should be all_docs, changes, view map,
> > view reduce:
> >
> https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d
> >
> > The pattern there is the following:
> >
> >  - With every request spawn a monitoring process that is in charge of
> > keeping track of all the workers as they are spawned.
> >  - If regular cleanup takes place, then this monitoring process is
> killed,
> > to avoid sending double the number of kill messages to workers.
> >  - If the coordinating process doesn't run cleanup and just dies, the
> > monitoring process will performs cleanup on its behalf.
> >
> > Cheers,
> > -Nick
> >
> >
> >
> > On Thu, Apr 18, 2019 at 5:16 PM Robert Samuel Newson <rnewson@apache.org
> >
> > wrote:
> >
> > > My view is a) the server was unavailable for this request due to all
> the
> > > other requests it’s currently dealing with b) the connection was not
> idle,
> > > the client is not at fault.
> > >
> > > B.
> > >
> > > > On 18 Apr 2019, at 22:03, Done Collectively <sa...@inator.biz>
> wrote:
> > > >
> > > > Any reason 408 would be undesirable?
> > > >
> > > > https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408
> > > >
> > > >
> > > > On Thu, Apr 18, 2019 at 10:37 AM Robert Newson <rn...@apache.org>
> > > wrote:
> > > >
> > > >> 503 imo.
> > > >>
> > > >> --
> > > >>  Robert Samuel Newson
> > > >>  rnewson@apache.org
> > > >>
> > > >> On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
> > > >>> Yes, we should. Currently it’s a 500, maybe there’s something more
> > > >> appropriate:
> > > >>>
> > > >>>
> > > >>
> > >
> https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
> > > >>>
> > > >>> Adam
> > > >>>
> > > >>>> On Apr 18, 2019, at 12:50 PM, Joan Touzet <wo...@apache.org>
> wrote:
> > > >>>>
> > > >>>> What happens when it turns out the client *hasn't* timed out and
> we
> > > >>>> just...hang up on them? Should we consider at least trying to send
> > > back
> > > >>>> some sort of HTTP status code?
> > > >>>>
> > > >>>> -Joan
> > > >>>>
> > > >>>> On 2019-04-18 10:58, Garren Smith wrote:
> > > >>>>> I'm +1 on this. With partition queries, we added a few more
> timeouts
> > > >> that
> > > >>>>> can be enabled which Cloudant enable. So having the ability to
> shed
> > > >> old
> > > >>>>> requests when these timeouts get hit would be great.
> > > >>>>>
> > > >>>>> Cheers
> > > >>>>> Garren
> > > >>>>>
> > > >>>>> On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski <
> kocolosk@apache.org>
> > > >> wrote:
> > > >>>>>
> > > >>>>>> Hi all,
> > > >>>>>>
> > > >>>>>> For once, I’m coming to you with a topic that is not strictly
> about
> > > >>>>>> FoundationDB :)
> > > >>>>>>
> > > >>>>>> CouchDB offers a few config settings (some of them
> undocumented) to
> > > >> put a
> > > >>>>>> limit on how long the server is allowed to take to generate a
> > > >> response. The
> > > >>>>>> trouble with many of these timeouts is that, when they fire,
> they do
> > > >> not
> > > >>>>>> actually clean up all of the work that they initiated. A couple
> of
> > > >> examples:
> > > >>>>>>
> > > >>>>>> - Each HTTP response coordinated by the “fabric” application
> spawns
> > > >>>>>> several ephemeral processes via “rexi" on different nodes in the
> > > >> cluster to
> > > >>>>>> retrieve data and send it back to the process coordinating the
> > > >> response. If
> > > >>>>>> the request timeout fires, the coordinating process will be
> killed
> > > >> off, but
> > > >>>>>> the ephemeral workers might not be. In a healthy cluster they’ll
> > > >> exit on
> > > >>>>>> their own when they finish their jobs, but there are conditions
> > > >> under which
> > > >>>>>> they can sit around for extended periods of time waiting for an
> > > >> overloaded
> > > >>>>>> gen_server (e.g. couch_server) to respond.
> > > >>>>>>
> > > >>>>>> - Those named gen_servers (like couch_server) responsible for
> > > >> serializing
> > > >>>>>> access to important data structures will dutifully process
> messages
> > > >>>>>> received from old requests without any regard for (of even
> knowledge
> > > >> of)
> > > >>>>>> the fact that the client that sent the message timed out long
> ago.
> > > >> This can
> > > >>>>>> lead to a sort of death spiral in which the gen_server is
> ultimately
> > > >>>>>> spending ~all of its time serving dead clients and every client
> is
> > > >> timing
> > > >>>>>> out.
> > > >>>>>>
> > > >>>>>> I’d like to see us introduce a documented maximum request
> duration
> > > >> for all
> > > >>>>>> requests except the _changes feed, and then use that
> information to
> > > >> aid in
> > > >>>>>> load shedding throughout the stack. We can audit the codebase
> for
> > > >>>>>> gen_server calls with long timeouts (I know of a few on the
> critical
> > > >> path
> > > >>>>>> that set their timeouts to `infinity`) and we can design servers
> > > that
> > > >>>>>> efficiently drop old requests, knowing that the client who made
> the
> > > >> request
> > > >>>>>> must have timed out. A couple of topics for discussion:
> > > >>>>>>
> > > >>>>>> - the “gen_server that sheds old requests” is a very generic
> > > >> pattern, one
> > > >>>>>> that seems like it could be well-suited to its own behaviour. A
> > > >> cursory
> > > >>>>>> search of the internet didn’t turn up any prior art here, which
> > > >> surprises
> > > >>>>>> me a bit. I’m wondering if this is worth bringing up with the
> > > broader
> > > >>>>>> Erlang community.
> > > >>>>>>
> > > >>>>>> - setting and enforcing timeouts is a healthy pattern for
> read-only
> > > >>>>>> requests as it gives a lot more feedback to clients about the
> health
> > > >> of the
> > > >>>>>> server. When it comes to updates things are a little bit more
> muddy,
> > > >> just
> > > >>>>>> because there remains a chance that an update can be committed,
> but
> > > >> the
> > > >>>>>> caller times out before learning of the successful commit. We
> should
> > > >> try to
> > > >>>>>> minimize the likelihood of that occurring.
> > > >>>>>>
> > > >>>>>> Cheers, Adam
> > > >>>>>>
> > > >>>>>> P.S. I did say that this wasn’t _strictly_ about FoundationDB,
> but
> > > of
> > > >>>>>> course FDB has a hard 5 second limit on all transactions, so it
> is a
> > > >> bit of
> > > >>>>>> a forcing function :).Even putting FoundationDB aside, I would
> still
> > > >> argue
> > > >>>>>> to pursue this path based on our Ops experience with the current
> > > >> codebase.
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > >
> > >
> >
>

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Robert Newson <rn...@apache.org>.
My memory is fuzzy, but those items sound a lot like what happens with rex, which motivated us (i.e., Adam) to build rexi, which deliberately does less than the stock approach.

-- 
  Robert Samuel Newson
  rnewson@apache.org

On Mon, 22 Apr 2019, at 18:33, Nick Vatamaniuc wrote:
> Hi everyone,
> 
> We partially implement the first part (cleaning rexi workers) for all 
> the
> fabric streaming requests. Which should be all_docs, changes, view map,
> view reduce:
> https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d
> 
> The pattern there is the following:
> 
>  - With every request spawn a monitoring process that is in charge of
> keeping track of all the workers as they are spawned.
>  - If regular cleanup takes place, then this monitoring process is killed,
> to avoid sending double the number of kill messages to workers.
>  - If the coordinating process doesn't run cleanup and just dies, the
> monitoring process will performs cleanup on its behalf.
> 
> Cheers,
> -Nick
> 
> 
> 
> On Thu, Apr 18, 2019 at 5:16 PM Robert Samuel Newson <rn...@apache.org>
> wrote:
> 
> > My view is a) the server was unavailable for this request due to all the
> > other requests it’s currently dealing with b) the connection was not idle,
> > the client is not at fault.
> >
> > B.
> >
> > > On 18 Apr 2019, at 22:03, Done Collectively <sa...@inator.biz> wrote:
> > >
> > > Any reason 408 would be undesirable?
> > >
> > > https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408
> > >
> > >
> > > On Thu, Apr 18, 2019 at 10:37 AM Robert Newson <rn...@apache.org>
> > wrote:
> > >
> > >> 503 imo.
> > >>
> > >> --
> > >>  Robert Samuel Newson
> > >>  rnewson@apache.org
> > >>
> > >> On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
> > >>> Yes, we should. Currently it’s a 500, maybe there’s something more
> > >> appropriate:
> > >>>
> > >>>
> > >>
> > https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
> > >>>
> > >>> Adam
> > >>>
> > >>>> On Apr 18, 2019, at 12:50 PM, Joan Touzet <wo...@apache.org> wrote:
> > >>>>
> > >>>> What happens when it turns out the client *hasn't* timed out and we
> > >>>> just...hang up on them? Should we consider at least trying to send
> > back
> > >>>> some sort of HTTP status code?
> > >>>>
> > >>>> -Joan
> > >>>>
> > >>>> On 2019-04-18 10:58, Garren Smith wrote:
> > >>>>> I'm +1 on this. With partition queries, we added a few more timeouts
> > >> that
> > >>>>> can be enabled which Cloudant enable. So having the ability to shed
> > >> old
> > >>>>> requests when these timeouts get hit would be great.
> > >>>>>
> > >>>>> Cheers
> > >>>>> Garren
> > >>>>>
> > >>>>> On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski <ko...@apache.org>
> > >> wrote:
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> For once, I’m coming to you with a topic that is not strictly about
> > >>>>>> FoundationDB :)
> > >>>>>>
> > >>>>>> CouchDB offers a few config settings (some of them undocumented) to
> > >> put a
> > >>>>>> limit on how long the server is allowed to take to generate a
> > >> response. The
> > >>>>>> trouble with many of these timeouts is that, when they fire, they do
> > >> not
> > >>>>>> actually clean up all of the work that they initiated. A couple of
> > >> examples:
> > >>>>>>
> > >>>>>> - Each HTTP response coordinated by the “fabric” application spawns
> > >>>>>> several ephemeral processes via “rexi" on different nodes in the
> > >> cluster to
> > >>>>>> retrieve data and send it back to the process coordinating the
> > >> response. If
> > >>>>>> the request timeout fires, the coordinating process will be killed
> > >> off, but
> > >>>>>> the ephemeral workers might not be. In a healthy cluster they’ll
> > >> exit on
> > >>>>>> their own when they finish their jobs, but there are conditions
> > >> under which
> > >>>>>> they can sit around for extended periods of time waiting for an
> > >> overloaded
> > >>>>>> gen_server (e.g. couch_server) to respond.
> > >>>>>>
> > >>>>>> - Those named gen_servers (like couch_server) responsible for
> > >> serializing
> > >>>>>> access to important data structures will dutifully process messages
> > >>>>>> received from old requests without any regard for (of even knowledge
> > >> of)
> > >>>>>> the fact that the client that sent the message timed out long ago.
> > >> This can
> > >>>>>> lead to a sort of death spiral in which the gen_server is ultimately
> > >>>>>> spending ~all of its time serving dead clients and every client is
> > >> timing
> > >>>>>> out.
> > >>>>>>
> > >>>>>> I’d like to see us introduce a documented maximum request duration
> > >> for all
> > >>>>>> requests except the _changes feed, and then use that information to
> > >> aid in
> > >>>>>> load shedding throughout the stack. We can audit the codebase for
> > >>>>>> gen_server calls with long timeouts (I know of a few on the critical
> > >> path
> > >>>>>> that set their timeouts to `infinity`) and we can design servers
> > that
> > >>>>>> efficiently drop old requests, knowing that the client who made the
> > >> request
> > >>>>>> must have timed out. A couple of topics for discussion:
> > >>>>>>
> > >>>>>> - the “gen_server that sheds old requests” is a very generic
> > >> pattern, one
> > >>>>>> that seems like it could be well-suited to its own behaviour. A
> > >> cursory
> > >>>>>> search of the internet didn’t turn up any prior art here, which
> > >> surprises
> > >>>>>> me a bit. I’m wondering if this is worth bringing up with the
> > broader
> > >>>>>> Erlang community.
> > >>>>>>
> > >>>>>> - setting and enforcing timeouts is a healthy pattern for read-only
> > >>>>>> requests as it gives a lot more feedback to clients about the health
> > >> of the
> > >>>>>> server. When it comes to updates things are a little bit more muddy,
> > >> just
> > >>>>>> because there remains a chance that an update can be committed, but
> > >> the
> > >>>>>> caller times out before learning of the successful commit. We should
> > >> try to
> > >>>>>> minimize the likelihood of that occurring.
> > >>>>>>
> > >>>>>> Cheers, Adam
> > >>>>>>
> > >>>>>> P.S. I did say that this wasn’t _strictly_ about FoundationDB, but
> > of
> > >>>>>> course FDB has a hard 5 second limit on all transactions, so it is a
> > >> bit of
> > >>>>>> a forcing function :).Even putting FoundationDB aside, I would still
> > >> argue
> > >>>>>> to pursue this path based on our Ops experience with the current
> > >> codebase.
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
>

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Nick Vatamaniuc <va...@gmail.com>.
Hi everyone,

We partially implemented the first part (cleaning up rexi workers) for all the
fabric streaming requests, which should be all_docs, changes, view map, and
view reduce:
https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d

The pattern there is the following:

 - With every request spawn a monitoring process that is in charge of
keeping track of all the workers as they are spawned.
 - If regular cleanup takes place, then this monitoring process is killed,
to avoid sending double the number of kill messages to workers.
 - If the coordinating process doesn't run cleanup and just dies, the
monitoring process will perform cleanup on its behalf.
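
In outline, the monitor half of that pattern could look something like this (a rough sketch, not the code in the commit above):

    %% The coordinator spawns the monitor, reports each worker as it is
    %% spawned, and either stops it after normal cleanup or relies on it
    %% to kill the workers if the coordinator dies without cleaning up.
    spawn_cleanup_monitor(Coordinator) ->
        spawn(fun() ->
            MRef = erlang:monitor(process, Coordinator),
            monitor_loop(MRef, [])
        end).

    monitor_loop(MRef, Workers) ->
        receive
            {add_worker, Pid} ->
                monitor_loop(MRef, [Pid | Workers]);
            stop ->
                %% Coordinator did its own cleanup; exit quietly to avoid
                %% sending a second round of kill messages.
                ok;
            {'DOWN', MRef, process, _Coordinator, _Reason} ->
                %% Coordinator died without cleaning up: kill the workers.
                [exit(Pid, kill) || Pid <- Workers],
                ok
        end.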

Cheers,
-Nick



On Thu, Apr 18, 2019 at 5:16 PM Robert Samuel Newson <rn...@apache.org>
wrote:

> My view is a) the server was unavailable for this request due to all the
> other requests it’s currently dealing with b) the connection was not idle,
> the client is not at fault.
>
> B.
>
> > On 18 Apr 2019, at 22:03, Done Collectively <sa...@inator.biz> wrote:
> >
> > Any reason 408 would be undesirable?
> >
> > https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408
> >
> >
> > On Thu, Apr 18, 2019 at 10:37 AM Robert Newson <rn...@apache.org>
> wrote:
> >
> >> 503 imo.
> >>
> >> --
> >>  Robert Samuel Newson
> >>  rnewson@apache.org
> >>
> >> On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
> >>> Yes, we should. Currently it’s a 500, maybe there’s something more
> >> appropriate:
> >>>
> >>>
> >>
> https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
> >>>
> >>> Adam
> >>>
> >>>> On Apr 18, 2019, at 12:50 PM, Joan Touzet <wo...@apache.org> wrote:
> >>>>
> >>>> What happens when it turns out the client *hasn't* timed out and we
> >>>> just...hang up on them? Should we consider at least trying to send
> back
> >>>> some sort of HTTP status code?
> >>>>
> >>>> -Joan
> >>>>
> >>>> On 2019-04-18 10:58, Garren Smith wrote:
> >>>>> I'm +1 on this. With partition queries, we added a few more timeouts
> >> that
> >>>>> can be enabled which Cloudant enable. So having the ability to shed
> >> old
> >>>>> requests when these timeouts get hit would be great.
> >>>>>
> >>>>> Cheers
> >>>>> Garren
> >>>>>
> >>>>> On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski <ko...@apache.org>
> >> wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> For once, I’m coming to you with a topic that is not strictly about
> >>>>>> FoundationDB :)
> >>>>>>
> >>>>>> CouchDB offers a few config settings (some of them undocumented) to
> >> put a
> >>>>>> limit on how long the server is allowed to take to generate a
> >> response. The
> >>>>>> trouble with many of these timeouts is that, when they fire, they do
> >> not
> >>>>>> actually clean up all of the work that they initiated. A couple of
> >> examples:
> >>>>>>
> >>>>>> - Each HTTP response coordinated by the “fabric” application spawns
> >>>>>> several ephemeral processes via “rexi" on different nodes in the
> >> cluster to
> >>>>>> retrieve data and send it back to the process coordinating the
> >> response. If
> >>>>>> the request timeout fires, the coordinating process will be killed
> >> off, but
> >>>>>> the ephemeral workers might not be. In a healthy cluster they’ll
> >> exit on
> >>>>>> their own when they finish their jobs, but there are conditions
> >> under which
> >>>>>> they can sit around for extended periods of time waiting for an
> >> overloaded
> >>>>>> gen_server (e.g. couch_server) to respond.
> >>>>>>
> >>>>>> - Those named gen_servers (like couch_server) responsible for
> >> serializing
> >>>>>> access to important data structures will dutifully process messages
> >>>>>> received from old requests without any regard for (of even knowledge
> >> of)
> >>>>>> the fact that the client that sent the message timed out long ago.
> >> This can
> >>>>>> lead to a sort of death spiral in which the gen_server is ultimately
> >>>>>> spending ~all of its time serving dead clients and every client is
> >> timing
> >>>>>> out.
> >>>>>>
> >>>>>> I’d like to see us introduce a documented maximum request duration
> >> for all
> >>>>>> requests except the _changes feed, and then use that information to
> >> aid in
> >>>>>> load shedding throughout the stack. We can audit the codebase for
> >>>>>> gen_server calls with long timeouts (I know of a few on the critical
> >> path
> >>>>>> that set their timeouts to `infinity`) and we can design servers
> that
> >>>>>> efficiently drop old requests, knowing that the client who made the
> >> request
> >>>>>> must have timed out. A couple of topics for discussion:
> >>>>>>
> >>>>>> - the “gen_server that sheds old requests” is a very generic
> >> pattern, one
> >>>>>> that seems like it could be well-suited to its own behaviour. A
> >> cursory
> >>>>>> search of the internet didn’t turn up any prior art here, which
> >> surprises
> >>>>>> me a bit. I’m wondering if this is worth bringing up with the
> broader
> >>>>>> Erlang community.
> >>>>>>
> >>>>>> - setting and enforcing timeouts is a healthy pattern for read-only
> >>>>>> requests as it gives a lot more feedback to clients about the health
> >> of the
> >>>>>> server. When it comes to updates things are a little bit more muddy,
> >> just
> >>>>>> because there remains a chance that an update can be committed, but
> >> the
> >>>>>> caller times out before learning of the successful commit. We should
> >> try to
> >>>>>> minimize the likelihood of that occurring.
> >>>>>>
> >>>>>> Cheers, Adam
> >>>>>>
> >>>>>> P.S. I did say that this wasn’t _strictly_ about FoundationDB, but
> of
> >>>>>> course FDB has a hard 5 second limit on all transactions, so it is a
> >> bit of
> >>>>>> a forcing function :).Even putting FoundationDB aside, I would still
> >> argue
> >>>>>> to pursue this path based on our Ops experience with the current
> >> codebase.
> >>>>>
> >>>>
> >>>
> >>>
> >>
>
>

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Robert Samuel Newson <rn...@apache.org>.
My view is (a) the server was unavailable for this request due to all the other requests it’s currently dealing with, and (b) the connection was not idle, so the client is not at fault.
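
If the chttpd.erl lines Adam linked are the error_info/1 clause that picks the status code for a shed request, the question is really which tuple that clause returns; a purely hypothetical sketch of the 503 option (error term and wording are assumptions):

    %% Hypothetical sketch only; not the current chttpd code.
    error_info(timeout) ->
        {503, <<"service_unavailable">>,
         <<"The server was too busy to handle this request in time.">>};
    error_info(Error) ->
        {500, <<"unknown_error">>, iolist_to_binary(io_lib:format("~p", [Error]))}.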

B.

> On 18 Apr 2019, at 22:03, Done Collectively <sa...@inator.biz> wrote:
> 
> Any reason 408 would be undesirable?
> 
> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408


Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Done Collectively <sa...@inator.biz>.
Any reason 408 would be undesirable?

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408


On Thu, Apr 18, 2019 at 10:37 AM Robert Newson <rn...@apache.org> wrote:

> 503 imo.

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Robert Newson <rn...@apache.org>.
503 imo.

-- 
  Robert Samuel Newson
  rnewson@apache.org

On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
> Yes, we should. Currently it’s a 500, maybe there’s something more appropriate:
> 
> https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Adam Kocoloski <ko...@apache.org>.
Yes, we should. Currently it’s a 500, maybe there’s something more appropriate:

https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
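
For illustration only, a sketch of what mapping that timeout onto a 503 might look like; these are not the actual clauses behind the link above, and the error and reason strings are made up:

    %% Hypothetical clauses for chttpd:error_info/1 -- a sketch of the
    %% proposed change, not the current implementation.
    error_info(timeout) ->
        {503, <<"service_unavailable">>,
         <<"The request could not be serviced in time.">>};
    error_info({timeout, _Call}) ->
        {503, <<"service_unavailable">>,
         <<"A required backend did not respond in time.">>}.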

Adam

> On Apr 18, 2019, at 12:50 PM, Joan Touzet <wo...@apache.org> wrote:
> 
> What happens when it turns out the client *hasn't* timed out and we
> just...hang up on them? Should we consider at least trying to send back
> some sort of HTTP status code?
> 
> -Joan

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Joan Touzet <wo...@apache.org>.
What happens when it turns out the client *hasn't* timed out and we
just...hang up on them? Should we consider at least trying to send back
some sort of HTTP status code?

-Joan

On 2019-04-18 10:58, Garren Smith wrote:
> I'm +1 on this. With partition queries we added a few more optional
> timeouts, which Cloudant enables. So having the ability to shed old
> requests when those timeouts get hit would be great.
> 
> Cheers
> Garren

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Garren Smith <ga...@apache.org>.
I'm +1 on this. With partition queries we added a few more optional
timeouts, which Cloudant enables. So having the ability to shed old
requests when those timeouts get hit would be great.
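
For concreteness, here is a rough sketch of the "gen_server that sheds old
requests" idea. It assumes callers attach an absolute deadline to every call;
the module name, the request/2 helper and the message shape are illustrative
only, not an existing CouchDB API:

    %% Sketch only: callers pass an absolute deadline with each request and
    %% the server silently drops anything that arrives after its deadline,
    %% since that caller has already timed out.
    -module(shedding_server).
    -behaviour(gen_server).

    -export([start_link/0, request/2]).
    -export([init/1, handle_call/3, handle_cast/2]).

    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    %% Caller-side helper: the deadline travels inside the request message.
    request(WorkFun, TimeoutMs) when is_function(WorkFun, 0) ->
        Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
        gen_server:call(?MODULE, {do, WorkFun, Deadline}, TimeoutMs).

    init([]) ->
        {ok, nostate}.

    handle_call({do, WorkFun, Deadline}, _From, State) ->
        case erlang:monotonic_time(millisecond) > Deadline of
            true ->
                %% The caller has already given up: skip the work and skip
                %% the reply, because nobody is waiting for it.
                {noreply, State};
            false ->
                {reply, WorkFun(), State}
        end;
    handle_call(_Other, _From, State) ->
        {reply, ignored, State}.

    handle_cast(_Msg, State) ->
        {noreply, State}.

A real behaviour would also want to drain stale casts and report how much work
was shed, but the core of the pattern is just the deadline check before doing
any work.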

Cheers
Garren


Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

Posted by Jan Lehnardt <ja...@apache.org>.
Hey all,

I like the discussion here and the suggestion too. Being a lot more predictable about timeouts will be great, if only from an error-message perspective: currently we need a seance to find out what really caused a fabric_timeout, and spurious rpc errors are tough to figure out in retrospect.

I worry a little bit about going straight to the FDB 5s timeout (on the other hand, why not, if we can), but having a more global way to manage timeouts would help a lot.

That said, and this definitely applies to FDB as well: with this, folks won’t be able to stream an entire and consistent(ish) snapshot of _all_docs or _changes someplace else. Today, if they get interrupted somewhere, only _changes can be picked up, albeit with increased complexity on the reading end to handle repeats and dupes. But in general, being able to stream a full set of _all_docs/_changes as it is materialised on disk, even across compactions thanks to UNIX file-handle conventions (the delete is only carried out once the last fd to the file closes), is _extremely_ neat and used quite a bit.

I realise that with FDB that’s all gone, if only because of the 5s window, although the rumours about the next-gen storage engine suggest it might allow consistent resume (up to a point), since everything there is serialised on disk as well; that’s worth digging into.

With all this, all I’m asking here (or for a later discussion) is how we advise our users to get a full export of their database in an at least somewhat consistent manner. I care less about the specific semantics; i.e. in 1.x it was consistency as of the start of the read, and I’d be fine with e.g. consistency as of the end of the read.
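
As one possible shape for that advice, here is a rough sketch of a resumable
export that pages through _changes in bounded batches, so no single request
has to outlive the per-request limit. The host, database name and batch size
are placeholders, it assumes OTP 27’s json module for decoding, and it does
nothing about repeats/dupes on resume:

    %% Rough sketch of a resumable export over _changes. Each batch is a
    %% short, bounded request; the export resumes from the last_seq of the
    %% previous batch.
    -module(changes_export).
    -export([run/0]).

    run() ->
        {ok, _} = application:ensure_all_started(inets),
        export(<<"0">>, []).

    export(Since, Acc) ->
        Url = "http://127.0.0.1:5984/db/_changes?include_docs=true&limit=1000&since="
              ++ seq_to_list(Since),
        {ok, {{_, 200, _}, _Hdrs, Body}} =
            httpc:request(get, {Url, []}, [], [{body_format, binary}]),
        #{<<"results">> := Results, <<"last_seq">> := LastSeq} = json:decode(Body),
        case Results of
            [] -> lists:append(lists:reverse(Acc));   %% caught up; return all rows
            _  -> export(LastSeq, [Results | Acc])
        end.

    %% 2.x+ sequences are strings; 1.x used integers.
    seq_to_list(Seq) when is_binary(Seq)  -> binary_to_list(Seq);
    seq_to_list(Seq) when is_integer(Seq) -> integer_to_list(Seq).

The consistency caveat from above still applies: every batch sees a different
snapshot, so the reading end has to tolerate seeing the same document more
than once across batches.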

Jan “hopes he used i.e. and e.g. correctly” L
_




-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/