Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/05/17 05:24:01 UTC

[GitHub] dlg99 opened a new pull request #1410: Issue #1409: Added server side backpressure (@bug W-3651831@)

URL: https://github.com/apache/bookkeeper/pull/1410
 
 
   Added server side backpressure handling and related unit tests.
   
   Background:
   
   BK’s writes happen in this order on the server side:
   First, ledger storage (Interleaved|Sorted), presumably non-blocking to some degree.
   Second, the journal.
   The request is finished when the journal’s write is fsynced.
   
   Three major moving parts on the server side need to be taken into account:
   - Journal; its performance (I/O delays) or a misconfigured batching delay affects client latency. The journal batches internally and uses a separate thread for data fsync/request ack.
   - InterleavedLedgerStorage/entry log. It will naturally block the request before it reaches the journal if it is blocked on I/O.
   - SortedLedgerStorage. Sorted storage puts the request into an in-memory data structure/SkipList (a.k.a. memtable) until it reaches a certain size limit, then flushes it to disk asynchronously.
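
   For context, here is a toy sketch of that write ordering (hypothetical names and simplified signatures, not the actual Bookie/Journal/LedgerStorage APIs):

       import java.util.concurrent.ExecutorService;
       import java.util.concurrent.Executors;

       public class WritePathSketch {
           interface WriteCallback {
               void writeComplete(int rc);
           }

           private final ExecutorService journalThread = Executors.newSingleThreadExecutor();

           void addEntry(byte[] entry, WriteCallback cb) {
               addToLedgerStorage(entry);              // step 1: (Interleaved|Sorted) ledger storage
               journalThread.submit(() -> {            // step 2: journal; batching and fsync happen off-thread
                   appendToJournalAndFsync(entry);
                   cb.writeComplete(0);                // the request finishes only after the fsync
               });
           }

           private void addToLedgerStorage(byte[] entry) { /* entry log / memtable write */ }

           private void appendToJournalAndFsync(byte[] entry) { /* append to journal file, then fsync */ }
       }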
   
   Implementation:
   
   1. Limit the number of requests in progress (separately for reads and writes, since they are handled in different thread pools).
   Requests in progress (RIPs) are requests being processed by threads in the thread pool plus requests waiting for the next free thread. A RIP's lifetime runs from the moment the request is received/read by Netty to the moment the response for it is sent.
   The target limit of RIPs (a heuristic) is ((number of processing threads)*2 + (expected max batch size on journal)), so each thread can have one request to process and the next one waiting, with one journal batch in flight; see the sketch after the list of reasons below.
   It is assumed that we have enough memory to keep data for that many requests. It is impossible to estimate the size of a read request at the moment it is received anyway.
   
   The limit is configured by setting the number of RIPs explicitly in the config, for the following reasons:
   - It is easier to experiment with different numbers, e.g. ((number of processing threads) + 2*(expected max batch size on journal)) or simply (2*(expected max batch size on journal)).
   - There is an option to run requests directly on the Netty threads, so there is no config parameter to base an initial value on, and Netty's defaults can change between versions.
   - It removes the need for an explicit enable/disable backpressure flag; instead we can set the RIP limit to zero.
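
   As a purely illustrative sketch (not the class added by this PR), the RIP accounting could look like the following. For example, with 8 processing threads and an expected max journal batch of 32, the heuristic above gives a limit of 8*2 + 32 = 48:

       import java.util.concurrent.atomic.AtomicInteger;

       // Hypothetical RIP tracker: a limit of 0 means backpressure is disabled,
       // so no separate on/off flag is needed.
       public class RequestsInProgress {
           private final int limit;                                 // 0 => disabled
           private final AtomicInteger inProgress = new AtomicInteger();

           public RequestsInProgress(int limit) {
               this.limit = limit;
           }

           /** Call when Netty reads a request; returns true if the limit is now exceeded. */
           public boolean requestStarted() {
               return limit > 0 && inProgress.incrementAndGet() > limit;
           }

           /** Call when the response is sent (or the request dropped); returns true once back under the limit. */
           public boolean requestFinished() {
               return limit > 0 && inProgress.decrementAndGet() <= limit;
           }
       }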
   
   2. Pause Netty’s autoread when the limit is exceeded, to prevent it from pulling more data before we can track it as a RIP.
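
   A minimal sketch of that wiring, assuming the hypothetical RequestsInProgress helper from the previous sketch:

       import io.netty.channel.Channel;

       // Pause autoread while over the limit so the channel stops pulling more
       // data than we have accounted for; resume once we drop back under it.
       public final class AutoReadBackpressure {
           public static void onRequestRead(Channel ch, RequestsInProgress rip) {
               if (rip.requestStarted()) {
                   ch.config().setAutoRead(false);   // stop reading from the socket
               }
           }

           public static void onResponseSent(Channel ch, RequestsInProgress rip) {
               if (rip.requestFinished()) {
                   ch.config().setAutoRead(true);    // resume reading
               }
           }
       }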
   
   3. Limit the number of requests in the asynchronous write path (LedgerStorage).
   
   InterleavedLedgerStorage will naturally block if the write is slowed down by, e.g., fsync.
   SortedLedgerStorage has a naive throttling implementation that blocks the request for 1 ms if a checkpoint (memtable flush) is in progress. This is replaced with blocking until space in the memtable is available. The limit is set to 2*(skipListSize), where skipListSize is the limit that triggers a memtable flush.
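
   A sketch of the intended blocking behavior, using a hypothetical helper rather than the actual SortedLedgerStorage code:

       // Writers block until the in-memory size drops below 2 * skipListSizeLimit
       // (the flush trigger size), instead of sleeping 1 ms while a flush runs.
       public class MemtableThrottle {
           private final long maxBytes;          // 2 * skipListSizeLimit
           private long currentBytes = 0;

           public MemtableThrottle(long skipListSizeLimit) {
               this.maxBytes = 2 * skipListSizeLimit;
           }

           /** Called before adding an entry of entrySize bytes to the memtable. */
           public synchronized void acquire(long entrySize) throws InterruptedException {
               while (currentBytes + entrySize > maxBytes) {
                   wait();                       // block the write until a flush frees space
               }
               currentBytes += entrySize;
           }

           /** Called when flushed entries are removed from the memtable. */
           public synchronized void release(long flushedBytes) {
               currentBytes -= flushedBytes;
               notifyAll();
           }
       }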
    
   4. Sending a response must respect Netty’s isWritable() flag and wait up to a certain timeout if needed. After the timeout, either drop the response (the client will not hear about that request) or close the channel (the disconnect notifies the client that responses to requests on that connection will never arrive).
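
   A simplified sketch of that behavior (hypothetical helper; a real implementation would more likely react to channelWritabilityChanged events than schedule a one-shot re-check):

       import io.netty.channel.Channel;
       import java.util.concurrent.TimeUnit;

       public final class ResponseSender {
           public static void send(Channel ch, Object response, long timeoutMs, boolean closeOnTimeout) {
               if (ch.isWritable()) {
                   ch.writeAndFlush(response);
                   return;
               }
               // Re-check once after the timeout on the channel's event loop.
               ch.eventLoop().schedule(() -> {
                   if (ch.isWritable()) {
                       ch.writeAndFlush(response);
                   } else if (closeOnTimeout) {
                       ch.close();               // disconnect: client learns its pending requests are gone
                   }                             // else: drop the response silently
               }, timeoutMs, TimeUnit.MILLISECONDS);
           }
       }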
   
   Master Issue: #1409 
   
