
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/03/09 06:17:15 UTC

[GitHub] ivankelly opened a new issue #570: Multiple active entrylogs

ivankelly opened a new issue #570: Multiple active entrylogs
URL: https://github.com/apache/bookkeeper/issues/570

JIRA: https://issues.apache.org/jira/browse/BOOKKEEPER-1041

Reporter: Venkateswararao Jujjuri (JV) @jvrao

Current bookkeeper is tuned for rotational HDDs. It has one active entrylog, and all the ledger/entries go to the same entrylog until it is rotated out. This is perfect for HDDs, as seeks and moving the head all over the disk platter are very expensive. But it is very inefficient for SSDs, as each SSD can handle multiple parallel writers. This method is also extremely inefficient for compaction, as it causes write amplification and inefficient disk space usage.

Our proposal is to have multiple active entrylogs and a configuration param for how many parallel entrylogs the system can have. This way one can configure the system to have fewer ledgers (maybe one) per entrylog.

### Comments from JIRA

---
*Enrico Olivelli* 2017-04-20T07:28:28.619+0000

{quote}
But this is very inefficient for HDDs
{quote}

[~jujjuri]
did you mean SSD ?

---
*Venkateswararao Jujjuri (JV)* 2017-04-21T16:42:29.203+0000

Yes I mean SSD. Corrected. Thanks.

---
*Charan Reddy Guttapalem* 2017-05-18T15:01:22.566+0000

Issue:

In Bookie's EntryLogger, there is only one current active entryLog, and all the ledger/entries go to the same entrylog. This is perfect for HDDs, as file syncs, seeks and moving the head all over the disk platter are very expensive. But having a single active entrylog is inefficient for SSDs, as each SSD can handle multiple parallel writers. Also, having a single active entrylog (irrespective of LedgerStorage type, interleaved or sorted) is inefficient for compaction, since entries of multiple ledgers end up in the same entrylog.

Also, in SortedLedgerStorage, the addEntry request flushes the EntryMemTable if it reaches the size limit. Because of this we are observing unpredictable tail latency for addEntry requests. When an EntryMemTable snapshot (64 MB) is flushed all at once, this may affect the journal addEntry latency. Also, if the rate of new add requests surpasses the rate at which the EntryMemTable's previous snapshot is flushed, then at some point the current EntryMemTable map will reach the limit, and since the previous snapshot flush is still in progress, EntryMemTable will throttle new add requests, which would affect addEntry latency.

Goals:

The main purpose of this feature is to have an efficient garbage collection story by minimizing the amount of compaction required and by making it possible to reclaim a deleted ledger's space more quickly. Also, with this feature we can lay the groundwork for switching from SortedLedgerStorage to InterleavedLedgerStorage and get predictable tail latency.

Proposal:

So the proposal here is to have multiple active entrylogs, which will help with compaction performance and make SortedLedgerStorage redundant.

Design Overview:

- Have a server configuration specifying the number of active entry logs per ledgerdir.
- For backward compatibility (for existing behaviour) that config can be set to 0.
- A round-robin method will be used for choosing the active entry log for the current ledger in the EntryLogger.addEntry method.
- If the total number of active entrylogs is more than or equal to the number of active ledgers, then we get almost exclusivity (close to one ledger per entrylog).
- For implementing the round-robin approach, we need to maintain state information mapping ledgerId to SlotId.
- There will be numberofledgerdirs*numberofactiveentrylogsperledgerdir slots. A slot is mapped to a ledgerdir, but the active entrylog of that slot will be rotated when it reaches capacity.
- By knowing the SlotId we can get the corresponding entryLogId associated with that slot.
- If there is no entry for the current ledger in the map, then we pick the next slot in order and add the mapping entry to the map.
- Since the Bookie won't be informed about the write-close of the ledger, there is no easy way to know when to remove the mapping entry from the map. Considering it is just a <long ledgerid, int slotid> map entry, we may compromise on the eviction policy. We can just use some cache with a time-based eviction policy keyed on last access.
- If a ledgerdir becomes full, then all the slots having entrylogs in that ledgerdir should become inactive. The existing mappings of active ledgers to these slots (active entrylogs) should be updated to point to available active slots.
- When the ledgerdir becomes writable again, the slots which were inactive should be made active and become eligible for round-robin distribution.
- For this feature I need to make changes to the checkpoint logic. Currently, with the BOOKKEEPER-564 change, we schedule a checkpoint only when the current entrylog file is rotated, so we don't call 'flushCurrentLog' when we checkpoint. But for this feature, since there are going to be multiple active entrylogs, scheduling a checkpoint when an entrylog file is rotated is not an option. So I need to call flushCurrentLogs when a checkpoint is made, once every 'flushinterval' period.
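
The slot bookkeeping described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual EntryLogger change: the class name `SlotAllocator`, its methods, the slot numbering (`slotId / slotsPerDir` gives the ledgerdir index) and the hand-rolled idle timeout are all hypothetical. A real implementation would more likely use an existing cache with time-based eviction and would also track the entryLogId behind each slot.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: assigns ledgers to entry-log slots in round-robin order. */
class SlotAllocator {
    private final int slotsPerDir;        // active entry logs per ledgerdir (assumed config value)
    private final boolean[] dirWritable;  // one flag per ledgerdir
    private final long ttlMillis;         // idle time after which a mapping is dropped
    // ledgerId -> {slotId, lastAccessMillis}; stands in for a cache with time-based eviction
    private final Map<Long, long[]> ledgerToSlot = new HashMap<>();
    private int nextSlot = 0;             // round-robin cursor

    SlotAllocator(int numLedgerDirs, int slotsPerDir, long ttlMillis) {
        this.slotsPerDir = slotsPerDir;
        this.dirWritable = new boolean[numLedgerDirs];
        java.util.Arrays.fill(this.dirWritable, true);
        this.ttlMillis = ttlMillis;
    }

    /** Total slots = number of ledgerdirs * active entry logs per ledgerdir. */
    int totalSlots() {
        return dirWritable.length * slotsPerDir;
    }

    /** Index of the ledgerdir that owns the given slot. */
    int dirOf(int slotId) {
        return slotId / slotsPerDir;
    }

    /** Returns the slot for a ledger, assigning or remapping it if needed. */
    synchronized int slotFor(long ledgerId, long nowMillis) {
        // Drop mappings that have not been touched within the TTL.
        ledgerToSlot.values().removeIf(e -> nowMillis - e[1] > ttlMillis);
        long[] entry = ledgerToSlot.get(ledgerId);
        if (entry != null && dirWritable[dirOf((int) entry[0])]) {
            entry[1] = nowMillis;         // refresh last-access time
            return (int) entry[0];
        }
        // No mapping yet, or the mapped slot's ledgerdir is full: pick the next active slot.
        int slot = nextActiveSlot();
        ledgerToSlot.put(ledgerId, new long[] {slot, nowMillis});
        return slot;
    }

    /** Marks a ledgerdir full (false) or writable again (true). */
    synchronized void setDirWritable(int dir, boolean writable) {
        dirWritable[dir] = writable;
    }

    private int nextActiveSlot() {
        int total = totalSlots();
        for (int i = 0; i < total; i++) {
            int candidate = (nextSlot + i) % total;
            if (dirWritable[dirOf(candidate)]) {
                nextSlot = (candidate + 1) % total;
                return candidate;
            }
        }
        throw new IllegalStateException("no writable ledgerdir available");
    }
}
```

With two ledgerdirs and two slots per ledgerdir, successive new ledgers land on slots 0, 1, 2, 3 in turn. A ledger keeps its slot on later calls, ledgers mapped to a full ledgerdir are remapped lazily on their next add, and slots of a ledgerdir that becomes writable again rejoin the rotation without extra work.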

---
*Enrico Olivelli* 2017-05-19T06:49:35.104+0000

[~jujjuri] [~reddycharan18@gmail.com]
This sounds very interesting. Now I can see clearly why JV wrote on the mailing list that maybe clients could send some hint to the bookies that a ledger has been gracefully deleted/closed.

---
*Charan Reddy Guttapalem* 2017-06-02T00:22:59.045+0000

[~sijie@apache.org] I created a writeup for this work item and discussed it in the last community call (May 18th).
