Posted to dev@bookkeeper.apache.org by "Ivan Kelly (JIRA)" <ji...@apache.org> on 2012/10/17 18:14:03 UTC

[jira] [Commented] (BOOKKEEPER-432) Improve performance of entry log range read per ledger entries

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477989#comment-13477989 ] 

Ivan Kelly commented on BOOKKEEPER-432:
---------------------------------------

I think 1. can be split into two parts: sorting before flushing, and maintaining skip lists rather than index files. Sorting before flushing should give us a really easy win and shouldn't be very difficult to implement.

For example, let's say we have 1KB entries and are writing at 50k entries per second to 1000 ledgers. Let's assume flushing takes 1 second (in reality, for 1000 ledgers it'll be more like 2 minutes).

This means the amount of data flushed will be 50MB. As it stands now, entries will be evenly interleaved across this, so with 1000 ledgers there will be 999KB (999 other 1KB entries) between any two entries of the same ledger. This is well outside the readahead window (128KB), so on average we have to seek for every single entry.

By contrast, with sorting, each ledger's entries are contiguous: 50k entries spread over 1000 ledgers is 50 entries per ledger, so a single seek would get us 50 entries.
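To make the arithmetic concrete, a quick back-of-the-envelope in code. Every input here is an assumed number from the example above, not a measurement:

    // Back-of-the-envelope for the example workload; all inputs are
    // assumptions from the discussion above, nothing is measured.
    public class SeekMath {
        public static void main(String[] args) {
            int entrySize = 1024;          // 1KB entries
            int entriesPerSec = 50_000;    // write rate
            int ledgers = 1_000;
            int readAhead = 128 * 1024;    // 128KB readahead window

            long flushedBytes = (long) entrySize * entriesPerSec;  // ~50MB per 1s flush
            long gapBytes = (long) (ledgers - 1) * entrySize;      // ~999KB between same-ledger entries
            boolean seekPerEntry = gapBytes > readAhead;           // true: gap exceeds readahead
            int entriesPerSeek = entriesPerSec / ledgers;          // 50 contiguous entries if sorted

            System.out.println("flushed: " + flushedBytes + "B, gap: " + gapBytes
                    + "B, seek per entry: " + seekPerEntry
                    + ", entries per seek when sorted: " + entriesPerSeek);
        }
    }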

I don't think we explicitly need to sort, either. We can have a block pool of 50k blocks, where each ledger on the server side has a chain of blocks from the pool. When we flush, we flush the blocks of each ledger in order. If we run out of blocks, we flush one ledger (without a disk force) and return its blocks to the pool.
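Very roughly, something like the sketch below. None of these classes exist in the codebase; the names and the one-entry-per-block simplification are purely illustrative:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of the block-pool idea: entries accumulate in
    // per-ledger chains of pooled blocks, and a flush writes one ledger's
    // chain contiguously so its entries land next to each other on disk.
    class BlockPoolEntryLogger {
        private static final int BLOCK_SIZE = 1024; // one 1KB entry per block, for simplicity

        private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();
        private final Map<Long, ArrayDeque<ByteBuffer>> chains = new HashMap<>();
        private final FileChannel entryLog;

        BlockPoolEntryLogger(FileChannel entryLog, int numBlocks) {
            this.entryLog = entryLog;
            for (int i = 0; i < numBlocks; i++) {
                pool.add(ByteBuffer.allocate(BLOCK_SIZE));
            }
        }

        void addEntry(long ledgerId, ByteBuffer entry) throws IOException {
            ByteBuffer block = pool.poll();
            if (block == null) {
                // Pool exhausted: flush some ledger's chain (no disk force)
                // and recycle its blocks. Victim choice is arbitrary here.
                flushLedger(chains.keySet().iterator().next(), false);
                block = pool.poll();
            }
            block.put(entry);
            block.flip();
            chains.computeIfAbsent(ledgerId, id -> new ArrayDeque<>()).add(block);
        }

        // Write one ledger's blocks out contiguously, then recycle them.
        private void flushLedger(long ledgerId, boolean force) throws IOException {
            ArrayDeque<ByteBuffer> chain = chains.remove(ledgerId);
            if (chain == null) {
                return;
            }
            for (ByteBuffer block : chain) {
                entryLog.write(block);
                pool.add((ByteBuffer) block.clear());
            }
            if (force) {
                entryLog.force(false);
            }
        }

        // On a checkpoint, flush every chain ledger by ledger, then force once.
        void flushAll() throws IOException {
            for (Long ledgerId : new ArrayDeque<>(chains.keySet())) {
                flushLedger(ledgerId, false);
            }
            entryLog.force(false);
        }
    }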
                
> Improve performance of entry log range read per ledger entries 
> ---------------------------------------------------------------
>
>                 Key: BOOKKEEPER-432
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-432
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>         Environment: Linux
>            Reporter: Yixue (Andrew) Zhu
>              Labels: patch
>
> We observed random I/O reads when some subscribers fall behind (on some topics), as delivery needs to scan the entry logs (through the ledger index), which interleave entries from all ledgers being served.
> Essentially, the ledger index is a non-clustered index. It is not effective when a large number of ledger entries need to be served, as they tend to be scattered around due to interleaving.
> Some possible improvements:
> 1. Change the ledger entries buffer to use a SkipList (or another suitable structure), sorted on (ledger, entry sequence). When the buffer is flushed, the entry log is written out in the already-sorted order (a sketch of this follows below the quoted description).
> The "active" ledger index can point to the entries buffer (SkipList), and be fixed up with the entry-log position once the latter is persisted.
> Or, the ledger index can simply be rebuilt on demand. The entry log file tail can have an index attached (a lightweight B-tree, similar to Bigtable). We need to track, per ledger, which log files contribute entries to it, so that the in-memory index can be rebuilt from the tails of the corresponding log files.
> 2. Use an affinity concept to make the ensembles of ledgers belonging to the same topic as identical as possible. This will make 1. above more effective.
>  
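For reference, a minimal sketch of what the sorted buffer from point 1. could look like, using a JDK ConcurrentSkipListMap keyed on (ledgerId, entryId). The class, the key type, and the writer interface are all made up for illustration; this is not the eventual patch:

    import java.nio.ByteBuffer;
    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Sorted write buffer: entries are kept ordered by (ledgerId, entryId),
    // so a flush emits each ledger's entries adjacent to each other.
    class EntryMemTable {
        // Composite key, ordered first by ledger, then by entry id.
        static final class EntryKey implements Comparable<EntryKey> {
            final long ledgerId;
            final long entryId;

            EntryKey(long ledgerId, long entryId) {
                this.ledgerId = ledgerId;
                this.entryId = entryId;
            }

            @Override
            public int compareTo(EntryKey o) {
                int c = Long.compare(ledgerId, o.ledgerId);
                return c != 0 ? c : Long.compare(entryId, o.entryId);
            }
        }

        // Hypothetical sink; stands in for the entry logger.
        interface EntryLogWriter {
            void write(long ledgerId, long entryId, ByteBuffer entry);
        }

        private final ConcurrentSkipListMap<EntryKey, ByteBuffer> table =
                new ConcurrentSkipListMap<>();

        void addEntry(long ledgerId, long entryId, ByteBuffer entry) {
            table.put(new EntryKey(ledgerId, entryId), entry.duplicate());
        }

        // Iteration over a ConcurrentSkipListMap is in ascending key order,
        // so entries come out grouped per ledger at flush time.
        void flushTo(EntryLogWriter writer) {
            for (Map.Entry<EntryKey, ByteBuffer> e : table.entrySet()) {
                writer.write(e.getKey().ledgerId, e.getKey().entryId, e.getValue());
            }
            table.clear();
        }
    }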

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira