Posted to dev@bookkeeper.apache.org by Sijie Guo <gu...@gmail.com> on 2013/01/19 08:09:39 UTC

[Discussion] efficient metadata accesses for the applications using bookkeeper

*** This thread is split from the previous thread and focuses on the
metadata accesses of applications using bookkeeper. I will try to make
everything clear and add the details (although I assume you already know
them). If anything is unclear, please forgive my language barrier. ***

The common pattern for all applications using bookkeeper today is to use an
extra place to record the ledger id(s) returned by bookkeeper. The recorded
ledger id(s) are kept in the application's metadata.

So when an application adds entries to or reads entries from bookkeeper,
the metadata access path is:

application identifier -> application metadata -> ledger id -> ledger
metadata.

Several clarifications here:
1. application identifier: the identifier an application uses to locate its
metadata. E.g. for Hedwig, it is the topic name.
2. the whole path is accessed whenever you add/read entries, so there is no
lazy loading here.
3. this is a simplified abstraction that omits details. Most applications
require more metadata accesses, e.g. to close the previously opened ledger
and create a new one.

An application operation triggers the following operations:

a) the application opens its application metadata and gets the ledger id it
used. (read)
b) the application opens the ledger, and bookkeeper accesses its metadata.
(read, writes)
c) the application creates a new ledger, and bookkeeper writes its metadata.
(write)
d) the application writes its application metadata to record the new ledger
id. (write)

So an application operation usually requires at least 2 reads and 4
writes.
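
The count above can be reproduced with a small sketch. Everything here is
illustrative: CountingStore, roll_ledger and the key layout are made-up
names, not bookkeeper APIs, and step b) is assumed to cost one read plus two
writes (for example a fence and a close), which is one way the totals in the
text add up.

```python
from collections import Counter


class CountingStore:
    """In-memory stand-in for a metadata store that counts its accesses."""

    def __init__(self):
        self.data = {}
        self.ops = Counter()

    def read(self, key):
        self.ops["read"] += 1
        return self.data.get(key)

    def write(self, key, value):
        self.ops["write"] += 1
        self.data[key] = value


def roll_ledger(store, app_id, new_ledger_id):
    """Walk steps a) to d) for one application operation (hypothetical)."""
    # a) read the application metadata to find the ledger id in use
    old_ledger_id = store.read(f"app/{app_id}")["ledger_id"]
    # b) open the old ledger: one read, then two writes (assumed: fence, close)
    ledger_meta = store.read(f"ledger/{old_ledger_id}")
    store.write(f"ledger/{old_ledger_id}", {**ledger_meta, "state": "in_recovery"})
    store.write(f"ledger/{old_ledger_id}", {**ledger_meta, "state": "closed"})
    # c) create the new ledger: bookkeeper writes its metadata
    store.write(f"ledger/{new_ledger_id}", {"state": "open"})
    # d) record the new ledger id in the application metadata
    store.write(f"app/{app_id}", {"ledger_id": new_ledger_id})
    return store.ops


store = CountingStore()
# Seed the initial state directly so the setup is not counted.
store.data["app/TOPIC"] = {"ledger_id": 1}
store.data["ledger/1"] = {"state": "open"}
ops = roll_ledger(store, "TOPIC", 2)
print(dict(ops))
```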

The current working style is bad for two reasons.

1) performance concern: too many metadata accesses, which hit the
bottleneck of the system.
2) resource concern: there is no transaction between c) and d). If the
application fails at that point, a zombie ledger is left behind (the ledger
id leaks, exhausting the ledger id space) and metadata resources are wasted
(especially if the metadata is stored in zookeeper, where it eats up
zookeeper memory).

Putting the performance concern aside for a while, let's talk about the
resource concern first. For the ledger id leaking problem, I don't see any
solution that could resolve it; see the discussion in the 'ledger id
generation' thread. For the metadata problem, it isn't possible to garbage
collect the leaked metadata, since you would have to cross-compare the
application metadata against the ledger id list. What percentage of ledgers
would leak this way? I guess it would depend on machine failures and on the
number of application metadata entries owned by one machine. But I don't
have any numbers.
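
To make the resource concern concrete, here is a toy simulation of a failure
between c) and d), followed by the cross-comparison that finding the zombie
would require. The dictionaries and the function names (create_and_record,
find_zombies) are assumptions for illustration only; they are not how
bookkeeper or Hedwig store their metadata.

```python
class Crash(Exception):
    """Simulated application failure."""


def create_and_record(ledgers, app_meta, app_id, ledger_id, crash_before_record=False):
    # c) bookkeeper writes the new ledger's metadata
    ledgers[ledger_id] = {"state": "open"}
    if crash_before_record:
        # The application dies before step d); nothing references the ledger.
        raise Crash(f"failed after creating ledger {ledger_id}")
    # d) the application records the new ledger id in its own metadata
    app_meta.setdefault(app_id, []).append(ledger_id)


def find_zombies(ledgers, app_meta):
    """Cross-compare every ledger id against every application's id list."""
    referenced = {lid for ids in app_meta.values() for lid in ids}
    return sorted(set(ledgers) - referenced)


ledgers, app_meta = {}, {}
create_and_record(ledgers, app_meta, "TOPIC", 1)
try:
    create_and_record(ledgers, app_meta, "TOPIC", 2, crash_before_record=True)
except Crash:
    pass

zombies = find_zombies(ledgers, app_meta)
print(zombies)
```

Note that the comparison has to scan all application metadata and all ledger
ids, and a ledger that has been created but not yet recorded looks exactly
like a zombie at the moment of the scan.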

Back to the performance concern.

application identifier -> application metadata -> ledger id -> ledger
metadata.

There are two dimensions along which to reduce the metadata access
complexity:

1) for a single metadata access path, how to reduce the number of metadata
accesses?
2) for the whole system, how to reduce the total number of metadata
accesses?

Grouping is for 2). A grouping is efficient if, after grouping, the
metadata can be reduced or eliminated. Otherwise, you still hit the same
quantity of metadata. For bookkeeper, ledgers might be grouped by sharing
the same ensemble setting. But for applications it might be difficult,
since their metadata holds different ledger id mappings.

Having two different linear complexities is what drew my attention to the
problem, so I began thinking about 1).

I compared application metadata with ledger metadata and found that the
applications I know (e.g. Hedwig and ManagerLedgers; I am not very sure
about the HDFS Namenode, but it would be similar) duplicate the metadata
work done in bookkeeper.

Taking Hedwig as an example, let's step a bit into the details of its
metadata format.

Hedwig stores ledger ranges, which are a mapping between sequence id and
ledger id. E.g.

1 -> ledger x
100 -> ledger y
1000 -> ledger z

BookKeeper stores id ranges as fragments, which are also a mapping between
entry id and ledger fragment. E.g.

0 -> ensemble (a, b, c)
99 -> ensemble (c, d, e)

They hold similar metadata information: where Hedwig can find an entry. So
why can't Hedwig and BookKeeper share the same metadata? Then we could
remove the mapping between Hedwig metadata and BookKeeper metadata.
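
The similarity can be seen in a short sketch: both structures answer "which
range does this id fall into" with the same lookup. This assumes each key in
the examples above is the first id of its range; floor_lookup and the
variable names are illustrative and not taken from either code base.

```python
from bisect import bisect_right


def floor_lookup(range_map, key):
    """Return the value whose range start is the largest one <= key."""
    starts = sorted(range_map)
    idx = bisect_right(starts, key) - 1
    if idx < 0:
        raise KeyError(key)
    return range_map[starts[idx]]


# Hedwig: sequence id -> ledger id (values copied from the example above)
hedwig_ranges = {1: "ledger x", 100: "ledger y", 1000: "ledger z"}

# BookKeeper: entry id -> ensemble of the fragment (same lookup, other values)
bk_fragments = {0: ("a", "b", "c"), 99: ("c", "d", "e")}

print(floor_lookup(hedwig_ranges, 150))
print(floor_lookup(bk_fragments, 50))
```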

The new metadata access flow works as I proposed before:

application identifier -> ledger name -> ledger metadata.

The mapping from application identifier to ledger name can be computed
without touching the metadata store. E.g. Hedwig topic 'TOPIC' => ledger
name '/hedwig/TOPIC'.
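
As a minimal sketch, such an implicit mapping is a pure function of the
topic name. The function name and the prefix parameter are assumptions; only
the 'TOPIC' => '/hedwig/TOPIC' example comes from the text.

```python
def ledger_name(topic, prefix="/hedwig"):
    """Derive the ledger name from the topic without any metadata lookup."""
    return f"{prefix}/{topic}"


print(ledger_name("TOPIC"))
```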

It is still O(N), linear in the number of metadata entries, but it is
better. Only one complexity remains.

So now is the time to apply a grouping algorithm to group the ledger
metadata and reduce the metadata access penalties. With the new metadata
access flow, every application will benefit from the improvements we make
by grouping bookkeeper metadata.

-Sijie

Re: [Discussion] efficient metadata accesses for the applications using bookkeeper

Posted by Ivan Kelly <iv...@apache.org>.

On Fri, Jan 18, 2013 at 11:09:39PM -0800, Sijie Guo wrote:
> 1) performance concern: too many metadata accesses, hitting the bottleneck
> of the system.
Part of this is the price you pay for having an API. We could have a
completely integrated system, which would reduce everything to the
absolute minimum. However, its components wouldn't be reusable and it
would be much more difficult to maintain. Having neatly separated parts
allows us to reason about the components independently. If BK is
dependent on the upper level application for its ids, this advantage
goes away.

Regarding the bottleneck, this is only hit in the recovery
case. The session fencing proposal is sufficient to avoid it.

> 2) resource concern: there is no transaction between c) and d). If the
> application fails at that point, a zombie ledger is left behind (the ledger
> id leaks, exhausting the ledger id space) and metadata resources are wasted
> (especially if the metadata is stored in zookeeper, where it eats up
> zookeeper memory).
Zombies need to be handled at some level of the application anyhow. We
handle them at the bookkeeper layer with garbage collection. There's
no reason why we can't follow the same approach with Hedwig.

In any case, the requirement for this should be small. The number of
leaked ledgers is dependent on the number of crashed bookies. If there
are so many crashed bookies that the amount of zombie metadata causes
problems with zookeeper, then you probably shouldn't be using
zookeeper, but rather something that can scale better to this use case.

To summarize my views on this:

1) I don't consider the extra metadata writes a problem in and of
themselves.
2) Session fencing will solve the major performance issues.
3) The other issues can be addressed by changes in hedwig, not in
bookkeeper.
4) A linear performance improvement in one aspect of the API usage
does not justify a paradigm shift in how ledgers are used.

-Ivan

Re: [Discussion] efficient metadata accesses for the applications using bookkeeper

Posted by Sijie Guo <gu...@gmail.com>.

If an application runs with rolling ledgers, one application entity (e.g. a
Hedwig topic) will use several ledgers. It is a one-to-many mapping.

For a one-to-many mapping, since the application doesn't know how the
ledger id is generated, it has to record this mapping as its metadata, as
you said.

To avoid such a metadata mapping, you have to reduce the 'one-to-many'
mapping to a 'one-to-one' mapping. Once it is a 'one-to-one' mapping, you
can use an implicit name mapping to eliminate the recorded metadata. This is
why I introduced 'ledger name' for bookkeeper: to achieve implicit name
mapping.

The way I reduced the 'one-to-many' mapping to a 'one-to-one' mapping is to
deprecate the rolling ledgers style and introduce the operations 're-open'
and 'shrink'. The application sticks to using only one ledger: it
're-opens' it to append entries and 'shrinks' it when entries are
consumed.
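
A toy in-memory model of that single-ledger style might look like the
following. It is only a sketch of the proposed semantics as described above:
NamedLedger and its reopen, append and shrink methods are hypothetical and
do not exist in the bookkeeper client, and the rule that shrink drops every
entry up to and including the given id is an assumption.

```python
class NamedLedger:
    """Toy model of one long-lived ledger addressed by name (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.first_entry_id = 0   # id of the oldest entry still kept
        self.entries = []
        self.is_open = False

    def reopen(self):
        # 're-open': make the same ledger writable again instead of rolling
        self.is_open = True

    def close(self):
        self.is_open = False

    def append(self, data):
        if not self.is_open:
            raise RuntimeError("ledger must be re-opened before appending")
        self.entries.append(data)
        return self.first_entry_id + len(self.entries) - 1

    def shrink(self, consumed_up_to):
        # 'shrink': drop entries with id <= consumed_up_to (assumed semantics)
        drop = consumed_up_to - self.first_entry_id + 1
        drop = max(0, min(len(self.entries), drop))
        del self.entries[:drop]
        self.first_entry_id += drop


ledger = NamedLedger("/hedwig/TOPIC")
ledger.reopen()
first_id = ledger.append("a")
ledger.append("b")
ledger.close()
ledger.reopen()            # same ledger, so no new ledger id to record
last_id = ledger.append("c")
ledger.shrink(1)           # entries 0 and 1 have been consumed
print(ledger.first_entry_id, ledger.entries)
```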

-Sijie

On Sat, Jan 19, 2013 at 10:44 AM, Jiannan Wang <ji...@yahoo-inc.com> wrote:

> I roughly understand your idea but I have a question: you mean there will
> be an implicit mapping from application metadata to the ledger id, right?
> Then if there are many ledger ids, is it the bookie server's
> responsibility to handle them? And it seems we still need metadata for it.
>

Re: [Discussion] efficient metadata accesses for the applications using bookkeeper

Posted by Jiannan Wang <ji...@yahoo-inc.com>.

I can't agree more with Sijie, who raises this discussion from an
application's perspective.
I roughly understand your idea but I have a question: you mean there will
be an implicit mapping from application metadata to the ledger id, right?
Then if there are many ledger ids, is it the bookie server's
responsibility to handle them? And it seems we still need metadata for it.

- Jiannan
