Posted to user@bookkeeper.apache.org by Whitney Sorenson <ws...@hubspot.com> on 2013/02/04 17:47:06 UTC

Using BK as WAL and accessing ledger metadata

Hey all,

A couple questions about running BK stand-alone:

1) If I call openLedgerNoRecovery am I blocking writes or not? What are the
guarantees I lose - just ordering? Can I use this to essentially read /
tail an active ledger?

2) How can I access BK's metadata so that I can determine a list of
ledgers, and which ledgers are closed/open? It doesn't appear in the client
documentation (
http://zookeeper.apache.org/bookkeeper/docs/r4.2.0/apidocs/org/apache/bookkeeper/client/).
Is this not an intended operation? Are clients supposed to track ledger ids
on their own? (We are currently doing this, but it seems suboptimal.)

Thank you;

-Whitney Sorenson
HubSpot

Re: Using BK as WAL and accessing ledger metadata

Posted by Whitney Sorenson <ws...@hubspot.com>.

Thanks Flavio,

We have been considering reading the BK state out of ZK ourselves. I could
see how this data might be available in a roundabout (not advised) way
from the BK client. I don't think we would need to manipulate it,
because after we have processed the ledgers we delete them. The only
additional state I believe we would need is simply a lock around a ledger
while it is being processed (moved out of BK).




On Mon, Feb 4, 2013 at 4:08 PM, Flavio Junqueira <fp...@yahoo.com> wrote:

> Hi Whitney,
>
> In general we leave it up to the application to organize the ledgers it
> creates. It is indifferent to bookkeeper which ledgers have been created by
> a single writer and how the contents of ledgers relate. Managing this
> kind of application state is something that zookeeper does well and since
> we assume a zookeeper deployment, the application can use it to manage its
> metadata. Although we typically don't recommend that applications access
> the zookeeper metadata for bookkeeper ledgers, there is nothing really that
> prevents you from doing it. If it is useful for you to read this metadata,
> I don't see a big problem with doing it, although I'd like to stress that I
> find it a bad idea to try to manipulate the zookeeper state for bookkeeper
> ledgers directly.
>
> On your point about duplication, don't you need to remember which closed
> ledgers have been already processed? Just knowing the list of closed
> ledgers might not be sufficient. If this is the case, then you need to keep
> some additional metadata on the side.
>
> -Flavio
>
> On Feb 4, 2013, at 7:44 PM, Whitney Sorenson <ws...@hubspot.com>
> wrote:
>
> Thank you for responding.
>
> Forgive me if I'm missing something, but if I have a writer and separate
> readers, why would I want to have to communicate ledger ids between them?
> More specifically, we have a series of writers writing to a write-ahead log
> and a separate set of readers that are consuming these ledgers to move them
> into long term storage and send them to queues / workflows to be processed.
> This means I have to keep the state about which ledgers are available, and
> which are closed, which seems to be a complete duplication of the state
> that is already in BK.
>
> I'm not sure named ledgers are helpful in this situation, except that we
> could keep less state (perhaps a sequential id.)
>
> On Mon, Feb 4, 2013 at 1:27 PM, Sijie Guo <gu...@gmail.com> wrote:
>
>>
>> Hello, Whitney:
>>
>> please check the replies inline.
>>
>> On Mon, Feb 4, 2013 at 8:47 AM, Whitney Sorenson <ws...@hubspot.com> wrote:
>>
>>> Hey all,
>>>
>>> A couple questions about running BK stand-alone:
>>>
>>> 1) If I call openLedgerNoRecovery am I blocking writes or not? What are
>>> the guarantees I lose - just ordering? Can I use this to essentially read /
>>> tail an active ledger?
>>>
>>
>> Opening a ledger using openLedgerNoRecovery doesn't block any writes to it,
>> and you don't lose the ordering guarantee. You could use it to read/tail an
>> active ledger, but please keep in mind that you need to call
>> #readLastConfirmed to catch up to the latest confirmed entries added by the
>> writer. The entries you can read from a ledger opened with
>> openLedgerNoRecovery are only those between 0 and last confirmed.
>>
>> you could check:
>> http://zookeeper.apache.org/bookkeeper/docs/r4.2.0/apidocs/org/apache/bookkeeper/client/BookKeeper.html#asyncOpenLedgerNoRecovery(long,
>> org.apache.bookkeeper.client.BookKeeper.DigestType, byte[],
>> org.apache.bookkeeper.client.AsyncCallback.OpenCallback, java.lang.Object)
>>
>>
>>>
>>> 2) How can I access BK's metadata so that I can determine a list of
>>> ledgers, and which ledgers are closed/open? It doesn't appear in the client
>>> documentation (
>>> http://zookeeper.apache.org/bookkeeper/docs/r4.2.0/apidocs/org/apache/bookkeeper/client/)
>>> Is this not an intended operation? Are clients supposed to track ledger ids
>>> on their own (we are currently doing this but it seems suboptimal)
>>>
>>>
>> Currently we don't expose such an API to clients. Is there any special case
>> you are considering? We'd be happy to expose it if necessary.
>>
>> Most use cases work in the following style: a *standby* writer observes
>> the state of the *active* writer; if the *active* writer fails, the
>> *standby* writer takes over, closes the ledger written by the *active*
>> writer, replays that ledger, and creates a new ledger to write new
>> entries. For now, clients need to track ledger ids on their end.
>>
>> There is a proposal to provide *named* ledgers on top of bookkeeper to
>> make tracking ledger ids easier for users. You could check:
>> https://issues.apache.org/jira/browse/BOOKKEEPER-220 . We are also
>> discussing whether to provide ledger names internally in bookkeeper for
>> metadata access. We'd like to hear your feedback on the usage of the
>> API so we can make it better.
>>
>>
>>
>>> Thank you;
>>>
>>> -Whitney Sorenson
>>> HubSpot
>>>
>>>
>>
>
>

Re: Using BK as WAL and accessing ledger metadata

Posted by Flavio Junqueira <fp...@yahoo.com>.

Hi Whitney,

In general we leave it up to the application to organize the ledgers it creates. It is indifferent to bookkeeper which ledgers have been created by a single writer and how the contents of ledgers relate. Managing this kind of application state is something that zookeeper does well and since we assume a zookeeper deployment, the application can use it to manage its metadata. Although we typically don't recommend that applications access the zookeeper metadata for bookkeeper ledgers, there is nothing really that prevents you from doing it. If it is useful for you to read this metadata, I don't see a big problem with doing it, although I'd like to stress that I find it a bad idea to try to manipulate the zookeeper state for bookkeeper ledgers directly.

On your point about duplication, don't you need to remember which closed ledgers have been already processed? Just knowing the list of closed ledgers might not be sufficient. If this is the case, then you need to keep some additional metadata on the side.

-Flavio
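
To make the point about additional metadata concrete, here is a minimal
sketch of the side state a reader might keep: a claim (the "lock around a
ledger" mentioned earlier in the thread) plus a processed marker. The class
LedgerWorkTracker and all of its methods are hypothetical and are not part of
the BookKeeper client. The in-memory map only illustrates the state
transitions; with readers in separate processes, this state would have to
live in a shared store such as ZooKeeper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical application-side bookkeeping for closed ledgers; not a BookKeeper API. */
final class LedgerWorkTracker {
    enum State { CLAIMED, PROCESSED }

    // ledgerId -> state; an absent key means "not yet picked up by any reader".
    private final Map<Long, State> states = new ConcurrentHashMap<>();

    /** Atomically claims a ledger; returns false if it is already claimed or processed. */
    boolean tryClaim(long ledgerId) {
        return states.putIfAbsent(ledgerId, State.CLAIMED) == null;
    }

    /** Marks a claimed ledger as done; the caller could then delete it from BookKeeper. */
    void markProcessed(long ledgerId) {
        if (!states.replace(ledgerId, State.CLAIMED, State.PROCESSED)) {
            throw new IllegalStateException("ledger " + ledgerId + " was not claimed");
        }
    }

    /** Drops a claim after a failed attempt so that another reader can retry. */
    void release(long ledgerId) {
        states.remove(ledgerId, State.CLAIMED);
    }

    boolean isProcessed(long ledgerId) {
        return states.get(ledgerId) == State.PROCESSED;
    }
}
```

A reader would call tryClaim before moving a ledger out of BK, then
markProcessed on success or release on failure.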


Re: Using BK as WAL and accessing ledger metadata

Posted by Flavio Junqueira <fp...@yahoo.com>.

On Feb 6, 2013, at 7:10 AM, Sijie Guo <gu...@gmail.com> wrote:

> 
> So would exposing an API like LedgerHandle#isClosed(), to check whether a ledger is closed, be enough for your case?
> 

The advantage I see of doing this is that the application doesn't have to fetch and parse the metadata itself. I think it is a good idea.

-Flavio


> -Sijie
> 
> 
> 
> On Tue, Feb 5, 2013 at 11:11 AM, Whitney Sorenson <ws...@hubspot.com> wrote:
> Sijie,
> 
> The problem is I have many writers (all with their own ledgers.) They are constantly closing and creating new ledgers.
> 
> Then I have many readers which want to read the ledgers. How should the readers know what the ledgers are that exist to be read - they want to read ALL ledgers that are closed, essentially.
> 
> Does this make sense?
> 
> 
> On Tue, Feb 5, 2013 at 12:09 AM, Sijie Guo <gu...@gmail.com> wrote:
> Hello Whitney,
> 
> 
> On Mon, Feb 4, 2013 at 10:44 AM, Whitney Sorenson <ws...@hubspot.com> wrote:
> Thank you for responding.
> 
> Forgive me if I'm missing something, but if I have a writer and separate readers, why would I want to have to communicate ledger ids between them? More specifically, we have a series of writers writing to a write-ahead log and a separate set of readers that are consuming these ledgers to move them into long term storage and send them to queues / workflows to be processed.
> 
> I am just curious about the case you mentioned, in which you have a series of writers writing to a write-ahead log. If the write-ahead log means a single ledger, I can't imagine how you implemented a series of writers writing to it, since bookkeeper allows only one writer per ledger.
> 
> If the write-ahead log is formed of several ledgers, you probably already have a mechanism to map each writer to its ledger, so when a writer calls #openLedger on a ledger, that ledger will be closed. Basically, the closed state can be distinguished by the call used: the ledger handle returned by #createLedger is an open ledger, while the ledger handle returned by #openLedger is a closed ledger. If you want to write entries, you have to create a new ledger. Once a ledger is closed, or its writer has crashed, it cannot be written to again. So I don't think you need to keep the state on your end. If I have misunderstood your case, please let me know.
>  
> This means I have to keep the state about which ledgers are available, and which are closed, which seems to be a complete duplication of the state that is already in BK.
> 
> I'm not sure named ledgers are helpful in this situation, except that we could keep less state (perhaps a sequential id.)
> 

Re: Using BK as WAL and accessing ledger metadata

Posted by Ivan Kelly <iv...@apache.org>.

On Thu, Feb 07, 2013 at 09:00:35PM -0800, Sijie Guo wrote:
> I think we already have similar interface in LedgerManager,
> "LedgerRangeIterator getLedgerRanges()". I think it would be quite
> straightforward to expose it.
Yes, we should reuse this. However, from the client's viewpoint it would
be better to expose it as an Iterable<Long> getLedgers(), so as not to burden
the user with the range details. Doing this should be straightforward.

-Ivan
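
As a rough illustration of the idea above, the sketch below flattens a
sequence of id ranges into a lazily evaluated Iterable<Long>, so that only
one range needs to be held in memory at a time. IdRange and FlatLedgerIds are
hypothetical names invented for this example. IdRange merely stands in for
whatever a range iterator would return and is not the real LedgerManager
type.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for one batch of ledger ids; not the real LedgerManager type. */
final class IdRange {
    final List<Long> ids;
    IdRange(List<Long> ids) { this.ids = ids; }
}

/** Presents a sequence of ranges as a flat, lazily evaluated Iterable<Long>. */
final class FlatLedgerIds implements Iterable<Long> {
    private final Iterable<IdRange> ranges;

    FlatLedgerIds(Iterable<IdRange> ranges) { this.ranges = ranges; }

    @Override
    public Iterator<Long> iterator() {
        final Iterator<IdRange> outer = ranges.iterator();
        return new Iterator<Long>() {
            private Iterator<Long> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Advance past exhausted or empty ranges, fetching the next one on demand.
                while (!inner.hasNext() && outer.hasNext()) {
                    inner = outer.next().ids.iterator();
                }
                return inner.hasNext();
            }

            @Override
            public Long next() {
                if (!hasNext()) throw new NoSuchElementException();
                return inner.next();
            }
        };
    }
}
```

A caller can then write a plain for-each loop over the ids without seeing
the ranges at all.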

Re: Using BK as WAL and accessing ledger metadata

Posted by Sijie Guo <gu...@gmail.com>.

Ivan,

I think we already have a similar interface in LedgerManager,
"LedgerRangeIterator getLedgerRanges()". I think it would be quite
straightforward to expose it.

-Sijie


On Wed, Feb 6, 2013 at 7:41 AM, Ivan Kelly <iv...@apache.org> wrote:

> On Wed, Feb 06, 2013 at 10:29:02AM -0500, Whitney Sorenson wrote:
> > I think adding isClosed is a step in the right direction, but I'd
> > also like to see a method like BkAdmin#getLedgerIds - that way I
> > don't need to track the list of ledger ids in zk myself.
> We already have a jira for this
> https://issues.apache.org/jira/browse/BOOKKEEPER-257
>
> A scanner type interface would be best for this though, so that if
> there's a lot of ledger ids, we wouldn't run out of memory etc.
>
> -Ivan
>
>
>

Re: Using BK as WAL and accessing ledger metadata

Posted by Ivan Kelly <iv...@apache.org>.

Yes, a jira is the best place to discuss it. Set the fix version to
4.3.0. It would also be good to have some info on the driving use case
in the JIRA.

-Ivan

On Thu, Feb 07, 2013 at 11:38:46AM -0500, Whitney Sorenson wrote:
> I will look forward to this addition, then. Do you need a jira to track the
> possible addition of isClosed() on a ledger handle?
> 
> 

Re: Using BK as WAL and accessing ledger metadata

Posted by Whitney Sorenson <ws...@hubspot.com>.

I will look forward to this addition, then. Do you need a jira to track the
possible addition of isClosed() on a ledger handle?



Re: Using BK as WAL and accessing ledger metadata

Posted by Ivan Kelly <iv...@apache.org>.

On Wed, Feb 06, 2013 at 10:29:02AM -0500, Whitney Sorenson wrote:
> I think adding isClosed is a step in the right direction, but I'd
> also like to see a method like BkAdmin#getLedgerIds - that way I
> don't need to track the list of ledger ids in zk myself. 
We already have a jira for this
https://issues.apache.org/jira/browse/BOOKKEEPER-257

A scanner type interface would be best for this though, so that if
there's a lot of ledger ids, we wouldn't run out of memory etc.

-Ivan

Re: Using BK as WAL and accessing ledger metadata

Posted by Whitney Sorenson <ws...@hubspot.com>.

I think adding isClosed is a step in the right direction, but I'd also like to see a method like BkAdmin#getLedgerIds - that way I don't need to track the list of ledger ids in zk myself.


Re: Using BK as WAL and accessing ledger metadata

Posted by Sijie Guo <gu...@gmail.com>.

Whitney,

Thanks for the replies.

I think it depends on how your readers consume the entries in the
ledgers.

1) If a reader consumes the entries added by a writer in order, you just
keep a list of ledger ids. The last ledger in the list is the one being
written, so you should use #openLedgerNoRecovery to read it. The ledgers
created before the last one are closed, so you can use #openLedger to read
them.

2) If the readers just pick a ledger to read at random, you might need to
check its state.


So would exposing an API like LedgerHandle#isClosed(), to check whether a
ledger is closed, be enough for your case?

-Sijie
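
Case 1 above can be sketched as follows, assuming the application keeps each
writer's ledger ids in creation order. OrderedLedgerReader, OpenMode, modeFor,
and nextRange are hypothetical helpers written for this example. The real
client calls they refer to are #openLedger, #openLedgerNoRecovery, and
#readLastConfirmed, which are not invoked here so that the sketch stays
self-contained.

```java
import java.util.List;

/** Hypothetical helper; it only decides which real BookKeeper call a reader should use. */
final class OrderedLedgerReader {
    enum OpenMode { OPEN_LEDGER, OPEN_LEDGER_NO_RECOVERY }

    /** Only the last ledger in a writer's ordered list may still be receiving entries. */
    static OpenMode modeFor(List<Long> orderedLedgerIds, long ledgerId) {
        int idx = orderedLedgerIds.indexOf(ledgerId);
        if (idx < 0) {
            throw new IllegalArgumentException("unknown ledger " + ledgerId);
        }
        return idx == orderedLedgerIds.size() - 1
                ? OpenMode.OPEN_LEDGER_NO_RECOVERY
                : OpenMode.OPEN_LEDGER;
    }

    /**
     * Returns {first, last} entry ids that a tailing reader may fetch next, given
     * the last entry it has already read (-1 if none) and the last confirmed
     * entry id reported for the ledger. Returns an empty array if nothing is new.
     */
    static long[] nextRange(long lastRead, long lastConfirmed) {
        if (lastConfirmed <= lastRead) {
            return new long[0];
        }
        return new long[] { lastRead + 1, lastConfirmed };
    }
}
```

A tailing reader would refresh the last confirmed id periodically, read the
range returned by nextRange, and repeat until the writer closes the ledger.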




Re: Using BK as WAL and accessing ledger metadata

Posted by Whitney Sorenson <ws...@hubspot.com>.

Sijie,

The problem is I have many writers (all with their own ledgers.) They are
constantly closing and creating new ledgers.

Then I have many readers which want to read the ledgers. How should the
readers know what the ledgers are that exist to be read - they want to read
ALL ledgers that are closed, essentially.

Does this make sense?


On Tue, Feb 5, 2013 at 12:09 AM, Sijie Guo <gu...@gmail.com> wrote:

> Hello Whitney,
>
>
> On Mon, Feb 4, 2013 at 10:44 AM, Whitney Sorenson <ws...@hubspot.com> wrote:
>
>> Thank you for responding.
>>
>> Forgive me if I'm missing something, but if I have a writer and separate
>> readers, why would I want to have to communicate ledger ids between them?
>> More specifically, we have a series of writers writing to a write-ahead log
>> and a separate set of readers that are consuming these ledgers to move them
>> into long term storage and send them to queues / workflows to be processed.
>>
>
> I am just curious about the case you mentioned that you have a series of
> writers writing to a write-ahead log. If the write-ahead log means a
> ledger, I couldn't image how you implemented a series of writers writing to
> it, since bookkeeper just allow one writer writing to a ledger.
>
> if the write-head log is formed by several ledgers, it means that you
> might already have a mechanism to map the writer to the ledger, so when a
> writer #openLedger, it means that the ledger would be closed. basically,
> the close state could be distinguished by different calls : the ledger
> handle returned by #createLedger is an opened ledger while the ledger
> handle returned by #openLedger is a closed ledger. If you want to write
> entries, you had to create a new ledger. Either the ledger is closed or the
> writer is crashed, the ledger could not be written again. So I don't think
> you need to keep the state is your end. If I don't understand your case,
> please let me know.
>
>
>> This means I have to keep the state about which ledgers are available,
>> and which are closed, which seems to be a complete duplication of the state
>> that is already in BK.
>>
>> I'm not sure named ledgers are helpful in this situation, except that we
>> could keep less state (perhaps a sequential id.)
>>
>> On Mon, Feb 4, 2013 at 1:27 PM, Sijie Guo <gu...@gmail.com> wrote:
>>
>>>
>>> Hello, Whitney:
>>>
>>> please check the replies inline.
>>>
>>> On Mon, Feb 4, 2013 at 8:47 AM, Whitney Sorenson <ws...@hubspot.com>wrote:
>>>
>>>> Hey all,
>>>>
>>>> A couple questions about running BK stand-alone:
>>>>
>>>> 1) If I call openLedgerNoRecovery am I blocking writes or not? What are
>>>> the guarantees I lose - just ordering? Can I use this to essentially read /
>>>> tail an active ledger?
>>>>
>>>
>>> open a ledger using openLedgerNoRecovery doesn't block any writes to it.
>>> And you don't lose the ordering guarantee. You could use it to read/tail an
>>> active ledger, but please keep in mind that you need to call
>>> #readLastConfirmed to catch up to the latest confirmed entries added by the
>>> writer. And the entries you could read from an openLedgerNoRecovery ledger,
>>> is just between 0 and last confirmed.
>>>
>>> you could check:
>>> http://zookeeper.apache.org/bookkeeper/docs/r4.2.0/apidocs/org/apache/bookkeeper/client/BookKeeper.html#asyncOpenLedgerNoRecovery(long,
>>> org.apache.bookkeeper.client.BookKeeper.DigestType, byte[],
>>> org.apache.bookkeeper.client.AsyncCallback.OpenCallback, java.lang.Object)
>>>
>>>
>>>>
>>>> 2) How can I access BK's metadata so that I can determine a list of
>>>> ledgers, and which ledgers are closed/open? It doesn't appear in the client
>>>> documentation (
>>>> http://zookeeper.apache.org/bookkeeper/docs/r4.2.0/apidocs/org/apache/bookkeeper/client/)
>>>> Is this not an intended operation? Are clients supposed to track ledger ids
>>>> on their own (we are currently doing this but it seems suboptimal)
>>>>
>>>>
>>> currently we don't expose the API for client. Is there any special case
>>> you are considering? We'd happy to expose it if necessary.
>>>
>>>  Since most of the cases are working in following styles: a *standby*
>>> writer observes the *active* writer state, if the *active* writer failed,
>>> the *standby* writer would take over the responsibility, closed the ledger
>>> written by *active* writer, replayed the ledger and created a new ledger to
>>> write new entries. For now, clients needs to track ledger ids on their end.
>>>
>>> There is one proposal working on providing *named* ledgers on top of
>>> bookkeeper to ease user's experience tracking ledger ids. You could check :
>>> https://issues.apache.org/jira/browse/BOOKKEEPER-220 . And we are under
>>> discussion on whether to provide ledger name internally in bookkeeper for
>>> metadata access concerns. We'd like to hear your feedback on the usage of
>>> API and make it better.
>>>
>>>
>>>
>>>> Thank you;
>>>>
>>>> -Whitney Sorenson
>>>> HubSpot
>>>>
>>>>
>>>
>>
>

Re: Using BK as WAL and accessing ledger metadata

Posted by Sijie Guo <gu...@gmail.com>.

Hello Whitney,


On Mon, Feb 4, 2013 at 10:44 AM, Whitney Sorenson <ws...@hubspot.com>wrote:

> Thank you for responding.
>
> Forgive me if I'm missing something, but if I have a writer and separate
> readers, why would I want to have to communicate ledger ids between them?
> More specifically, we have a series of writers writing to a write-ahead log
> and a separate set of readers that are consuming these ledgers to move them
> into long term storage and send them to queues / workflows to be processed.
>

I am just curious about the case you mentioned, where a series of writers
write to a write-ahead log. If the write-ahead log is a single ledger, I
can't imagine how you implemented a series of writers writing to it, since
bookkeeper allows only one writer per ledger.

If the write-ahead log is formed by several ledgers, you probably already
have a mechanism to map each writer to its ledger. When a writer calls
#openLedger on a ledger, that ledger ends up closed. Basically, the closed
state can be distinguished by which call produced the handle: the ledger
handle returned by #createLedger is an open ledger, while the ledger handle
returned by #openLedger is a closed ledger. If you want to write entries,
you have to create a new ledger. Whether the ledger was closed or the writer
crashed, the ledger cannot be written to again. So I don't think you need to
keep the state on your end. If I have misunderstood your case, please let me
know.
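
To make the create/open distinction concrete, here is a toy in-memory model
of the ledger states described above. It is only a sketch and does not use
the BookKeeper client: the class LedgerLifecycleSketch and its methods are
hypothetical names chosen to mirror #createLedger and #openLedger, and
entries are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of ledger states. This is NOT the BookKeeper client;
// all names here are hypothetical and exist only for illustration.
final class LedgerLifecycleSketch {
    private final Map<Long, List<String>> entries = new HashMap<>();
    private final Map<Long, Boolean> closed = new HashMap<>();
    private long nextId = 0;

    // Models #createLedger: the new ledger is open for writing.
    long createLedger() {
        long id = nextId++;
        entries.put(id, new ArrayList<>());
        closed.put(id, false);
        return id;
    }

    // Models an add by the single writer; rejected once the ledger is closed.
    void addEntry(long id, String data) {
        if (isClosed(id)) {
            throw new IllegalStateException("ledger " + id + " is closed");
        }
        entries.get(id).add(data);
    }

    // Models #openLedger: opening closes the ledger and returns its entries.
    List<String> openLedger(long id) {
        isClosed(id); // throws if the ledger id is unknown
        closed.put(id, true);
        return new ArrayList<>(entries.get(id));
    }

    boolean isClosed(long id) {
        Boolean state = closed.get(id);
        if (state == null) {
            throw new IllegalArgumentException("unknown ledger " + id);
        }
        return state;
    }
}
```

In this model a reader never has to ask whether a ledger is closed: once
openLedger has returned, the ledger is closed, and any later addEntry on it
fails.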


> This means I have to keep the state about which ledgers are available, and
> which are closed, which seems to be a complete duplication of the state
> that is already in BK.
>
> I'm not sure named ledgers are helpful in this situation, except that we
> could keep less state (perhaps a sequential id.)

Re: Using BK as WAL and accessing ledger metadata

Posted by Whitney Sorenson <ws...@hubspot.com>.

Thank you for responding.

Forgive me if I'm missing something, but if I have a writer and separate
readers, why would I want to have to communicate ledger ids between them?
More specifically, we have a series of writers writing to a write-ahead log
and a separate set of readers that are consuming these ledgers to move them
into long term storage and send them to queues / workflows to be processed.
This means I have to keep the state about which ledgers are available, and
which are closed, which seems to be a complete duplication of the state
that is already in BK.

I'm not sure named ledgers are helpful in this situation, except that we
could keep less state (perhaps a sequential id.)


Re: Using BK as WAL and accessing ledger metadata

Posted by Sijie Guo <gu...@gmail.com>.

Hello, Whitney:

please check the replies inline.

On Mon, Feb 4, 2013 at 8:47 AM, Whitney Sorenson <ws...@hubspot.com>wrote:

> Hey all,
>
> A couple questions about running BK stand-alone:
>
> 1) If I call openLedgerNoRecovery am I blocking writes or not? What are
> the guarantees I lose - just ordering? Can I use this to essentially read /
> tail an active ledger?
>

Opening a ledger with openLedgerNoRecovery doesn't block any writes to it,
and you don't lose the ordering guarantee. You can use it to read/tail an
active ledger, but keep in mind that you need to call #readLastConfirmed to
catch up to the latest confirmed entries added by the writer. The entries
you can read from a ledger opened with openLedgerNoRecovery are only those
between 0 and last confirmed.

You can check:
http://zookeeper.apache.org/bookkeeper/docs/r4.2.0/apidocs/org/apache/bookkeeper/client/BookKeeper.html#asyncOpenLedgerNoRecovery(long,
org.apache.bookkeeper.client.BookKeeper.DigestType, byte[],
org.apache.bookkeeper.client.AsyncCallback.OpenCallback, java.lang.Object)
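
The tailing pattern above can be sketched roughly as follows. To keep the
example runnable without a cluster, it does not call the BookKeeper client:
TailableLedger is a hypothetical stand-in for the two handle operations a
tailing reader needs (readLastConfirmed and readEntries), FakeLedger is an
in-memory fake, and entries are plain strings. With the real client you
would obtain the handle from openLedgerNoRecovery instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the handle operations used when tailing.
interface TailableLedger {
    long readLastConfirmed();                        // -1 if nothing is confirmed yet
    List<String> readEntries(long first, long last); // inclusive range
}

// In-memory fake, for illustration only.
final class FakeLedger implements TailableLedger {
    private final List<String> entries = new ArrayList<>();
    private long lastConfirmed = -1;

    void add(String data) {
        entries.add(data);
    }

    // The writer's confirmation may lag behind the last written entry.
    void confirmUpTo(long entryId) {
        if (entryId >= entries.size()) {
            throw new IllegalArgumentException("entry not written yet");
        }
        lastConfirmed = Math.max(lastConfirmed, entryId);
    }

    @Override
    public long readLastConfirmed() {
        return lastConfirmed;
    }

    @Override
    public List<String> readEntries(long first, long last) {
        if (first < 0 || first > last || last > lastConfirmed) {
            throw new IllegalArgumentException("range outside 0..lastConfirmed");
        }
        return new ArrayList<>(entries.subList((int) first, (int) last + 1));
    }
}

// Tailing reader: remembers the next unread entry id and, on each poll,
// reads only up to the last confirmed entry.
final class LedgerTailer {
    private final TailableLedger ledger;
    private long nextEntry = 0;

    LedgerTailer(TailableLedger ledger) {
        this.ledger = ledger;
    }

    List<String> poll() {
        long lastConfirmed = ledger.readLastConfirmed();
        if (lastConfirmed < nextEntry) {
            return new ArrayList<>(); // nothing new has been confirmed
        }
        List<String> batch = ledger.readEntries(nextEntry, lastConfirmed);
        nextEntry = lastConfirmed + 1;
        return batch;
    }
}
```

A reader would call poll() periodically. Entries the writer has added but
not yet confirmed stay invisible until a later poll, which is why a tailing
reader can trail the writer slightly.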

>
> 2) How can I access BK's metadata so that I can determine a list of
> ledgers, and which ledgers are closed/open? It doesn't appear in the client
> documentation (
> http://zookeeper.apache.org/bookkeeper/docs/r4.2.0/apidocs/org/apache/bookkeeper/client/)
> Is this not an intended operation? Are clients supposed to track ledger ids
> on their own (we are currently doing this but it seems suboptimal)
>
>
Currently we don't expose such an API to clients. Is there a particular use
case you are considering? We'd be happy to expose it if necessary.

Most use cases work in the following style: a *standby* writer observes the
state of the *active* writer. If the *active* writer fails, the *standby*
writer takes over: it closes the ledger written by the *active* writer,
replays it, and creates a new ledger to write new entries. For now, clients
need to track ledger ids on their end.
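
As one possible shape for that client-side tracking, here is a minimal
sketch of a registry in which writers record the ledger ids they create and
close, and readers list the closed ones. Everything in it is an assumption
for illustration: LedgerRegistry is a hypothetical class, and it keeps state
in memory, whereas a real deployment would need shared storage (for example
ZooKeeper) so that separate processes see the same registry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical application-side registry of ledger ids; not part of BookKeeper.
final class LedgerRegistry {
    enum State { OPEN, CLOSED }

    // Sorted map, so ledger ids are listed in ascending order.
    private final Map<Long, State> ledgers = new ConcurrentSkipListMap<>();

    // A writer calls this right after creating a ledger.
    void registerOpen(long ledgerId) {
        ledgers.put(ledgerId, State.OPEN);
    }

    // A writer calls this after closing its ledger.
    void markClosed(long ledgerId) {
        if (ledgers.replace(ledgerId, State.CLOSED) == null) {
            throw new IllegalArgumentException("unknown ledger " + ledgerId);
        }
    }

    // Readers call this to find ledgers that are safe to read in full.
    List<Long> closedLedgers() {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, State> e : ledgers.entrySet()) {
            if (e.getValue() == State.CLOSED) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Readers call this once a ledger has been processed and deleted.
    void remove(long ledgerId) {
        ledgers.remove(ledgerId);
    }
}
```

A writer would call registerOpen after creating a ledger and markClosed
after closing it. A reader would iterate over closedLedgers() and call
remove once a ledger has been processed.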

There is a proposal to provide *named* ledgers on top of bookkeeper to make
tracking ledger ids easier for users. You can check:
https://issues.apache.org/jira/browse/BOOKKEEPER-220 . We are also
discussing whether to provide ledger names internally in bookkeeper for
metadata access. We'd like to hear your feedback on the API usage so we can
make it better.

> Thank you;
>
> -Whitney Sorenson
> HubSpot
>
>