You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bookkeeper.apache.org by Ivan Kelly <iv...@apache.org> on 2018/07/27 17:54:50 UTC

Changing ledger metadata to binary format

Hi folks,

I think this was discussed yesterday in the meeting, and a bit on
slack, but I haven't seen anything much written down, so I'm starting
a thread here.

The crux of the problem is that the protobuf text format currently
used for metadata cannot have new fields added without breaking client
compatability, as the text parser can't be configured to ignore
unrecognised values (and google aren't going to fix this).

Protobuf binary format does support new fields though. So if a field
is added, a client that knows nothing of it can read it back without
issue.

I propose we approach this the following way:
- We already have a version in /ledgers/LAYOUT. In a current cluster,
this contains
```
2
org.apache.bookkeeper.meta.HierarchicalLedgerManagerFactory:1
```
- We define a new LedgerMetadata protobuf. This is a chance to clean
up mistakes we've made previously.
- When writing a metadata, check what is in /ledgers/LAYOUT. If it is
as above, write using the current text protobuf. If is bumped, use the
new binary format.
- When reading metadata, first try to parse binary, and fall back to
text if that fails. (we could also add a layout check to short
circuit)

When upgrading a cluster, the layout will be as above, it will
continue to only use text format until there is some admin
intervention. When the admin is satisfied that all clients are on a
new enough version, they call a script which bumps the version. From
this point clients will write the binary version.

New clusters go straight to binary. We will also need a script to dump
the metadata from a znode. One wrinkle, which is another reason to
create a new LedgerMetadata protobuf, is that when you are writing in
text format, there's no way to filter the fields. So even if we write
in text format, if we add new fields we are breaking old clients.
Another approach would be to ensure the protobuf only contains the
fields that are available now, but this seems messy to me.

Anyhow, this is to be a jumping off point for discussion. Comments welcome,

Cheers,
Ivan

Re: Changing ledger metadata to binary format

Posted by Enrico Olivelli <eo...@gmail.com>.
Il lun 30 lug 2018, 17:32 Venkateswara Rao Jujjuri <ju...@gmail.com> ha
scritto:

> > As far as I understand we are working on this change in order to be ready
> > with a full "rollback" story, so we have to be one version ahead.
>
> If that is the case, I think we need to look at the entire roll-back story.
> At this point, I am not sure if we have identified all the pieces and if
> they fit together.
> Enrico, are you driving the roll-back story?


No, I am not.
I am only coordinating 4.8 release and I would like to move things forward.
I am not in a hurry for this change.
To me we can defer it to 4.9

Enrico


or whoever, can you please
> come up with
> a comprehensive proposal?
>
> Thanks,
> JV
>
>
>
>
> On Mon, Jul 30, 2018 at 8:05 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Il giorno lun 30 lug 2018 alle ore 16:55 Venkateswara Rao Jujjuri <
> > jujjuri@gmail.com> ha scritto:
> >
> > > We have introduced customMetadata in the past and that was not
> mandatory.
> > > How did we make that work with text format?
> > > Also, my preference is to wait until we have a need to introduce a new
> > > field for this conversion if needed.
> > >
> >
> > JV,
> > So you are saying that you prefer to defer this change to the first case,
> > when it will be really needed ?
> >
> > As far as I understand we are working on this change in order to be ready
> > with a full "rollback" story, so we have to be one version ahead.
> >
> > Enrico
> >
> >
> >
> >
> > >
> > > JV
> > >
> > > On Mon, Jul 30, 2018 at 12:42 AM, Enrico Olivelli <eolivelli@gmail.com
> >
> > > wrote:
> > >
> > > > Il giorno lun 30 lug 2018 alle ore 09:17 Ivan Kelly <
> ivank@apache.org>
> > > ha
> > > > scritto:
> > > >
> > > > > >> Thank you for putting this together. It is also good to put this
> > as
> > > a
> > > > > BP,
> > > > > >> since it is about the metadata layout.
> > > > >
> > > > > I'll put a BP up this week once I have initial feedback.
> > > > >
> > > > > >> > - When writing a metadata, check what is in /ledgers/LAYOUT.
> If
> > it
> > > > is
> > > > > >> > as above, write using the current text protobuf. If is bumped,
> > use
> > > > the
> > > > > >> > new binary format.
> > > > > >
> > > > > > isn't it too costly? Adding a zk read for each write. We could
> add
> > a
> > > > > watch
> > > > > > but is has an important cost
> > > > >
> > > > > This cost can be amortized. In simplest case, we only read when
> > client
> > > > > starts (as already happens). Client are able both read and write
> the
> > > > > old format, so clients that read the pre-upgrade version will still
> > > > > write in text format, and this is fine.
> > > > >
> > > >
> > > > For me it is fine to read it when client starts. A rolling restart of
> > the
> > > > client application will be enough to load the new configuration.
> > > >
> > > >
> > > > >
> > > > > However, we would want them to eventually see the new format
> without
> > a
> > > > > restart. At one extreme, we could put a watch on the znode, but
> this
> > > > > could be quite heavy (I think the only per-client watch we have now
> > is
> > > > > on /ledgers/available). Alternatively, we could just read
> > > > > /ledgers/LAYOUT once per day. This way clients who saw the old
> > version
> > > > > will eventually get updated.
> > > > >
> > > >
> > > > I prefer NOT to use the watch, this cost will be payed forever even
> > when
> > > > there will be no configuration changes.
> > > >
> > > > I would go with the 'load at start' option, it is clear and works
> fine
> > > >
> > > >
> > > >
> > > > >
> > > > > > What about having a client side config writeMetadataVersion ?
> > > > > > We start a new metadata version, the new one  will be encoded as
> > > > binary.
> > > > > > By default 4.8 clients will use previous version, as we already
> do
> > > for
> > > > > > journal and fileinfo on bookies.
> > > > > >
> > > > > > Each ledger is independent from the others, there is no need of a
> > > > global
> > > > > > flag written on zk.
> > > > >
> > > > > There is a global property here though,which only the administrator
> > > > > can control. It is "Can all clients in the cluster/application read
> > > > > the new ledger metadata version?". This is why I'd like to keep it
> as
> > > > > a centralized flag rather than allowing each client to decide for
> > > > > themselves.
> > > > >
> > > >
> > > > ok, let's keep it
> > > >
> > > > Enrico
> > > >
> > > >
> > > >
> > > > >
> > > > > > Don't we already have some tool? (Maybe I only have seen such
> tool
> > in
> > > > my
> > > > > > company applications)
> > > > >
> > > > > It may already exist, I haven't checked.
> > > > >
> > > > > -Ivan
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Jvrao
> > > ---
> > > First they ignore you, then they laugh at you, then they fight you,
> then
> > > you win. - Mahatma Gandhi
> > >
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
-- 


-- Enrico Olivelli

Re: Changing ledger metadata to binary format

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
> As far as I understand we are working on this change in order to be ready
> with a full "rollback" story, so we have to be one version ahead.

If that is the case, I think we need to look at the entire roll-back story.
At this point, I am not sure if we have identified all the pieces and if
they fit together.
Enrico, are you driving the roll-back story? or whoever, can you please
come up with
a comprehensive proposal?

Thanks,
JV




On Mon, Jul 30, 2018 at 8:05 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> Il giorno lun 30 lug 2018 alle ore 16:55 Venkateswara Rao Jujjuri <
> jujjuri@gmail.com> ha scritto:
>
> > We have introduced customMetadata in the past and that was not mandatory.
> > How did we make that work with text format?
> > Also, my preference is to wait until we have a need to introduce a new
> > field for this conversion if needed.
> >
>
> JV,
> So you are saying that you prefer to defer this change to the first case,
> when it will be really needed ?
>
> As far as I understand we are working on this change in order to be ready
> with a full "rollback" story, so we have to be one version ahead.
>
> Enrico
>
>
>
>
> >
> > JV
> >
> > On Mon, Jul 30, 2018 at 12:42 AM, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > Il giorno lun 30 lug 2018 alle ore 09:17 Ivan Kelly <iv...@apache.org>
> > ha
> > > scritto:
> > >
> > > > >> Thank you for putting this together. It is also good to put this
> as
> > a
> > > > BP,
> > > > >> since it is about the metadata layout.
> > > >
> > > > I'll put a BP up this week once I have initial feedback.
> > > >
> > > > >> > - When writing a metadata, check what is in /ledgers/LAYOUT. If
> it
> > > is
> > > > >> > as above, write using the current text protobuf. If is bumped,
> use
> > > the
> > > > >> > new binary format.
> > > > >
> > > > > isn't it too costly? Adding a zk read for each write. We could add
> a
> > > > watch
> > > > > but is has an important cost
> > > >
> > > > This cost can be amortized. In simplest case, we only read when
> client
> > > > starts (as already happens). Client are able both read and write the
> > > > old format, so clients that read the pre-upgrade version will still
> > > > write in text format, and this is fine.
> > > >
> > >
> > > For me it is fine to read it when client starts. A rolling restart of
> the
> > > client application will be enough to load the new configuration.
> > >
> > >
> > > >
> > > > However, we would want them to eventually see the new format without
> a
> > > > restart. At one extreme, we could put a watch on the znode, but this
> > > > could be quite heavy (I think the only per-client watch we have now
> is
> > > > on /ledgers/available). Alternatively, we could just read
> > > > /ledgers/LAYOUT once per day. This way clients who saw the old
> version
> > > > will eventually get updated.
> > > >
> > >
> > > I prefer NOT to use the watch, this cost will be payed forever even
> when
> > > there will be no configuration changes.
> > >
> > > I would go with the 'load at start' option, it is clear and works fine
> > >
> > >
> > >
> > > >
> > > > > What about having a client side config writeMetadataVersion ?
> > > > > We start a new metadata version, the new one  will be encoded as
> > > binary.
> > > > > By default 4.8 clients will use previous version, as we already do
> > for
> > > > > journal and fileinfo on bookies.
> > > > >
> > > > > Each ledger is independent from the others, there is no need of a
> > > global
> > > > > flag written on zk.
> > > >
> > > > There is a global property here though,which only the administrator
> > > > can control. It is "Can all clients in the cluster/application read
> > > > the new ledger metadata version?". This is why I'd like to keep it as
> > > > a centralized flag rather than allowing each client to decide for
> > > > themselves.
> > > >
> > >
> > > ok, let's keep it
> > >
> > > Enrico
> > >
> > >
> > >
> > > >
> > > > > Don't we already have some tool? (Maybe I only have seen such tool
> in
> > > my
> > > > > company applications)
> > > >
> > > > It may already exist, I haven't checked.
> > > >
> > > > -Ivan
> > > >
> > >
> >
> >
> >
> > --
> > Jvrao
> > ---
> > First they ignore you, then they laugh at you, then they fight you, then
> > you win. - Mahatma Gandhi
> >
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Changing ledger metadata to binary format

Posted by Enrico Olivelli <eo...@gmail.com>.
Il giorno lun 30 lug 2018 alle ore 16:55 Venkateswara Rao Jujjuri <
jujjuri@gmail.com> ha scritto:

> We have introduced customMetadata in the past and that was not mandatory.
> How did we make that work with text format?
> Also, my preference is to wait until we have a need to introduce a new
> field for this conversion if needed.
>

JV,
So you are saying that you prefer to defer this change to the first case,
when it will be really needed ?

As far as I understand we are working on this change in order to be ready
with a full "rollback" story, so we have to be one version ahead.

Enrico




>
> JV
>
> On Mon, Jul 30, 2018 at 12:42 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Il giorno lun 30 lug 2018 alle ore 09:17 Ivan Kelly <iv...@apache.org>
> ha
> > scritto:
> >
> > > >> Thank you for putting this together. It is also good to put this as
> a
> > > BP,
> > > >> since it is about the metadata layout.
> > >
> > > I'll put a BP up this week once I have initial feedback.
> > >
> > > >> > - When writing a metadata, check what is in /ledgers/LAYOUT. If it
> > is
> > > >> > as above, write using the current text protobuf. If is bumped, use
> > the
> > > >> > new binary format.
> > > >
> > > > isn't it too costly? Adding a zk read for each write. We could add a
> > > watch
> > > > but is has an important cost
> > >
> > > This cost can be amortized. In simplest case, we only read when client
> > > starts (as already happens). Client are able both read and write the
> > > old format, so clients that read the pre-upgrade version will still
> > > write in text format, and this is fine.
> > >
> >
> > For me it is fine to read it when client starts. A rolling restart of the
> > client application will be enough to load the new configuration.
> >
> >
> > >
> > > However, we would want them to eventually see the new format without a
> > > restart. At one extreme, we could put a watch on the znode, but this
> > > could be quite heavy (I think the only per-client watch we have now is
> > > on /ledgers/available). Alternatively, we could just read
> > > /ledgers/LAYOUT once per day. This way clients who saw the old version
> > > will eventually get updated.
> > >
> >
> > I prefer NOT to use the watch, this cost will be payed forever even when
> > there will be no configuration changes.
> >
> > I would go with the 'load at start' option, it is clear and works fine
> >
> >
> >
> > >
> > > > What about having a client side config writeMetadataVersion ?
> > > > We start a new metadata version, the new one  will be encoded as
> > binary.
> > > > By default 4.8 clients will use previous version, as we already do
> for
> > > > journal and fileinfo on bookies.
> > > >
> > > > Each ledger is independent from the others, there is no need of a
> > global
> > > > flag written on zk.
> > >
> > > There is a global property here though,which only the administrator
> > > can control. It is "Can all clients in the cluster/application read
> > > the new ledger metadata version?". This is why I'd like to keep it as
> > > a centralized flag rather than allowing each client to decide for
> > > themselves.
> > >
> >
> > ok, let's keep it
> >
> > Enrico
> >
> >
> >
> > >
> > > > Don't we already have some tool? (Maybe I only have seen such tool in
> > my
> > > > company applications)
> > >
> > > It may already exist, I haven't checked.
> > >
> > > -Ivan
> > >
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>

Re: Changing ledger metadata to binary format

Posted by Ivan Kelly <iv...@apache.org>.
On Thu, May 30, 2019 at 12:15 AM Venkateswara Rao Jujjuri
<ju...@gmail.com> wrote:
>
> > "Let's decide this when we need it"
>
> Right. This is the time as we are trying to add fault domain info to the
> cookie.

When you say cookie, you mean under /ledgers/cookies? That falls
outside the scope of binary metadata.

> > to make a flag in the metadata store that we flip to enable it.
> Right I do recall the same. But given your proposal in this thread, all
> readers try to read in binary format,
> if that fails they fail back to text mode.

Actually, it doesn't even need this fallback, as the version of the
metadata is stored at the very start, so we switch on that.

> If this is the case, does the following make sense?
> 1. Bookies are updated first with new version. Replication workers
> recognize binary/text format and any metadata updates happen in the same
> format.
> 2. Clients are rolled out later with a caveat that the ledgers created by
> new clients can't be read by old clients. This may be ok as long as they
> don't share
>     the same ledgers.
> With these two things, do we need an explicit flip to enable?

This works, but you need to expose a mechanism in the client to tell
it to write in V3.

As long as old clients don't try and read, there should be no problem.

> I can read the patches again, but let me ask :) did we take care of bookie
> shell also to understand binary format?

Yes, bookie shell can understand it since it reads though the same interfaces.

-Ivan

Re: Changing ledger metadata to binary format

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
> "Let's decide this when we need it"

Right. This is the time as we are trying to add fault domain info to the
cookie.

> to make a flag in the metadata store that we flip to enable it.
Right I do recall the same. But given your proposal in this thread, all
readers try to read in binary format,
if that fails they fail back to text mode.
If this is the case, does the following make sense?
1. Bookies are updated first with new version. Replication workers
recognize binary/text format and any metadata updates happen in the same
format.
2. Clients are rolled out later with a caveat that the ledgers created by
new clients can't be read by old clients. This may be ok as long as they
don't share
    the same ledgers.
With these two things, do we need an explicit flip to enable?

> The important thing at the time was to make it so clients can _read_
binary metadata

I can read the patches again, but let me ask :) did we take care of bookie
shell also to understand binary format?

On Wed, May 29, 2019 at 3:15 AM Ivan Kelly <iv...@apache.org> wrote:

> > What is our plan to move forward with binary format?
>
> I've not plans regarding it. Moving forward with it will happen when
> someone comes with a metadata change which will break text metadata
> users (i.e. almost any metadata change).
>
> > Anyone using binary format in production? even for new clusters?
>
> We are not.
>
> > What is blocking us from using binary format for the new clusters?
>
> We need a tool to enable it. I can't remember the exact details, but
> there was a conversation about whether to make each users specify the
> metadata version in their configuration, or to make a flag in the
> metadata store that we flip to enable it. I don't think we settled on
> anything, and it turned into a "Let's decide this when we need it"
> type of thing. The important thing at the time was to make it so
> clients can _read_ binary metadata, so that these clients can then
> read future updates.
>
> -Ivan
>


-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Changing ledger metadata to binary format

Posted by Ivan Kelly <iv...@apache.org>.
> What is our plan to move forward with binary format?

I've not plans regarding it. Moving forward with it will happen when
someone comes with a metadata change which will break text metadata
users (i.e. almost any metadata change).

> Anyone using binary format in production? even for new clusters?

We are not.

> What is blocking us from using binary format for the new clusters?

We need a tool to enable it. I can't remember the exact details, but
there was a conversation about whether to make each users specify the
metadata version in their configuration, or to make a flag in the
metadata store that we flip to enable it. I don't think we settled on
anything, and it turned into a "Let's decide this when we need it"
type of thing. The important thing at the time was to make it so
clients can _read_ binary metadata, so that these clients can then
read future updates.

-Ivan

Re: Changing ledger metadata to binary format

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
Hi all, bringing back this conversation back to life..

https://github.com/apache/bookkeeper/issues/723

What is our plan to move forward with binary format? Anyone using binary
format in production? even for new clusters?
What is blocking us from using binary format for the new clusters?

On Mon, Jul 30, 2018 at 11:02 AM Sijie Guo <gu...@gmail.com> wrote:

> JV,
>
> 1) for the real world use case, this is the problem I have hit which I have
> to manually add a flag in the ledger metadata to avoid serializing ctime.
>
> https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java#L105
>
> 2)  > We have introduced customMetadata in the past and that was not
> mandatory.
> How did we make that work with text format?
>
> by default if you don't use customMetadata, customMetadata is not
> serialized into metadata. so there is no problem for rolling back if you
> don't
> use this feature. so that's why it wasn't a problem for most of the people.
>
> the ctime field was the problem.
>
> 3)
> > At this point, I am not sure if we have identified all the pieces and if
> they fit together.
> > Enrico, are you driving the roll-back story? or whoever, can you please
> come up with a comprehensive proposal?
>
> we have a good-enough story for upgrading/rolling-back on wire-protocol and
> disk formats. however we don't have a good story on metadata part.
> that's why I sort of pulling in the metadata piece only. however if filling
> in a big picture would help discussing this problem, I can drive the whole
> picture
> for this roll-back story.
>
> - Sijie
>
> On Mon, Jul 30, 2018 at 7:55 AM Venkateswara Rao Jujjuri <
> jujjuri@gmail.com>
> wrote:
>
> > We have introduced customMetadata in the past and that was not mandatory.
> > How did we make that work with text format?
> >
>
>
>
> > Also, my preference is to wait until we have a need to introduce a new
> > field for this conversion if needed.
> >
> > JV
> >
> > On Mon, Jul 30, 2018 at 12:42 AM, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > Il giorno lun 30 lug 2018 alle ore 09:17 Ivan Kelly <iv...@apache.org>
> > ha
> > > scritto:
> > >
> > > > >> Thank you for putting this together. It is also good to put this
> as
> > a
> > > > BP,
> > > > >> since it is about the metadata layout.
> > > >
> > > > I'll put a BP up this week once I have initial feedback.
> > > >
> > > > >> > - When writing a metadata, check what is in /ledgers/LAYOUT. If
> it
> > > is
> > > > >> > as above, write using the current text protobuf. If is bumped,
> use
> > > the
> > > > >> > new binary format.
> > > > >
> > > > > isn't it too costly? Adding a zk read for each write. We could add
> a
> > > > watch
> > > > > but is has an important cost
> > > >
> > > > This cost can be amortized. In simplest case, we only read when
> client
> > > > starts (as already happens). Client are able both read and write the
> > > > old format, so clients that read the pre-upgrade version will still
> > > > write in text format, and this is fine.
> > > >
> > >
> > > For me it is fine to read it when client starts. A rolling restart of
> the
> > > client application will be enough to load the new configuration.
> > >
> > >
> > > >
> > > > However, we would want them to eventually see the new format without
> a
> > > > restart. At one extreme, we could put a watch on the znode, but this
> > > > could be quite heavy (I think the only per-client watch we have now
> is
> > > > on /ledgers/available). Alternatively, we could just read
> > > > /ledgers/LAYOUT once per day. This way clients who saw the old
> version
> > > > will eventually get updated.
> > > >
> > >
> > > I prefer NOT to use the watch, this cost will be payed forever even
> when
> > > there will be no configuration changes.
> > >
> > > I would go with the 'load at start' option, it is clear and works fine
> > >
> > >
> > >
> > > >
> > > > > What about having a client side config writeMetadataVersion ?
> > > > > We start a new metadata version, the new one  will be encoded as
> > > binary.
> > > > > By default 4.8 clients will use previous version, as we already do
> > for
> > > > > journal and fileinfo on bookies.
> > > > >
> > > > > Each ledger is independent from the others, there is no need of a
> > > global
> > > > > flag written on zk.
> > > >
> > > > There is a global property here though,which only the administrator
> > > > can control. It is "Can all clients in the cluster/application read
> > > > the new ledger metadata version?". This is why I'd like to keep it as
> > > > a centralized flag rather than allowing each client to decide for
> > > > themselves.
> > > >
> > >
> > > ok, let's keep it
> > >
> > > Enrico
> > >
> > >
> > >
> > > >
> > > > > Don't we already have some tool? (Maybe I only have seen such tool
> in
> > > my
> > > > > company applications)
> > > >
> > > > It may already exist, I haven't checked.
> > > >
> > > > -Ivan
> > > >
> > >
> >
> >
> >
> > --
> > Jvrao
> > ---
> > First they ignore you, then they laugh at you, then they fight you, then
> > you win. - Mahatma Gandhi
> >
>


-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Changing ledger metadata to binary format

Posted by Sijie Guo <gu...@gmail.com>.
JV,

1) for the real world use case, this is the problem I have hit which I have
to manually add a flag in the ledger metadata to avoid serializing ctime.
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java#L105

2)  > We have introduced customMetadata in the past and that was not
mandatory.
How did we make that work with text format?

by default if you don't use customMetadata, customMetadata is not
serialized into metadata. so there is no problem for rolling back if you
don't
use this feature. so that's why it wasn't a problem for most of the people.

the ctime field was the problem.

3)
> At this point, I am not sure if we have identified all the pieces and if
they fit together.
> Enrico, are you driving the roll-back story? or whoever, can you please
come up with a comprehensive proposal?

we have a good-enough story for upgrading/rolling-back on wire-protocol and
disk formats. however we don't have a good story on metadata part.
that's why I sort of pulling in the metadata piece only. however if filling
in a big picture would help discussing this problem, I can drive the whole
picture
for this roll-back story.

- Sijie

On Mon, Jul 30, 2018 at 7:55 AM Venkateswara Rao Jujjuri <ju...@gmail.com>
wrote:

> We have introduced customMetadata in the past and that was not mandatory.
> How did we make that work with text format?
>



> Also, my preference is to wait until we have a need to introduce a new
> field for this conversion if needed.
>
> JV
>
> On Mon, Jul 30, 2018 at 12:42 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Il giorno lun 30 lug 2018 alle ore 09:17 Ivan Kelly <iv...@apache.org>
> ha
> > scritto:
> >
> > > >> Thank you for putting this together. It is also good to put this as
> a
> > > BP,
> > > >> since it is about the metadata layout.
> > >
> > > I'll put a BP up this week once I have initial feedback.
> > >
> > > >> > - When writing a metadata, check what is in /ledgers/LAYOUT. If it
> > is
> > > >> > as above, write using the current text protobuf. If is bumped, use
> > the
> > > >> > new binary format.
> > > >
> > > > isn't it too costly? Adding a zk read for each write. We could add a
> > > watch
> > > > but is has an important cost
> > >
> > > This cost can be amortized. In simplest case, we only read when client
> > > starts (as already happens). Client are able both read and write the
> > > old format, so clients that read the pre-upgrade version will still
> > > write in text format, and this is fine.
> > >
> >
> > For me it is fine to read it when client starts. A rolling restart of the
> > client application will be enough to load the new configuration.
> >
> >
> > >
> > > However, we would want them to eventually see the new format without a
> > > restart. At one extreme, we could put a watch on the znode, but this
> > > could be quite heavy (I think the only per-client watch we have now is
> > > on /ledgers/available). Alternatively, we could just read
> > > /ledgers/LAYOUT once per day. This way clients who saw the old version
> > > will eventually get updated.
> > >
> >
> > I prefer NOT to use the watch, this cost will be payed forever even when
> > there will be no configuration changes.
> >
> > I would go with the 'load at start' option, it is clear and works fine
> >
> >
> >
> > >
> > > > What about having a client side config writeMetadataVersion ?
> > > > We start a new metadata version, the new one  will be encoded as
> > binary.
> > > > By default 4.8 clients will use previous version, as we already do
> for
> > > > journal and fileinfo on bookies.
> > > >
> > > > Each ledger is independent from the others, there is no need of a
> > global
> > > > flag written on zk.
> > >
> > > There is a global property here though,which only the administrator
> > > can control. It is "Can all clients in the cluster/application read
> > > the new ledger metadata version?". This is why I'd like to keep it as
> > > a centralized flag rather than allowing each client to decide for
> > > themselves.
> > >
> >
> > ok, let's keep it
> >
> > Enrico
> >
> >
> >
> > >
> > > > Don't we already have some tool? (Maybe I only have seen such tool in
> > my
> > > > company applications)
> > >
> > > It may already exist, I haven't checked.
> > >
> > > -Ivan
> > >
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>

Re: Changing ledger metadata to binary format

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
We have introduced customMetadata in the past and that was not mandatory.
How did we make that work with text format?
Also, my preference is to wait until we have a need to introduce a new
field for this conversion if needed.

JV

On Mon, Jul 30, 2018 at 12:42 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> Il giorno lun 30 lug 2018 alle ore 09:17 Ivan Kelly <iv...@apache.org> ha
> scritto:
>
> > >> Thank you for putting this together. It is also good to put this as a
> > BP,
> > >> since it is about the metadata layout.
> >
> > I'll put a BP up this week once I have initial feedback.
> >
> > >> > - When writing a metadata, check what is in /ledgers/LAYOUT. If it
> is
> > >> > as above, write using the current text protobuf. If is bumped, use
> the
> > >> > new binary format.
> > >
> > > isn't it too costly? Adding a zk read for each write. We could add a
> > watch
> > > but is has an important cost
> >
> > This cost can be amortized. In simplest case, we only read when client
> > starts (as already happens). Client are able both read and write the
> > old format, so clients that read the pre-upgrade version will still
> > write in text format, and this is fine.
> >
>
> For me it is fine to read it when client starts. A rolling restart of the
> client application will be enough to load the new configuration.
>
>
> >
> > However, we would want them to eventually see the new format without a
> > restart. At one extreme, we could put a watch on the znode, but this
> > could be quite heavy (I think the only per-client watch we have now is
> > on /ledgers/available). Alternatively, we could just read
> > /ledgers/LAYOUT once per day. This way clients who saw the old version
> > will eventually get updated.
> >
>
> I prefer NOT to use the watch, this cost will be payed forever even when
> there will be no configuration changes.
>
> I would go with the 'load at start' option, it is clear and works fine
>
>
>
> >
> > > What about having a client side config writeMetadataVersion ?
> > > We start a new metadata version, the new one  will be encoded as
> binary.
> > > By default 4.8 clients will use previous version, as we already do for
> > > journal and fileinfo on bookies.
> > >
> > > Each ledger is independent from the others, there is no need of a
> global
> > > flag written on zk.
> >
> > There is a global property here though,which only the administrator
> > can control. It is "Can all clients in the cluster/application read
> > the new ledger metadata version?". This is why I'd like to keep it as
> > a centralized flag rather than allowing each client to decide for
> > themselves.
> >
>
> ok, let's keep it
>
> Enrico
>
>
>
> >
> > > Don't we already have some tool? (Maybe I only have seen such tool in
> my
> > > company applications)
> >
> > It may already exist, I haven't checked.
> >
> > -Ivan
> >
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Changing ledger metadata to binary format

Posted by Enrico Olivelli <eo...@gmail.com>.
Il giorno lun 30 lug 2018 alle ore 09:17 Ivan Kelly <iv...@apache.org> ha
scritto:

> >> Thank you for putting this together. It is also good to put this as a
> BP,
> >> since it is about the metadata layout.
>
> I'll put a BP up this week once I have initial feedback.
>
> >> > - When writing a metadata, check what is in /ledgers/LAYOUT. If it is
> >> > as above, write using the current text protobuf. If is bumped, use the
> >> > new binary format.
> >
> > isn't it too costly? Adding a zk read for each write. We could add a
> watch
> > but is has an important cost
>
> This cost can be amortized. In simplest case, we only read when client
> starts (as already happens). Client are able both read and write the
> old format, so clients that read the pre-upgrade version will still
> write in text format, and this is fine.
>

For me it is fine to read it when client starts. A rolling restart of the
client application will be enough to load the new configuration.


>
> However, we would want them to eventually see the new format without a
> restart. At one extreme, we could put a watch on the znode, but this
> could be quite heavy (I think the only per-client watch we have now is
> on /ledgers/available). Alternatively, we could just read
> /ledgers/LAYOUT once per day. This way clients who saw the old version
> will eventually get updated.
>

I prefer NOT to use the watch, this cost will be payed forever even when
there will be no configuration changes.

I would go with the 'load at start' option, it is clear and works fine



>
> > What about having a client side config writeMetadataVersion ?
> > We start a new metadata version, the new one  will be encoded as binary.
> > By default 4.8 clients will use previous version, as we already do for
> > journal and fileinfo on bookies.
> >
> > Each ledger is independent from the others, there is no need of a global
> > flag written on zk.
>
> There is a global property here though,which only the administrator
> can control. It is "Can all clients in the cluster/application read
> the new ledger metadata version?". This is why I'd like to keep it as
> a centralized flag rather than allowing each client to decide for
> themselves.
>

ok, let's keep it

Enrico



>
> > Don't we already have some tool? (Maybe I only have seen such tool in my
> > company applications)
>
> It may already exist, I haven't checked.
>
> -Ivan
>

Re: Changing ledger metadata to binary format

Posted by Ivan Kelly <iv...@apache.org>.
>> Thank you for putting this together. It is also good to put this as a BP,
>> since it is about the metadata layout.

I'll put a BP up this week once I have initial feedback.

>> > - When writing a metadata, check what is in /ledgers/LAYOUT. If it is
>> > as above, write using the current text protobuf. If is bumped, use the
>> > new binary format.
>
> isn't it too costly? Adding a zk read for each write. We could add a watch
> but is has an important cost

This cost can be amortized. In simplest case, we only read when client
starts (as already happens). Client are able both read and write the
old format, so clients that read the pre-upgrade version will still
write in text format, and this is fine.

However, we would want them to eventually see the new format without a
restart. At one extreme, we could put a watch on the znode, but this
could be quite heavy (I think the only per-client watch we have now is
on /ledgers/available). Alternatively, we could just read
/ledgers/LAYOUT once per day. This way clients who saw the old version
will eventually get updated.

> What about having a client side config writeMetadataVersion ?
> We start a new metadata version, the new one  will be encoded as binary.
> By default 4.8 clients will use previous version, as we already do for
> journal and fileinfo on bookies.
>
> Each ledger is independent from the others, there is no need of a global
> flag written on zk.

There is a global property here though,which only the administrator
can control. It is "Can all clients in the cluster/application read
the new ledger metadata version?". This is why I'd like to keep it as
a centralized flag rather than allowing each client to decide for
themselves.

> Don't we already have some tool? (Maybe I only have seen such tool in my
> company applications)

It may already exist, I haven't checked.

-Ivan

Re: Changing ledger metadata to binary format

Posted by Enrico Olivelli <eo...@gmail.com>.
Il ven 27 lug 2018, 20:30 Sijie Guo <gu...@gmail.com> ha scritto:

> Ivan,
>
> Thank you for putting this together. It is also good to put this as a BP,
> since it is about the metadata layout.
>
> On Fri, Jul 27, 2018 at 10:54 AM Ivan Kelly <iv...@apache.org> wrote:
>
> > Hi folks,
> >
> > I think this was discussed yesterday in the meeting, and a bit on
> > slack, but I haven't seen anything much written down, so I'm starting
> > a thread here.
> >
> > The crux of the problem is that the protobuf text format currently
> > used for metadata cannot have new fields added without breaking client
> > compatability, as the text parser can't be configured to ignore
> > unrecognised values (and google aren't going to fix this).
> >
> > Protobuf binary format does support new fields though. So if a field
> > is added, a client that knows nothing of it can read it back without
> > issue.
> >
> > I propose we approach this the following way:
> > - We already have a version in /ledgers/LAYOUT. In a current cluster,
> > this contains
> > ```
> > 2
> > org.apache.bookkeeper.meta.HierarchicalLedgerManagerFactory:1
> > ```
> > - We define a new LedgerMetadata protobuf. This is a chance to clean
> > up mistakes we've made previously.
> > - When writing a metadata, check what is in /ledgers/LAYOUT. If it is
> > as above, write using the current text protobuf. If is bumped, use the
> > new binary format.
>

isn't it too costly? Adding a zk read for each write. We could add a watch
but is has an important cost

What about having a client side config writeMetadataVersion ?
We start a new metadata version, the new one  will be encoded as binary.
By default 4.8 clients will use previous version, as we already do for
journal and fileinfo on bookies.

Each ledger is independent from the others, there is no need of a global
flag written on zk.

> - When reading metadata, first try to parse binary, and fall back to
> > text if that fails. (we could also add a layout check to short
> > circuit)
>

This sounds good. No config needed.

>
> > When upgrading a cluster, the layout will be as above, it will
> > continue to only use text format until there is some admin
> > intervention. When the admin is satisfied that all clients are on a
> > new enough version, they call a script which bumps the version. From
> > this point clients will write the binary version.
> >
> > New clusters go straight to binary. We will also need a script to dump
> > the metadata from a znode.


Don't we already have some tool? (Maybe I only have seen such tool in my
company applications)

One wrinkle, which is another reason to
> > create a new LedgerMetadata protobuf, is that when you are writing in
> > text format, there's no way to filter the fields. So even if we write
> > in text format, if we add new fields we are breaking old clients.
> > Another approach would be to ensure the protobuf only contains the
> > fields that are available now, but this seems messy to me.
> >
> > Anyhow, this is to be a jumping off point for discussion. Comments
> welcome,
> >
> > Cheers,
> > Ivan
> >
>
-- 


-- Enrico Olivelli

Re: Changing ledger metadata to binary format

Posted by Sijie Guo <gu...@gmail.com>.
Ivan,

Thank you for putting this together. It is also good to put this as a BP,
since it is about the metadata layout.

On Fri, Jul 27, 2018 at 10:54 AM Ivan Kelly <iv...@apache.org> wrote:

> Hi folks,
>
> I think this was discussed yesterday in the meeting, and a bit on
> slack, but I haven't seen anything much written down, so I'm starting
> a thread here.
>
> The crux of the problem is that the protobuf text format currently
> used for metadata cannot have new fields added without breaking client
> compatability, as the text parser can't be configured to ignore
> unrecognised values (and google aren't going to fix this).
>
> Protobuf binary format does support new fields though. So if a field
> is added, a client that knows nothing of it can read it back without
> issue.
>
> I propose we approach this the following way:
> - We already have a version in /ledgers/LAYOUT. In a current cluster,
> this contains
> ```
> 2
> org.apache.bookkeeper.meta.HierarchicalLedgerManagerFactory:1
> ```
> - We define a new LedgerMetadata protobuf. This is a chance to clean
> up mistakes we've made previously.
> - When writing a metadata, check what is in /ledgers/LAYOUT. If it is
> as above, write using the current text protobuf. If is bumped, use the
> new binary format.
> - When reading metadata, first try to parse binary, and fall back to
> text if that fails. (we could also add a layout check to short
> circuit)
>
> When upgrading a cluster, the layout will be as above, it will
> continue to only use text format until there is some admin
> intervention. When the admin is satisfied that all clients are on a
> new enough version, they call a script which bumps the version. From
> this point clients will write the binary version.
>
> New clusters go straight to binary. We will also need a script to dump
> the metadata from a znode. One wrinkle, which is another reason to
> create a new LedgerMetadata protobuf, is that when you are writing in
> text format, there's no way to filter the fields. So even if we write
> in text format, if we add new fields we are breaking old clients.
> Another approach would be to ensure the protobuf only contains the
> fields that are available now, but this seems messy to me.
>
> Anyhow, this is to be a jumping off point for discussion. Comments welcome,
>
> Cheers,
> Ivan
>