Posted to user@hbase.apache.org by "张铎 (Duo Zhang)" <pa...@gmail.com> on 2021/08/14 02:58:45 UTC

[DISCUSS] About using masters as ConnectionRegistry endpoint

In HBASE-18095, the community provided a new way to get the registry
information of a cluster, without touching ZooKeeper. The decision at
that time was to use masters (including active and backup masters) as
the connection registry endpoint.

Later, when discussing how to implement splittable meta, we planned to
make use of this framework to hide the actual ROOT table
implementation. But then we found out that using masters as the
connection registry violates one of our principles: we do not want to
inline masters, especially the active master, in the normal
read/write path.

The sub-tasks of HBASE-26149 aim to solve this problem. We all agree
that by default we should not inline masters, but there is some
disagreement on whether to still allow end users to configure masters
as registry endpoints, as this is a feature which has already been
published in our releases.
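
As a rough illustration of what such an opt-in looks like on the client
side, here is a minimal sketch. It uses plain java.util.Properties rather
than the real HBase Configuration class so that it runs on its own. The
key names "hbase.client.registry.impl" and "hbase.masters" and the class
name are my understanding of the 2.3+ client and should be treated as
assumptions; the host names are made up.

```java
import java.util.Properties;

public class RegistryConfigSketch {
    // Key names as understood from the 2.3+ client; assumptions, not verified here.
    static final String REGISTRY_IMPL_KEY = "hbase.client.registry.impl";
    static final String MASTER_ADDRS_KEY = "hbase.masters";

    /** Builds client properties that opt in to the master-based registry. */
    static Properties masterRegistryConfig(String masterAddrs) {
        Properties conf = new Properties();
        conf.setProperty(REGISTRY_IMPL_KEY,
            "org.apache.hadoop.hbase.client.MasterRegistry");
        // Comma-separated host:port list of active and backup masters.
        conf.setProperty(MASTER_ADDRS_KEY, masterAddrs);
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical host names, for illustration only.
        Properties conf = masterRegistryConfig("master1:16000,master2:16000");
        conf.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The question in this thread is whether a client should still be allowed
to set something like this once masters are no longer returned as
registry endpoints by default.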

There are some discussions in the PR for HBASE-26172
https://github.com/apache/hbase/pull/3566#discussion_r684494130

Feel free to post your opinion here.

Thanks.

Re: [DISCUSS] About using masters as ConnectionRegistry endpoint

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
OK, it seems there are no big concerns here.

Then I will go with the cleaner way: deprecate MasterRegistry and do
not return masters as registry endpoints.

Thanks.

Bharath Vissapragada <bh...@apache.org> wrote on Mon, Aug 16, 2021 at 8:20 AM:
>
> Thanks, Duo. I commented on the PR but want to respond here too to kick
> start the discussion and in case anyone else has different viewpoints.
>
> I agree that the original decision of inlining active masters needs to be
> corrected going forward. I vote for the proposal to deprecate the master
> based registry in 2.5.0 in favor of a "RegionServer" based registry and
> remove it completely in 4.0.0.  IMO we should *not *expose any opt-in
> configuration to allow masters as that violates the design principle that
> we all agreed upon and instead only use region servers as the registry
> hosting services.
>
>
> On Fri, Aug 13, 2021 at 7:59 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
>
> > In HBASE-18095, the community provided a new way to get the registry
> > information of a cluster, without touching ZooKeeper. The decision at
> > that time was to use masters(including active and backup masters) as
> > the connection registry endpoint.
> >
> > Later, when discussing how to implement splittable meta, we planned to
> > make use of this framework to hide the actual ROOT table
> > implementation. But then we found out that the approach of using
> > masters as connection registry, violates one of our tendencies that we
> > do not want to inline masters, especially the active master in the
> > normal read/write path.
> >
> > The several sub tasks of HBASE-26149 aims to solve this problem. We
> > all agree that by default, we should not inline masters, but there are
> > some conflicts on whether to still allow end users to configure that
> > they want to use masters as registry endpoints, as it is a feature
> > which has already been published in our releases.
> >
> > There are some discussions in the PR for HBASE-26172
> > https://github.com/apache/hbase/pull/3566#discussion_r684494130
> >
> > Feel free to post your opinion here.
> >
> > Thanks.
> >

Re: [DISCUSS] About using masters as ConnectionRegistry endpoint

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
We already have a connection to ZooKeeper on the region server side, and
we already watch a lot of things, so watching one more znode does
not add too much pressure to ZK.
I've also been implementing a framework to replicate the meta location
without ZooKeeper, which could reduce the pressure on ZooKeeper a lot.

Data freshness is a problem. But in the old master registry
implementation, if you go to backup masters, you still can't always
get the newest data. This is something we should try to do better,
for example by using the above framework to replicate modifications
quickly, without involving the active master in the normal
read/write path.

Nick Dimiduk <nd...@apache.org> wrote on Thu, Aug 19, 2021 at 2:38 AM:
>
> I recall that there was a concern about data freshness, that the masters
> were all tracking the necessary state and all of them would always be
> up-to-date. Region servers were not already in this business, and doing so
> would put undue burden on ZooKeeper.
>
> Perhaps I miss-remember, or the discussion evolved since I last checked?
>
> On Sun, Aug 15, 2021 at 5:20 PM Bharath Vissapragada <bh...@apache.org>
> wrote:
>
> > Thanks, Duo. I commented on the PR but want to respond here too to kick
> > start the discussion and in case anyone else has different viewpoints.
> >
> > I agree that the original decision of inlining active masters needs to be
> > corrected going forward. I vote for the proposal to deprecate the master
> > based registry in 2.5.0 in favor of a "RegionServer" based registry and
> > remove it completely in 4.0.0.  IMO we should *not *expose any opt-in
> > configuration to allow masters as that violates the design principle that
> > we all agreed upon and instead only use region servers as the registry
> > hosting services.
> >
> >
> > On Fri, Aug 13, 2021 at 7:59 PM 张铎(Duo Zhang) <pa...@gmail.com>
> > wrote:
> >
> > > In HBASE-18095, the community provided a new way to get the registry
> > > information of a cluster, without touching ZooKeeper. The decision at
> > > that time was to use masters(including active and backup masters) as
> > > the connection registry endpoint.
> > >
> > > Later, when discussing how to implement splittable meta, we planned to
> > > make use of this framework to hide the actual ROOT table
> > > implementation. But then we found out that the approach of using
> > > masters as connection registry, violates one of our tendencies that we
> > > do not want to inline masters, especially the active master in the
> > > normal read/write path.
> > >
> > > The several sub tasks of HBASE-26149 aims to solve this problem. We
> > > all agree that by default, we should not inline masters, but there are
> > > some conflicts on whether to still allow end users to configure that
> > > they want to use masters as registry endpoints, as it is a feature
> > > which has already been published in our releases.
> > >
> > > There are some discussions in the PR for HBASE-26172
> > > https://github.com/apache/hbase/pull/3566#discussion_r684494130
> > >
> > > Feel free to post your opinion here.
> > >
> > > Thanks.
> > >
> >

Re: [DISCUSS] About using masters as ConnectionRegistry endpoint

Posted by Bharath Vissapragada <bh...@apache.org>.
Are you referring to the meta region location(s) cache? If so, yes, we will
have more network chatter accessing those znode(s) whenever meta (replicas)
move around (what happens in split-meta is still under discussion, as you
are probably already following, but it is likely that we will still use ZK
as the notification layer, so your question still holds). This extra load is
offset by the reduction in client-ZK access load, as clients don't hit ZK
anymore, so theoretically it is not any worse than what we have today. On
the plus side, as I mentioned, we remove masters from the picture.

One simple improvement is to restrict the number of region servers that
cache this information (HBASE-26173
<https://issues.apache.org/jira/browse/HBASE-26173>). We don't need every
RS to do that; we can maintain a minimum number (or a percentage) of
region servers that choose to do so.
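
To make the "minimum # (or a %)" idea concrete, here is a minimal sketch
of how the number of cache-hosting region servers could be derived. The
class, method, and parameter names are hypothetical, and combining the
minimum and the percentage (taking the larger of the two, capped at the
number of live servers) is my assumption, not something HBASE-26173
specifies.

```java
public class RegistryHostCount {
    /**
     * Returns how many region servers should maintain the registry cache:
     * the larger of a fixed minimum and a percentage of the live servers,
     * capped at the number of live servers.
     */
    static int registryHostCount(int liveServers, int minHosts, int percent) {
        if (liveServers < 0 || minHosts < 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Integer ceiling of liveServers * percent / 100, computed in long
        // to avoid overflow.
        long byPercent = ((long) liveServers * percent + 99) / 100;
        long wanted = Math.max(minHosts, byPercent);
        return (int) Math.min(liveServers, wanted);
    }

    public static void main(String[] args) {
        System.out.println(registryHostCount(100, 3, 10)); // prints 10
        System.out.println(registryHostCount(5, 3, 10));   // prints 3
        System.out.println(registryHostCount(2, 3, 10));   // prints 2
    }
}
```

On a small cluster the minimum dominates, and on a large one the
percentage does, so the number of ZK watchers grows more slowly than the
cluster itself.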

On Wed, Aug 18, 2021 at 11:39 AM Nick Dimiduk <nd...@apache.org> wrote:

> I recall that there was a concern about data freshness, that the masters
> were all tracking the necessary state and all of them would always be
> up-to-date. Region servers were not already in this business, and doing so
> would put undue burden on ZooKeeper.
>
> Perhaps I miss-remember, or the discussion evolved since I last checked?
>
> On Sun, Aug 15, 2021 at 5:20 PM Bharath Vissapragada <bh...@apache.org>
> wrote:
>
> > Thanks, Duo. I commented on the PR but want to respond here too to kick
> > start the discussion and in case anyone else has different viewpoints.
> >
> > I agree that the original decision of inlining active masters needs to be
> > corrected going forward. I vote for the proposal to deprecate the master
> > based registry in 2.5.0 in favor of a "RegionServer" based registry and
> > remove it completely in 4.0.0.  IMO we should *not *expose any opt-in
> > configuration to allow masters as that violates the design principle that
> > we all agreed upon and instead only use region servers as the registry
> > hosting services.
> >
> >
> > On Fri, Aug 13, 2021 at 7:59 PM 张铎(Duo Zhang) <pa...@gmail.com>
> > wrote:
> >
> > > In HBASE-18095, the community provided a new way to get the registry
> > > information of a cluster, without touching ZooKeeper. The decision at
> > > that time was to use masters(including active and backup masters) as
> > > the connection registry endpoint.
> > >
> > > Later, when discussing how to implement splittable meta, we planned to
> > > make use of this framework to hide the actual ROOT table
> > > implementation. But then we found out that the approach of using
> > > masters as connection registry, violates one of our tendencies that we
> > > do not want to inline masters, especially the active master in the
> > > normal read/write path.
> > >
> > > The several sub tasks of HBASE-26149 aims to solve this problem. We
> > > all agree that by default, we should not inline masters, but there are
> > > some conflicts on whether to still allow end users to configure that
> > > they want to use masters as registry endpoints, as it is a feature
> > > which has already been published in our releases.
> > >
> > > There are some discussions in the PR for HBASE-26172
> > > https://github.com/apache/hbase/pull/3566#discussion_r684494130
> > >
> > > Feel free to post your opinion here.
> > >
> > > Thanks.
> > >
> >
>

Re: [DISCUSS] About using masters as ConnectionRegistry endpoint

Posted by Nick Dimiduk <nd...@apache.org>.
I recall that there was a concern about data freshness, that the masters
were all tracking the necessary state and all of them would always be
up-to-date. Region servers were not already in this business, and doing so
would put undue burden on ZooKeeper.

Perhaps I misremember, or the discussion has evolved since I last checked?

On Sun, Aug 15, 2021 at 5:20 PM Bharath Vissapragada <bh...@apache.org>
wrote:

> Thanks, Duo. I commented on the PR but want to respond here too to kick
> start the discussion and in case anyone else has different viewpoints.
>
> I agree that the original decision of inlining active masters needs to be
> corrected going forward. I vote for the proposal to deprecate the master
> based registry in 2.5.0 in favor of a "RegionServer" based registry and
> remove it completely in 4.0.0.  IMO we should *not *expose any opt-in
> configuration to allow masters as that violates the design principle that
> we all agreed upon and instead only use region servers as the registry
> hosting services.
>
>
> On Fri, Aug 13, 2021 at 7:59 PM 张铎(Duo Zhang) <pa...@gmail.com>
> wrote:
>
> > In HBASE-18095, the community provided a new way to get the registry
> > information of a cluster, without touching ZooKeeper. The decision at
> > that time was to use masters(including active and backup masters) as
> > the connection registry endpoint.
> >
> > Later, when discussing how to implement splittable meta, we planned to
> > make use of this framework to hide the actual ROOT table
> > implementation. But then we found out that the approach of using
> > masters as connection registry, violates one of our tendencies that we
> > do not want to inline masters, especially the active master in the
> > normal read/write path.
> >
> > The several sub tasks of HBASE-26149 aims to solve this problem. We
> > all agree that by default, we should not inline masters, but there are
> > some conflicts on whether to still allow end users to configure that
> > they want to use masters as registry endpoints, as it is a feature
> > which has already been published in our releases.
> >
> > There are some discussions in the PR for HBASE-26172
> > https://github.com/apache/hbase/pull/3566#discussion_r684494130
> >
> > Feel free to post your opinion here.
> >
> > Thanks.
> >
>

Re: [DISCUSS] About using masters as ConnectionRegistry endpoint

Posted by Bharath Vissapragada <bh...@apache.org>.
Thanks, Duo. I commented on the PR but want to respond here too, to
kick-start the discussion in case anyone else has a different viewpoint.

I agree that the original decision of inlining active masters needs to be
corrected going forward. I vote for the proposal to deprecate the
master-based registry in 2.5.0 in favor of a "RegionServer"-based registry
and remove it completely in 4.0.0. IMO we should *not* expose any opt-in
configuration to allow masters, as that violates the design principle that
we all agreed upon; instead, we should only use region servers as the
registry hosting services.


On Fri, Aug 13, 2021 at 7:59 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:

> In HBASE-18095, the community provided a new way to get the registry
> information of a cluster, without touching ZooKeeper. The decision at
> that time was to use masters(including active and backup masters) as
> the connection registry endpoint.
>
> Later, when discussing how to implement splittable meta, we planned to
> make use of this framework to hide the actual ROOT table
> implementation. But then we found out that the approach of using
> masters as connection registry, violates one of our tendencies that we
> do not want to inline masters, especially the active master in the
> normal read/write path.
>
> The several sub tasks of HBASE-26149 aims to solve this problem. We
> all agree that by default, we should not inline masters, but there are
> some conflicts on whether to still allow end users to configure that
> they want to use masters as registry endpoints, as it is a feature
> which has already been published in our releases.
>
> There are some discussions in the PR for HBASE-26172
> https://github.com/apache/hbase/pull/3566#discussion_r684494130
>
> Feel free to post your opinion here.
>
> Thanks.
>
