Posted to dev@hbase.apache.org by "张铎 (Duo Zhang)" <pa...@gmail.com> on 2021/04/26 03:44:59 UTC

[SURVEY] The current usage of favor node balancer across the community

As you all know, we always want to reduce the size of the hbase-server
module. This time we want to move the balancer-related code out into a
separate sub module.

The design doc:
https://docs.google.com/document/d/1T7WSgcQBJTtbJIjqi8sZYLxD2Z7JbIHx4TJaKKdkBbE/edit#

As you can see at the bottom of the design doc, the favored node balancer
is a problem, as it stores the favored node information in hbase:meta.
Stack mentioned that the feature is already dead, so maybe we could just
purge it from our code base.
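
To make the problem concrete, here is a minimal sketch of where that
information lives. It assumes the hbase:meta "info" family and "fn"
qualifier used by the 2.x favored node helper classes, so treat the
column names as an assumption rather than a spec:

    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ShowFavoredNodeColumns {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
          // The favored node balancer persists a per-region list of
          // favored datanodes in hbase:meta; assumed column: info:fn.
          Scan scan = new Scan()
              .addColumn(HConstants.CATALOG_FAMILY, Bytes.toBytes("fn"));
          try (ResultScanner rs = meta.getScanner(scan)) {
            for (Result r : rs) {
              // Each row is a region; a non-empty cell means a favored
              // node triplet has been pinned for it.
              System.out.println(Bytes.toStringBinary(r.getRow()));
            }
          }
        }
      }
    }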

So we would like to know whether there are still users in the community
who use the favored node balancer. Please share your experience and
whether you still want to keep using it.

Thanks.

Re: [E] Re: [SURVEY] The current usage of favor node balancer across the community

Posted by Thiruvel Thirumoolan <th...@verizonmedia.com.INVALID>.
We at VerizonMedia (Yahoo!) have been running FavoredNodes for about 4
years now in production and it has helped us a lot with our scale.

Since we started working on this many years ago, we have contributed
patches upstream, although not much recently. We will resume contributing
the remaining patches as part of our migration to 2.x. They will be part of
https://issues.apache.org/jira/browse/HBASE-15531. Most of the code lives
in the FavoredNode-based classes, which I think is helpful for maintenance
as well.

Hello Mallikarjun,
We are glad to see you use and benefit from it. As Stack mentioned, it
would be good to see a writeup of your experience and enhancements.

Thanks!
Thiruvel

On Tue, Apr 27, 2021 at 1:35 PM Stack <st...@duboce.net> wrote:

> On Mon, Apr 26, 2021 at 7:30 PM Mallikarjun <ma...@gmail.com>
> wrote:
>
> > Inline reply
> >
> > On Tue, Apr 27, 2021 at 1:03 AM Stack <st...@duboce.net> wrote:
> >
> > > On Mon, Apr 26, 2021 at 12:30 PM Stack <st...@duboce.net> wrote:
> > >
> > > > On Mon, Apr 26, 2021 at 8:10 AM Mallikarjun <
> mallik.v.arjun@gmail.com>
> > > > wrote:
> > > >
> > > >> We use FavoredStochasticBalancer, which by description says the same
> > > thing
> > > >> as FavoredNodeLoadBalancer. Ignoring that fact, problem appears to
> be
> > > >>
> > > >>
> > > >
> > > > Other concerns:
> > > >
> > > >  * Hard-coded triplet of nodes that will inevitably rot as machines
> > come
> > > > and go (Are there tools for remediation?)
> > >
> >
> > It doesn't really rot, if you think it with balancer responsible to
> > assigning regions
> >
> > 1. On every region assigned to a particular regionserver, the balancer
> > would have to reassign this triplet and hence there is no scope of rot
> > (Same logic applied to WAL as well). (On compaction hdfs blocks will be
> > pulled back if any spill over)
> >
> >
> I don't follow the above but no harm; I can wait for the write-up (smile).
>
>
>
> > 2. We used hostnames only (so, come and go is not going to be new nodes
> but
> > same hostnames)
> >
> >
> Ack.
>
>
> > Couple of outstanding problems though.
> >
> > 1. We couldn't increase replication factor to > 3. Which was fine so far
> > for our use cases. But we have had thoughts around fixing them.
> >
> >
> Not the end-of-the-world I'd say. Would be nice to have though.
>
>
>
> > 2. Balancer doesn't understand favored nodes construct, perfect balanced
> fn
> > among the rsgroup datanodes isn't possible, but with some variance like
> > 10-20% difference is expected
> >
> >
> Can be worked on.....
>
>
>
> >
> > > >  * A workaround for a facility that belongs in the NN
> > >
> >
> > Probably, you can argue both ways. Hbase is the owner of data
>
>
>
> Sort-of. NN hands out where replicas should be placed according to its
> configured policies. Then there is the HDFS balancer....
>
> ....
>
>
>
> > One more concern was that the feature was dead/unused. You seem to refute
> > > this notion of mine.
> > > S
> > >
> >
> > We have been using this for more than a year with hbase 2.1 in highly
> > critical workloads for our company. And several years with hbase 1.2 as
> > well with backporting rsgroup from master at that time. (2017-18 ish)
> >
> > And it has been very smooth operationally in hbase 2.1
> >
> >
> Sweet.
>
> Trying to get the other FN users to show up here on this thread to speak of
> their experience....
>
> Thanks for speaking up,
> S
>
>
> >
> > >
> > >
> > > >
> > > >
> > > >> Going a step back.
> > > >>
> > > >> Did we ever consider giving a thought towards truely multi-tenant
> > hbase?
> > > >>
> > > >
> > > > Always.
> > > >
> > > >
> > > >> Where each rsgroup has a group of datanodes and namespace tables
> data
> > > >> created under that particular rsgroup would sit on those datanodes
> > only?
> > > >> We
> > > >> have attempted to do that and have largely been very successful
> > running
> > > >> clusters of hundreds of terabytes with hundreds of
> > > >> regionservers(datanodes)
> > > >> per cluster.
> > > >>
> > > >>
> > > > So isolation of load by node? (I believe this is where the rsgroup
> > > feature
> > > > came from originally; the desire for a deploy like you describe
> above.
> > > > IIUC, its what Thiru and crew run).
> > > >
> > > >
> > > >
> > > >> 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer
> > > >> contributed by Thiruvel Thirumoolan -->
> > > >>
> https://issues.apache.org/jira/browse/HBASE-15533
> > > >>
> > > >> On each balance operation, while the region is moved around (or
> while
> > > >> creating table), favored nodes are assigned based on the rsgroup
> that
> > > >> region is pinned to. And hence data is pinned to those datanodes
> only
> > > >> (Pinning favored nodes is best effort from the hdfs side, but there
> > are
> > > >> only a few exception scenarios where data will be spilled over and
> > they
> > > >> recover after a major compaction).
> > > >>
> > > >>
> > > > Sounds like you have studied this deploy in operation. Write it up?
> > Blog
> > > > post on hbase.apache.org?
> > > >
> > >
> >
> > Definitely will write up.
> >
> >
> > > >
> > > >
> > > >> 2. We have introduced several balancer cost functions to restore
> > things
> > > to
> > > >> normalcy (multi tenancy with fn pinning) such as when a node is
> dead,
> > or
> > > >> when fn's are imbalanced within the same rsgroup, etc.
> > > >>
> > > >> 3. We had diverse workloads under the same cluster and WAL isolation
> > > >> became
> > > >> a requirement and we went ahead with similar philosophy mentioned in
> > > line
> > > >> 1. Where WAL's are created with FN pinning so that they are tied to
> > > >> datanodes belonging to the same rsgroup. Some discussion seems to
> have
> > > >> happened here -->
> https://issues.apache.org/jira/browse/HBASE-21641
> > > >>
> > > >> There are several other enhancements we have worked on with respect
> to
> > > >> rsgroup aware export snapshot, rsaware regionmover, rsaware cluster
> > > >> replication, etc.
> > > >>
> > > >> For above use cases, we would be needing fn information on
> hbase:meta.
> > > >>
> > > >> If the use case seems to be a fit for how we would want hbase to be
> > > taken
> > > >> forward as one of the supported use cases, willing to contribute our
> > > >> changes back to the community. (I was anyway planning to initiate
> this
> > > >> discussion)
> > > >>
> > > >
> > > > Contribs always welcome.
> > >
> >
> > Happy to see our thoughts are in line. We will prepare a plan on these
> > contributions.
> >
> >
> > > >
> > > > Thanks Malilkarjun,
> > > > S
> > > >
> > > >
> > > >
> > > >>
> > > >> To strengthen the above use case. Here is what one of our multi
> tenant
> > > >> cluster looks like
> > > >>
> > > >> RSGroups(Tenants): 21 (With tenant isolation)
> > > >> Regionservers: 275
> > > >> Regions Hosted: 6k
> > > >> Tables Hosted: 87
> > > >> Capacity: 250 TB (100TB used)
> > > >>
> > > >> ---
> > > >> Mallikarjun
> > > >>
> > > >>
> > > >> On Mon, Apr 26, 2021 at 9:15 AM 张铎(Duo Zhang) <
> palomino219@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > As you all know, we always want to reduce the size of the
> > hbase-server
> > > >> > module. This time we want to separate the balancer related code to
> > > >> another
> > > >> > sub module.
> > > >> >
> > > >> > The design doc:
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://docs.google.com/document/d/1T7WSgcQBJTtbJIjqi8sZYLxD2Z7JbIHx4TJaKKdkBbE/edit#
> > > >> >
> > > >> > You can see the bottom of the design doc, favor node balancer is a
> > > >> problem,
> > > >> > as it stores the favor node information in hbase:meta. Stack
> > mentioned
> > > >> that
> > > >> > the feature is already dead, maybe we could just purge it from our
> > > code
> > > >> > base.
> > > >> >
> > > >> > So here we want to know if there are still some users in the
> > community
> > > >> who
> > > >> > still use favor node balancer. Please share your experience and
> > > whether
> > > >> you
> > > >> > still want to use it.
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [SURVEY] The current usage of favor node balancer across the community

Posted by Stack <st...@duboce.net>.
On Mon, Apr 26, 2021 at 7:30 PM Mallikarjun <ma...@gmail.com>
wrote:

> Inline reply
>
> On Tue, Apr 27, 2021 at 1:03 AM Stack <st...@duboce.net> wrote:
>
> > On Mon, Apr 26, 2021 at 12:30 PM Stack <st...@duboce.net> wrote:
> >
> > > On Mon, Apr 26, 2021 at 8:10 AM Mallikarjun <ma...@gmail.com>
> > > wrote:
> > >
> > >> We use FavoredStochasticBalancer, which by description says the same
> > thing
> > >> as FavoredNodeLoadBalancer. Ignoring that fact, problem appears to be
> > >>
> > >>
> > >
> > > Other concerns:
> > >
> > >  * Hard-coded triplet of nodes that will inevitably rot as machines
> come
> > > and go (Are there tools for remediation?)
> >
>
> It doesn't really rot, if you think it with balancer responsible to
> assigning regions
>
> 1. On every region assigned to a particular regionserver, the balancer
> would have to reassign this triplet and hence there is no scope of rot
> (Same logic applied to WAL as well). (On compaction hdfs blocks will be
> pulled back if any spill over)
>
>
I don't follow the above but no harm; I can wait for the write-up (smile).



> 2. We used hostnames only (so, come and go is not going to be new nodes but
> same hostnames)
>
>
Ack.


> Couple of outstanding problems though.
>
> 1. We couldn't increase replication factor to > 3. Which was fine so far
> for our use cases. But we have had thoughts around fixing them.
>
>
Not the end of the world, I'd say. Would be nice to have, though.



> 2. Balancer doesn't understand favored nodes construct, perfect balanced fn
> among the rsgroup datanodes isn't possible, but with some variance like
> 10-20% difference is expected
>
>
Can be worked on.....



>
> > >  * A workaround for a facility that belongs in the NN
> >
>
> Probably, you can argue both ways. Hbase is the owner of data



Sort of. The NN decides where replicas should be placed according to its
configured policies. Then there is the HDFS balancer....

....
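
(For reference, a rough sketch of the NN-side hook referred to above:
replica placement is a pluggable BlockPlacementPolicy chosen by a NameNode
config key. The key name and default class below are from memory of the
HDFS configuration, so treat them as assumptions.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.HdfsConfiguration;

    public class NnPlacementPolicySketch {
      public static void main(String[] args) {
        Configuration conf = new HdfsConfiguration();
        // The NameNode loads its replica placement policy from this key;
        // a deployment wanting HBase-aware placement would point it at a
        // custom BlockPlacementPolicy instead of the default shown here.
        conf.set("dfs.block.replicator.classname",
            "org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault");
        System.out.println(conf.get("dfs.block.replicator.classname"));
      }
    }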



> One more concern was that the feature was dead/unused. You seem to refute
> > this notion of mine.
> > S
> >
>
> We have been using this for more than a year with hbase 2.1 in highly
> critical workloads for our company. And several years with hbase 1.2 as
> well with backporting rsgroup from master at that time. (2017-18 ish)
>
> And it has been very smooth operationally in hbase 2.1
>
>
Sweet.

Trying to get the other FN users to show up here on this thread to speak of
their experience....

Thanks for speaking up,
S


>
> >
> >
> > >
> > >
> > >> Going a step back.
> > >>
> > >> Did we ever consider giving a thought towards truely multi-tenant
> hbase?
> > >>
> > >
> > > Always.
> > >
> > >
> > >> Where each rsgroup has a group of datanodes and namespace tables data
> > >> created under that particular rsgroup would sit on those datanodes
> only?
> > >> We
> > >> have attempted to do that and have largely been very successful
> running
> > >> clusters of hundreds of terabytes with hundreds of
> > >> regionservers(datanodes)
> > >> per cluster.
> > >>
> > >>
> > > So isolation of load by node? (I believe this is where the rsgroup
> > feature
> > > came from originally; the desire for a deploy like you describe above.
> > > IIUC, its what Thiru and crew run).
> > >
> > >
> > >
> > >> 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer
> > >> contributed by Thiruvel Thirumoolan -->
> > >> https://issues.apache.org/jira/browse/HBASE-15533
> > >>
> > >> On each balance operation, while the region is moved around (or while
> > >> creating table), favored nodes are assigned based on the rsgroup that
> > >> region is pinned to. And hence data is pinned to those datanodes only
> > >> (Pinning favored nodes is best effort from the hdfs side, but there
> are
> > >> only a few exception scenarios where data will be spilled over and
> they
> > >> recover after a major compaction).
> > >>
> > >>
> > > Sounds like you have studied this deploy in operation. Write it up?
> Blog
> > > post on hbase.apache.org?
> > >
> >
>
> Definitely will write up.
>
>
> > >
> > >
> > >> 2. We have introduced several balancer cost functions to restore
> things
> > to
> > >> normalcy (multi tenancy with fn pinning) such as when a node is dead,
> or
> > >> when fn's are imbalanced within the same rsgroup, etc.
> > >>
> > >> 3. We had diverse workloads under the same cluster and WAL isolation
> > >> became
> > >> a requirement and we went ahead with similar philosophy mentioned in
> > line
> > >> 1. Where WAL's are created with FN pinning so that they are tied to
> > >> datanodes belonging to the same rsgroup. Some discussion seems to have
> > >> happened here --> https://issues.apache.org/jira/browse/HBASE-21641
> > >>
> > >> There are several other enhancements we have worked on with respect to
> > >> rsgroup aware export snapshot, rsaware regionmover, rsaware cluster
> > >> replication, etc.
> > >>
> > >> For above use cases, we would be needing fn information on hbase:meta.
> > >>
> > >> If the use case seems to be a fit for how we would want hbase to be
> > taken
> > >> forward as one of the supported use cases, willing to contribute our
> > >> changes back to the community. (I was anyway planning to initiate this
> > >> discussion)
> > >>
> > >
> > > Contribs always welcome.
> >
>
> Happy to see our thoughts are in line. We will prepare a plan on these
> contributions.
>
>
> > >
> > > Thanks Malilkarjun,
> > > S
> > >
> > >
> > >
> > >>
> > >> To strengthen the above use case. Here is what one of our multi tenant
> > >> cluster looks like
> > >>
> > >> RSGroups(Tenants): 21 (With tenant isolation)
> > >> Regionservers: 275
> > >> Regions Hosted: 6k
> > >> Tables Hosted: 87
> > >> Capacity: 250 TB (100TB used)
> > >>
> > >> ---
> > >> Mallikarjun
> > >>
> > >>
> > >> On Mon, Apr 26, 2021 at 9:15 AM 张铎(Duo Zhang) <pa...@gmail.com>
> > >> wrote:
> > >>
> > >> > As you all know, we always want to reduce the size of the
> hbase-server
> > >> > module. This time we want to separate the balancer related code to
> > >> another
> > >> > sub module.
> > >> >
> > >> > The design doc:
> > >> >
> > >> >
> > >>
> >
> https://docs.google.com/document/d/1T7WSgcQBJTtbJIjqi8sZYLxD2Z7JbIHx4TJaKKdkBbE/edit#
> > >> >
> > >> > You can see the bottom of the design doc, favor node balancer is a
> > >> problem,
> > >> > as it stores the favor node information in hbase:meta. Stack
> mentioned
> > >> that
> > >> > the feature is already dead, maybe we could just purge it from our
> > code
> > >> > base.
> > >> >
> > >> > So here we want to know if there are still some users in the
> community
> > >> who
> > >> > still use favor node balancer. Please share your experience and
> > whether
> > >> you
> > >> > still want to use it.
> > >> >
> > >> > Thanks.
> > >> >
> > >>
> > >
> >
>

Re: [SURVEY] The current usage of favor node balancer across the community

Posted by Stack <st...@duboce.net>.
On Mon, Apr 26, 2021 at 7:30 PM Mallikarjun <ma...@gmail.com>
wrote:

> Inline reply
>
> On Tue, Apr 27, 2021 at 1:03 AM Stack <st...@duboce.net> wrote:
>
> > On Mon, Apr 26, 2021 at 12:30 PM Stack <st...@duboce.net> wrote:
> >
> > > On Mon, Apr 26, 2021 at 8:10 AM Mallikarjun <ma...@gmail.com>
> > > wrote:
> > >
> > >> We use FavoredStochasticBalancer, which by description says the same
> > thing
> > >> as FavoredNodeLoadBalancer. Ignoring that fact, problem appears to be
> > >>
> > >>
> > >
> > > Other concerns:
> > >
> > >  * Hard-coded triplet of nodes that will inevitably rot as machines
> come
> > > and go (Are there tools for remediation?)
> >
>
> It doesn't really rot, if you think it with balancer responsible to
> assigning regions
>
> 1. On every region assigned to a particular regionserver, the balancer
> would have to reassign this triplet and hence there is no scope of rot
> (Same logic applied to WAL as well). (On compaction hdfs blocks will be
> pulled back if any spill over)
>
>
I don't follow the above but no harm; I can wait for the write-up (smile).



> 2. We used hostnames only (so, come and go is not going to be new nodes but
> same hostnames)
>
>
Ack.


> Couple of outstanding problems though.
>
> 1. We couldn't increase replication factor to > 3. Which was fine so far
> for our use cases. But we have had thoughts around fixing them.
>
>
Not the end-of-the-world I'd say. Would be nice to have though.



> 2. Balancer doesn't understand favored nodes construct, perfect balanced fn
> among the rsgroup datanodes isn't possible, but with some variance like
> 10-20% difference is expected
>
>
Can be worked on.....



>
> > >  * A workaround for a facility that belongs in the NN
> >
>
> Probably, you can argue both ways. Hbase is the owner of data



Sort-of. NN hands out where replicas should be placed according to its
configured policies. Then there is the HDFS balancer....

....



> One more concern was that the feature was dead/unused. You seem to refute
> > this notion of mine.
> > S
> >
>
> We have been using this for more than a year with hbase 2.1 in highly
> critical workloads for our company. And several years with hbase 1.2 as
> well with backporting rsgroup from master at that time. (2017-18 ish)
>
> And it has been very smooth operationally in hbase 2.1
>
>
Sweet.

Trying to get the other FN users to show up here on this thread to speak of
their experience....

Thanks for speaking up,
S


>
> >
> >
> > >
> > >
> > >> Going a step back.
> > >>
> > >> Did we ever consider giving a thought towards truely multi-tenant
> hbase?
> > >>
> > >
> > > Always.
> > >
> > >
> > >> Where each rsgroup has a group of datanodes and namespace tables data
> > >> created under that particular rsgroup would sit on those datanodes
> only?
> > >> We
> > >> have attempted to do that and have largely been very successful
> running
> > >> clusters of hundreds of terabytes with hundreds of
> > >> regionservers(datanodes)
> > >> per cluster.
> > >>
> > >>
> > > So isolation of load by node? (I believe this is where the rsgroup
> > feature
> > > came from originally; the desire for a deploy like you describe above.
> > > IIUC, its what Thiru and crew run).
> > >
> > >
> > >
> > >> 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer
> > >> contributed by Thiruvel Thirumoolan -->
> > >> https://issues.apache.org/jira/browse/HBASE-15533
> > >>
> > >> On each balance operation, while the region is moved around (or while
> > >> creating table), favored nodes are assigned based on the rsgroup that
> > >> region is pinned to. And hence data is pinned to those datanodes only
> > >> (Pinning favored nodes is best effort from the hdfs side, but there
> are
> > >> only a few exception scenarios where data will be spilled over and
> they
> > >> recover after a major compaction).
> > >>
> > >>
> > > Sounds like you have studied this deploy in operation. Write it up?
> Blog
> > > post on hbase.apache.org?
> > >
> >
>
> Definitely will write up.
>
>
> > >
> > >
> > >> 2. We have introduced several balancer cost functions to restore
> things
> > to
> > >> normalcy (multi tenancy with fn pinning) such as when a node is dead,
> or
> > >> when fn's are imbalanced within the same rsgroup, etc.
> > >>
> > >> 3. We had diverse workloads under the same cluster and WAL isolation
> > >> became
> > >> a requirement and we went ahead with similar philosophy mentioned in
> > line
> > >> 1. Where WAL's are created with FN pinning so that they are tied to
> > >> datanodes belonging to the same rsgroup. Some discussion seems to have
> > >> happened here --> https://issues.apache.org/jira/browse/HBASE-21641
> > >>
> > >> There are several other enhancements we have worked on with respect to
> > >> rsgroup aware export snapshot, rsaware regionmover, rsaware cluster
> > >> replication, etc.
> > >>
> > >> For above use cases, we would be needing fn information on hbase:meta.
> > >>
> > >> If the use case seems to be a fit for how we would want hbase to be
> > taken
> > >> forward as one of the supported use cases, willing to contribute our
> > >> changes back to the community. (I was anyway planning to initiate this
> > >> discussion)
> > >>
> > >
> > > Contribs always welcome.
> >
>
> Happy to see our thoughts are in line. We will prepare a plan on these
> contributions.
>
>
> > >
> > > Thanks Malilkarjun,
> > > S
> > >
> > >
> > >
> > >>
> > >> To strengthen the above use case. Here is what one of our multi tenant
> > >> cluster looks like
> > >>
> > >> RSGroups(Tenants): 21 (With tenant isolation)
> > >> Regionservers: 275
> > >> Regions Hosted: 6k
> > >> Tables Hosted: 87
> > >> Capacity: 250 TB (100TB used)
> > >>
> > >> ---
> > >> Mallikarjun
> > >>
> > >>
> > >> On Mon, Apr 26, 2021 at 9:15 AM 张铎(Duo Zhang) <pa...@gmail.com>
> > >> wrote:
> > >>
> > >> > As you all know, we always want to reduce the size of the
> hbase-server
> > >> > module. This time we want to separate the balancer related code to
> > >> another
> > >> > sub module.
> > >> >
> > >> > The design doc:
> > >> >
> > >> >
> > >>
> >
> https://docs.google.com/document/d/1T7WSgcQBJTtbJIjqi8sZYLxD2Z7JbIHx4TJaKKdkBbE/edit#
> > >> >
> > >> > You can see the bottom of the design doc, favor node balancer is a
> > >> problem,
> > >> > as it stores the favor node information in hbase:meta. Stack
> mentioned
> > >> that
> > >> > the feature is already dead, maybe we could just purge it from our
> > code
> > >> > base.
> > >> >
> > >> > So here we want to know if there are still some users in the
> community
> > >> who
> > >> > still use favor node balancer. Please share your experience and
> > whether
> > >> you
> > >> > still want to use it.
> > >> >
> > >> > Thanks.
> > >> >
> > >>
> > >
> >
>

Re: [SURVEY] The current usage of favor node balancer across the community

Posted by Mallikarjun <ma...@gmail.com>.
Inline reply

On Tue, Apr 27, 2021 at 1:03 AM Stack <st...@duboce.net> wrote:

> On Mon, Apr 26, 2021 at 12:30 PM Stack <st...@duboce.net> wrote:
>
> > On Mon, Apr 26, 2021 at 8:10 AM Mallikarjun <ma...@gmail.com>
> > wrote:
> >
> >> We use FavoredStochasticBalancer, which by description says the same
> thing
> >> as FavoredNodeLoadBalancer. Ignoring that fact, problem appears to be
> >>
> >>
> >
> > Other concerns:
> >
> >  * Hard-coded triplet of nodes that will inevitably rot as machines come
> > and go (Are there tools for remediation?)
>

It doesn't really rot if you think of the balancer as the component
responsible for assigning regions:

1. Every time a region is assigned to a particular regionserver, the
balancer reassigns this triplet, so there is no scope for rot (the same
logic applies to WALs as well; on compaction, HDFS blocks are pulled back
if any have spilled over). See the abstract sketch after this list.

2. We use hostnames only, so machines that come and go reappear with the
same hostnames rather than as new nodes.
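
(An abstract sketch of point 1 above, not the actual balancer code: on
every assignment the triplet is recomputed from hosts that are currently
live, so stale entries never persist. Hostnames are made up.)

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class TripletRefreshSketch {
      // Recompute a favored-node triplet whenever a region lands on a new
      // primary host; only currently live hosts are eligible.
      static List<String> refreshTriplet(String newPrimaryHost,
          List<String> liveHosts) {
        List<String> triplet = new ArrayList<>();
        triplet.add(newPrimaryHost);
        for (String host : liveHosts) {
          if (!host.equals(newPrimaryHost) && triplet.size() < 3) {
            triplet.add(host);
          }
        }
        return triplet; // persisted back per region, e.g. in hbase:meta
      }

      public static void main(String[] args) {
        List<String> live = Arrays.asList("rs1.example.com",
            "rs2.example.com", "rs3.example.com", "rs4.example.com");
        System.out.println(refreshTriplet("rs2.example.com", live));
      }
    }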

A couple of outstanding problems, though:

1. We couldn't increase the replication factor beyond 3. That has been
fine so far for our use cases, but we have had thoughts about fixing it.

2. The balancer doesn't fully understand the favored nodes construct, so a
perfectly balanced FN distribution among the rsgroup datanodes isn't
possible; some variance, a 10-20% difference, is to be expected.


> >  * A workaround for a facility that belongs in the NN
>

Probably; you can argue it both ways. HBase is the owner of the data and
has the authority to dictate where a particular region replica sits.
Benefits follow from this strategy: data locality stays mostly around 1,
rack awareness is better aligned, and so on.

Moreover, HDFS exposes data pinning for clients to make use of, doesn't it?
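
(The client-side pinning I mean is, to my understanding, the
DistributedFileSystem.create overload that takes a favoredNodes hint. A
rough sketch, with made-up hostnames and an assumed block size; the NN
honors the hint on a best-effort basis.)

    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class FavoredNodesCreateSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(conf);
        // Hypothetical datanode hostnames/ports; adjust to your cluster.
        InetSocketAddress[] favored = new InetSocketAddress[] {
            new InetSocketAddress("dn1.example.com", 50010),
            new InetSocketAddress("dn2.example.com", 50010),
            new InetSocketAddress("dn3.example.com", 50010)
        };
        try (FSDataOutputStream out = dfs.create(new Path("/tmp/fn-demo"),
            FsPermission.getFileDefault(), true /* overwrite */, 4096,
            (short) 3, 128L * 1024 * 1024, null /* progress */, favored)) {
          out.write("pinned if the NN can honor the hint"
              .getBytes(StandardCharsets.UTF_8));
        }
      }
    }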


> >  * Opaque in operation
>

We haven't yet looked at wrapping these operations with metrics so that
they are no longer opaque, for the reasons mentioned in the point above.


> >  * My understanding was that the feature was never finished; in
> particular
> > the balancer wasn't properly wired- up (Happy to be incorrect here).
> >
> >
> One more concern was that the feature was dead/unused. You seem to refute
> this notion of mine.
> S
>

We have been using this for more than a year with HBase 2.1 for highly
critical workloads at our company, and for several years before that with
HBase 1.2, backporting rsgroup from master at the time (2017-18 ish).

It has been very smooth operationally on HBase 2.1.


>
>
> >
> >
> >> Going a step back.
> >>
> >> Did we ever consider giving a thought towards truely multi-tenant hbase?
> >>
> >
> > Always.
> >
> >
> >> Where each rsgroup has a group of datanodes and namespace tables data
> >> created under that particular rsgroup would sit on those datanodes only?
> >> We
> >> have attempted to do that and have largely been very successful running
> >> clusters of hundreds of terabytes with hundreds of
> >> regionservers(datanodes)
> >> per cluster.
> >>
> >>
> > So isolation of load by node? (I believe this is where the rsgroup
> feature
> > came from originally; the desire for a deploy like you describe above.
> > IIUC, its what Thiru and crew run).
> >
> >
> >
> >> 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer
> >> contributed by Thiruvel Thirumoolan -->
> >> https://issues.apache.org/jira/browse/HBASE-15533
> >>
> >> On each balance operation, while the region is moved around (or while
> >> creating table), favored nodes are assigned based on the rsgroup that
> >> region is pinned to. And hence data is pinned to those datanodes only
> >> (Pinning favored nodes is best effort from the hdfs side, but there are
> >> only a few exception scenarios where data will be spilled over and they
> >> recover after a major compaction).
> >>
> >>
> > Sounds like you have studied this deploy in operation. Write it up? Blog
> > post on hbase.apache.org?
> >
>

Definitely will write up.


> >
> >
> >> 2. We have introduced several balancer cost functions to restore things
> to
> >> normalcy (multi tenancy with fn pinning) such as when a node is dead, or
> >> when fn's are imbalanced within the same rsgroup, etc.
> >>
> >> 3. We had diverse workloads under the same cluster and WAL isolation
> >> became
> >> a requirement and we went ahead with similar philosophy mentioned in
> line
> >> 1. Where WAL's are created with FN pinning so that they are tied to
> >> datanodes belonging to the same rsgroup. Some discussion seems to have
> >> happened here --> https://issues.apache.org/jira/browse/HBASE-21641
> >>
> >> There are several other enhancements we have worked on with respect to
> >> rsgroup aware export snapshot, rsaware regionmover, rsaware cluster
> >> replication, etc.
> >>
> >> For above use cases, we would be needing fn information on hbase:meta.
> >>
> >> If the use case seems to be a fit for how we would want hbase to be
> taken
> >> forward as one of the supported use cases, willing to contribute our
> >> changes back to the community. (I was anyway planning to initiate this
> >> discussion)
> >>
> >
> > Contribs always welcome.
>

Happy to see our thoughts are aligned. We will prepare a plan for these
contributions.


> >
> > Thanks Malilkarjun,
> > S
> >
> >
> >
> >>
> >> To strengthen the above use case. Here is what one of our multi tenant
> >> cluster looks like
> >>
> >> RSGroups(Tenants): 21 (With tenant isolation)
> >> Regionservers: 275
> >> Regions Hosted: 6k
> >> Tables Hosted: 87
> >> Capacity: 250 TB (100TB used)
> >>
> >> ---
> >> Mallikarjun
> >>
> >>
> >> On Mon, Apr 26, 2021 at 9:15 AM 张铎(Duo Zhang) <pa...@gmail.com>
> >> wrote:
> >>
> >> > As you all know, we always want to reduce the size of the hbase-server
> >> > module. This time we want to separate the balancer related code to
> >> another
> >> > sub module.
> >> >
> >> > The design doc:
> >> >
> >> >
> >>
> https://docs.google.com/document/d/1T7WSgcQBJTtbJIjqi8sZYLxD2Z7JbIHx4TJaKKdkBbE/edit#
> >> >
> >> > You can see the bottom of the design doc, favor node balancer is a
> >> problem,
> >> > as it stores the favor node information in hbase:meta. Stack mentioned
> >> that
> >> > the feature is already dead, maybe we could just purge it from our
> code
> >> > base.
> >> >
> >> > So here we want to know if there are still some users in the community
> >> who
> >> > still use favor node balancer. Please share your experience and
> whether
> >> you
> >> > still want to use it.
> >> >
> >> > Thanks.
> >> >
> >>
> >
>

Re: [SURVEY] The current usage of favor node balancer across the community

Posted by Stack <st...@duboce.net>.
On Mon, Apr 26, 2021 at 12:30 PM Stack <st...@duboce.net> wrote:

> On Mon, Apr 26, 2021 at 8:10 AM Mallikarjun <ma...@gmail.com>
> wrote:
>
>> We use FavoredStochasticBalancer, which by description says the same thing
>> as FavoredNodeLoadBalancer. Ignoring that fact, problem appears to be
>>
>>
>
> Other concerns:
>
>  * Hard-coded triplet of nodes that will inevitably rot as machines come
> and go (Are there tools for remediation?)
>  * A workaround for a facility that belongs in the NN
>  * Opaque in operation
>  * My understanding was that the feature was never finished; in particular
> the balancer wasn't properly wired- up (Happy to be incorrect here).
>
>
One more concern was that the feature was dead/unused. You seem to refute
this notion of mine.
S



>
>
>> Going a step back.
>>
>> Did we ever consider giving a thought towards truely multi-tenant hbase?
>>
>
> Always.
>
>
>> Where each rsgroup has a group of datanodes and namespace tables data
>> created under that particular rsgroup would sit on those datanodes only?
>> We
>> have attempted to do that and have largely been very successful running
>> clusters of hundreds of terabytes with hundreds of
>> regionservers(datanodes)
>> per cluster.
>>
>>
> So isolation of load by node? (I believe this is where the rsgroup feature
> came from originally; the desire for a deploy like you describe above.
> IIUC, its what Thiru and crew run).
>
>
>
>> 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer
>> contributed by Thiruvel Thirumoolan -->
>> https://issues.apache.org/jira/browse/HBASE-15533
>>
>> On each balance operation, while the region is moved around (or while
>> creating table), favored nodes are assigned based on the rsgroup that
>> region is pinned to. And hence data is pinned to those datanodes only
>> (Pinning favored nodes is best effort from the hdfs side, but there are
>> only a few exception scenarios where data will be spilled over and they
>> recover after a major compaction).
>>
>>
> Sounds like you have studied this deploy in operation. Write it up? Blog
> post on hbase.apache.org?
>
>
>
>> 2. We have introduced several balancer cost functions to restore things to
>> normalcy (multi tenancy with fn pinning) such as when a node is dead, or
>> when fn's are imbalanced within the same rsgroup, etc.
>>
>> 3. We had diverse workloads under the same cluster and WAL isolation
>> became
>> a requirement and we went ahead with similar philosophy mentioned in line
>> 1. Where WAL's are created with FN pinning so that they are tied to
>> datanodes belonging to the same rsgroup. Some discussion seems to have
>> happened here --> https://issues.apache.org/jira/browse/HBASE-21641
>>
>> There are several other enhancements we have worked on with respect to
>> rsgroup aware export snapshot, rsaware regionmover, rsaware cluster
>> replication, etc.
>>
>> For above use cases, we would be needing fn information on hbase:meta.
>>
>> If the use case seems to be a fit for how we would want hbase to be taken
>> forward as one of the supported use cases, willing to contribute our
>> changes back to the community. (I was anyway planning to initiate this
>> discussion)
>>
>
> Contribs always welcome.
>
> Thanks Malilkarjun,
> S
>
>
>
>>
>> To strengthen the above use case. Here is what one of our multi tenant
>> cluster looks like
>>
>> RSGroups(Tenants): 21 (With tenant isolation)
>> Regionservers: 275
>> Regions Hosted: 6k
>> Tables Hosted: 87
>> Capacity: 250 TB (100TB used)
>>
>> ---
>> Mallikarjun
>>
>>
>> On Mon, Apr 26, 2021 at 9:15 AM 张铎(Duo Zhang) <pa...@gmail.com>
>> wrote:
>>
>> > As you all know, we always want to reduce the size of the hbase-server
>> > module. This time we want to separate the balancer related code to
>> another
>> > sub module.
>> >
>> > The design doc:
>> >
>> >
>> https://docs.google.com/document/d/1T7WSgcQBJTtbJIjqi8sZYLxD2Z7JbIHx4TJaKKdkBbE/edit#
>> >
>> > You can see the bottom of the design doc, favor node balancer is a
>> problem,
>> > as it stores the favor node information in hbase:meta. Stack mentioned
>> that
>> > the feature is already dead, maybe we could just purge it from our code
>> > base.
>> >
>> > So here we want to know if there are still some users in the community
>> who
>> > still use favor node balancer. Please share your experience and whether
>> you
>> > still want to use it.
>> >
>> > Thanks.
>> >
>>
>

Re: [SURVEY] The current usage of favor node balancer across the community

Posted by Stack <st...@duboce.net>.
On Mon, Apr 26, 2021 at 8:10 AM Mallikarjun <ma...@gmail.com>
wrote:

> We use FavoredStochasticBalancer, which by description says the same thing
> as FavoredNodeLoadBalancer. Ignoring that fact, problem appears to be
>


Does this work?
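
(For anyone following along, both classes are selected the same way, via
the master balancer class setting. A minimal sketch, with class and
package names as I recall them from 2.x, so treat them as assumptions:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class BalancerSelectionSketch {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // The master instantiates its balancer from this key; swapping in
        // FavoredNodeLoadBalancer here would select the older variant.
        conf.set("hbase.master.loadbalancer.class",
            "org.apache.hadoop.hbase.master.balancer.FavoredStochasticBalancer");
        System.out.println(conf.get("hbase.master.loadbalancer.class"));
      }
    }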


>  favor node balancer is a problem, as it stores the favor node information
> > in hbase:meta.
> >
>
>
Other concerns:

 * Hard-coded triplet of nodes that will inevitably rot as machines come
and go (Are there tools for remediation?)
 * A workaround for a facility that belongs in the NN
 * Opaque in operation
 * My understanding was that the feature was never finished; in particular
the balancer wasn't properly wired up (Happy to be incorrect here).



> Going a step back.
>
> Did we ever consider giving a thought towards truely multi-tenant hbase?
>

Always.


> Where each rsgroup has a group of datanodes and namespace tables data
> created under that particular rsgroup would sit on those datanodes only? We
> have attempted to do that and have largely been very successful running
> clusters of hundreds of terabytes with hundreds of regionservers(datanodes)
> per cluster.
>
>
So isolation of load by node? (I believe this is where the rsgroup feature
came from originally; the desire for a deploy like you describe above.
IIUC, it's what Thiru and crew run).



> 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer
> contributed by Thiruvel Thirumoolan -->
> https://issues.apache.org/jira/browse/HBASE-15533
>
> On each balance operation, while the region is moved around (or while
> creating table), favored nodes are assigned based on the rsgroup that
> region is pinned to. And hence data is pinned to those datanodes only
> (Pinning favored nodes is best effort from the hdfs side, but there are
> only a few exception scenarios where data will be spilled over and they
> recover after a major compaction).
>
>
Sounds like you have studied this deploy in operation. Write it up? Blog
post on hbase.apache.org?



> 2. We have introduced several balancer cost functions to restore things to
> normalcy (multi tenancy with fn pinning) such as when a node is dead, or
> when fn's are imbalanced within the same rsgroup, etc.
>
> 3. We had diverse workloads under the same cluster and WAL isolation became
> a requirement and we went ahead with similar philosophy mentioned in line
> 1. Where WAL's are created with FN pinning so that they are tied to
> datanodes belonging to the same rsgroup. Some discussion seems to have
> happened here --> https://issues.apache.org/jira/browse/HBASE-21641
>
> There are several other enhancements we have worked on, such as an
> rsgroup-aware export snapshot, an rsgroup-aware region mover, rsgroup-aware
> cluster replication, etc.
>
> For the above use cases, we need the FN information in hbase:meta.
>
> If this use case seems like a fit for how we want HBase to be taken forward
> as one of the supported use cases, we are willing to contribute our changes
> back to the community. (I was planning to initiate this discussion anyway.)
>

Contribs always welcome.

Thanks Mallikarjun,
S





Re: [SURVEY] The current usage of favor node balancer across the community

Posted by Mallikarjun <ma...@gmail.com>.
We use FavoredStochasticBalancer, which by its description does the same
thing as FavoredNodeLoadBalancer. Ignoring that fact, the problem appears to be

 favor node balancer is a problem, as it stores the favor node information
> in hbase:meta.
>

Going a step back.

Did we ever consider a truly multi-tenant HBase? Where each rsgroup has its
own group of datanodes, and data for tables created under that rsgroup's
namespaces would sit on those datanodes only? We have attempted to do that
and have largely been very successful, running clusters of hundreds of
terabytes with hundreds of regionservers (datanodes) per cluster.

1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer
contributed by Thiruvel Thirumoolan -->
https://issues.apache.org/jira/browse/HBASE-15533

On each balance operation, when a region is moved (or when a table is
created), favored nodes are assigned based on the rsgroup the region is
pinned to, and hence its data is pinned to those datanodes only. (Pinning
favored nodes is best effort on the HDFS side; there are only a few
exceptional scenarios where data spills over, and those recover after a
major compaction.)
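
Roughly, the node selection looks like the following (a minimal, hypothetical
sketch in plain Java with made-up class and host names, not our production
code): the candidate set is simply the rsgroup's own hosts, and three of them
are chosen per region.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Simplified sketch: pick three favored hosts for a region from the
 * datanodes of the rsgroup its table belongs to. The real balancer also
 * spreads primary/secondary/tertiary across racks; that is omitted here.
 */
public final class RsGroupFavoredNodePicker {
  private static final int FAVORED_NODES_NUM = 3;

  public static List<String> pickFavoredNodes(List<String> rsGroupHosts,
      long regionSeed) {
    if (rsGroupHosts.size() < FAVORED_NODES_NUM) {
      throw new IllegalStateException("rsgroup too small for FN pinning");
    }
    List<String> shuffled = new ArrayList<>(rsGroupHosts);
    // Deterministic per-region shuffle so repeated balancer runs agree.
    Collections.shuffle(shuffled, new Random(regionSeed));
    return new ArrayList<>(shuffled.subList(0, FAVORED_NODES_NUM));
  }

  public static void main(String[] args) {
    List<String> hosts = List.of("dn1.example.com", "dn2.example.com",
        "dn3.example.com", "dn4.example.com");
    System.out.println(pickFavoredNodes(hosts, 42L));
  }
}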

2. We have introduced several balancer cost functions to restore things to
normalcy (multi-tenancy with FN pinning), such as when a node is dead or
when FNs are imbalanced within the same rsgroup.
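
To give a feel for the arithmetic (a hand-rolled sketch, not the actual
cost-function class we plug into the stochastic balancer): the FN-imbalance
cost compares each server's favored-node count against the rsgroup average
and normalizes the worst skew into [0, 1].

/**
 * Simplified sketch of an FN-balance cost: 0.0 when every server in the
 * rsgroup carries the same number of favored-node assignments, 1.0 when one
 * server carries them all.
 */
public final class FavoredNodeBalanceCost {
  public static double cost(int[] fnCountPerServer) {
    if (fnCountPerServer.length <= 1) {
      return 0.0;
    }
    int total = 0;
    int max = 0;
    for (int c : fnCountPerServer) {
      total += c;
      max = Math.max(max, c);
    }
    if (total == 0) {
      return 0.0;
    }
    double ideal = (double) total / fnCountPerServer.length;
    // Worst-case skew, normalized so a perfectly balanced group costs 0.
    return (max - ideal) / (total - ideal);
  }

  public static void main(String[] args) {
    System.out.println(cost(new int[] {4, 4, 4}));   // balanced   -> 0.0
    System.out.println(cost(new int[] {12, 0, 0}));  // all on one -> 1.0
  }
}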

3. We had diverse workloads in the same cluster, so WAL isolation became a
requirement, and we followed the same philosophy as in point 1: WALs are
created with FN pinning so that they are tied to datanodes belonging to the
same rsgroup. Some discussion seems to have happened here -->
https://issues.apache.org/jira/browse/HBASE-21641
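
The WAL side is the same trick at file-creation time. A sketch of the idea
(hypothetical helper with made-up paths and ports; the real code goes through
the WAL provider, and the create() overload taking favored nodes is the HDFS
API as I understand it):

import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;

/** Sketch: open a WAL file with HDFS favored nodes from one rsgroup. */
public final class RsGroupWalCreator {
  public static FSDataOutputStream createWal(DistributedFileSystem dfs,
      Path walPath, String[] rsGroupHosts, int dataNodePort) throws IOException {
    InetSocketAddress[] favoredNodes = new InetSocketAddress[rsGroupHosts.length];
    for (int i = 0; i < rsGroupHosts.length; i++) {
      favoredNodes[i] = new InetSocketAddress(rsGroupHosts[i], dataNodePort);
    }
    // Favored nodes are a hint: HDFS honors them when it can and otherwise
    // falls back to normal block placement (the spill-over case above).
    return dfs.create(walPath, FsPermission.getFileDefault(),
        true /* overwrite */,
        dfs.getConf().getInt("io.file.buffer.size", 4096),
        (short) 3 /* replication */,
        dfs.getDefaultBlockSize(walPath),
        null /* progress */,
        favoredNodes);
  }
}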

There are several other enhancements we have worked on, such as an
rsgroup-aware export snapshot, an rsgroup-aware region mover, rsgroup-aware
cluster replication, etc.

For the above use cases, we need the FN information in hbase:meta.
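
For reference, here is a quick client-side sketch of what that FN information
looks like; the info:fn column is from memory, so treat the family/qualifier
names as an assumption, and the value is left un-parsed (it is a
protobuf-encoded list of server names).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Sketch: dump the raw favored-node column for every region in hbase:meta. */
public final class DumpFavoredNodes {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
      Scan scan = new Scan().addColumn(Bytes.toBytes("info"), Bytes.toBytes("fn"));
      try (ResultScanner scanner = meta.getScanner(scan)) {
        for (Result r : scanner) {
          byte[] fn = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("fn"));
          System.out.println(Bytes.toStringBinary(r.getRow()) + " -> "
              + (fn == null ? "<no FN info>" : fn.length + " bytes of FN info"));
        }
      }
    }
  }
}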

If this use case seems like a fit for how we want HBase to be taken forward
as one of the supported use cases, we are willing to contribute our changes
back to the community. (I was planning to initiate this discussion anyway.)

To strengthen the above use case, here is what one of our multi-tenant
clusters looks like:

RSGroups (Tenants): 21 (with tenant isolation)
Regionservers: 275
Regions Hosted: 6k
Tables Hosted: 87
Capacity: 250 TB (100 TB used)

---
Mallikarjun


