Posted to user@hbase.apache.org by Viraj Jasani <vj...@apache.org> on 2021/09/07 18:27:16 UTC

Blog post series on "Evolution of Region assignment in HBase architecture"

As some of the HBase users are still running HBase 1.x versions in their
production environment, and branch-1 is trending toward EOL, now is really
the right time to evaluate as well as understand the features and core
design changes provided by HBase 2.x versions.

As the majority of us are already aware, one of the key features with
significant architectural changes provided by HBase 2 is
AssignmentManagerV2 (AMv2).
However, we don't seem to have one place explaining 1) *the evolution
of AM* and
2) how it manages region assignments with better scalability, reliability
and fault-tolerance.
Keeping this in mind, Andrew and I have published a two-part series of blog
posts explaining this evolution. Part 1 provides a) a basic introduction
to HBase concepts, and b) an overview of AM and the shortcomings in previous
versions that AMv2 is trying to resolve. Part 2 provides detailed info about
ProcedureV2 (Pv2) and how AMv2 leverages it, along with state diagrams
explaining some of the complex region assignment workflows. The state
diagrams are intended to help devs/users a) understand region assignment
workflows in depth, b) walk through the code more easily, and c) debug and
root-cause issues with better knowledge.

Part 1:
https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-1-c43b1becc522
Part 2:
https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-2-9568fb3790b

Re: Blog post series on "Evolution of Region assignment in HBase architecture"

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
Thanks Viraj for the great work.

Assignment and SCP are core parts of HBase; I suggest that everyone in the
community who is interested in the architecture of HBase read these blog
posts.

I wonder if we could translate these blog posts into Chinese so that our
Chinese friends could understand them better.

On Fri, Oct 8, 2021 at 3:14 PM Viraj Jasani <vj...@apache.org> wrote:

> We have the "Part 3" of the blog series published.
> Thanks to the co-writers: Duo Zhang and Andrew Purtell.
>
> Part 3:
>
> https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-3-e03b814ae92

Re: Blog post series on "Evolution of Region assignment in HBase architecture"

Posted by Viraj Jasani <vj...@apache.org>.
Part 3 of the blog series is now published.
Thanks to the co-writers: Duo Zhang and Andrew Purtell.

Part 3:
https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-3-e03b814ae92

Re: Blog post series on "Evolution of Region assignment in HBase architecture"

Posted by Viraj Jasani <vj...@apache.org>.
Thanks Duo for your offer to coordinate on writing "Part 3" of this series,
sounds great!
Although I see TRSP#assign being used by SCP directly while assigning
regions, I have yet to take a detailed look at HBASE-20881
<https://issues.apache.org/jira/browse/HBASE-20881> and the relevant work.
Let me reach out to you over Slack and we can take it from there.

Re: Blog post series on "Evolution of Region assignment in HBase architecture"

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
Thank you Viraj and Andrew, the blog posts are outstanding!

And I think we should have a Part 3 about ServerCrashProcedure (SCP) :)

In 2.0 and 2.1, we used MoveRegionProcedure, AssignRegionProcedure and
UnassignRegionProcedure. One of the reasons we removed them all and
introduced a single TransitRegionStateProcedure (TRSP) to do
assign/unassign/move/reopen is SCP.

If a region server crashes, we obviously cannot assign regions to it any
more, so we need a way to stop the procedures that are still trying to
assign regions to the dead server. And even to unassign a region, we still
need to bring it online first and then unassign it. For example, when
disabling a table, we must make sure that all the data in the memstore has
been flushed to storage, so we need to bring the region online and then do
a clean close.
In 2.0 and 2.1, we had 3 procedures for region assignment, and there were
lots of corner cases when we wanted to interrupt them from SCP, which made
the code buggy and really hard to understand. So finally, we introduced
TRSP to replace them all, so that SCP only needs to interrupt one type of
procedure.
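
To make the idea above concrete, here is a minimal Java sketch. It is not
HBase's actual implementation: the names TrspSketch, RegionTransition, Step
and handleServerCrash, as well as the server names, are hypothetical. The
recovery plan after a crash is also a simplifying assumption (reopen the
region on another server, and for an unassign, follow that with a clean
close). The only point it tries to show is that with one procedure type,
crash handling has a single hook to call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of a single region transition procedure. */
public class TrspSketch {

    /** The four operations covered by the single procedure type. */
    enum TransitionType { ASSIGN, UNASSIGN, MOVE, REOPEN }

    /** Hypothetical coarse-grained steps of a transition. */
    enum Step { OPEN, CLOSE }

    static final class RegionTransition {
        final String region;
        final TransitionType type;
        String server;                                  // server the current step talks to
        final List<Step> remaining = new ArrayList<>(); // steps still to run

        RegionTransition(String region, TransitionType type, String server) {
            this.region = region;
            this.type = type;
            this.server = server;
            switch (type) {
                case ASSIGN:   remaining.add(Step.OPEN); break;
                case UNASSIGN: remaining.add(Step.CLOSE); break;
                default:       remaining.addAll(Arrays.asList(Step.CLOSE, Step.OPEN));
            }
        }

        /** The single interruption hook that crash handling calls. */
        void serverCrashed(String deadServer, String newServer) {
            if (!deadServer.equals(server)) {
                return; // this transition does not involve the dead server
            }
            // Assumption: nothing can be opened or cleanly closed on a dead
            // server, so the region is first brought online elsewhere.
            server = newServer;
            remaining.clear();
            remaining.add(Step.OPEN);
            if (type == TransitionType.UNASSIGN) {
                // Bring it online first, then do the clean close.
                remaining.add(Step.CLOSE);
            }
        }
    }

    /** Crash handling only has to deal with one kind of in-flight procedure. */
    static int handleServerCrash(List<RegionTransition> inFlight,
                                 String deadServer, String newServer) {
        int interrupted = 0;
        for (RegionTransition t : inFlight) {
            if (deadServer.equals(t.server)) {
                t.serverCrashed(deadServer, newServer);
                interrupted++;
            }
        }
        return interrupted;
    }

    public static void main(String[] args) {
        List<RegionTransition> inFlight = new ArrayList<>();
        inFlight.add(new RegionTransition("r1", TransitionType.ASSIGN, "rs1"));
        inFlight.add(new RegionTransition("r2", TransitionType.UNASSIGN, "rs1"));
        inFlight.add(new RegionTransition("r3", TransitionType.MOVE, "rs2"));

        int n = handleServerCrash(inFlight, "rs1", "rs3");
        System.out.println(n + " transitions redirected");
        for (RegionTransition t : inFlight) {
            System.out.println(t.region + " " + t.type + " -> " + t.server + " " + t.remaining);
        }
    }
}
```

With three separate procedure classes, a crash handler like this would need
one interruption path per class, which is where the corner cases came from.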

This is the story :)

I can help if you want to write Part 3 about SCP :)

Thanks.
