Posted to user@hbase.apache.org by Paul Carey <pa...@gmail.com> on 2020/07/03 08:12:11 UTC

Violating strong consistency after the fact

Hi

I'd like to understand how HBase deals with the situation where the
only available DataNodes for a given offline Region contain stale
data. Will HBase allow the Region to be brought online again,
effectively making the inconsistency permanent, or will it refuse to
do so?

My question is motivated by seeing how Kafka and Elasticsearch
handle this scenario. They both allow the inconsistency to become
permanent, Kafka via unclean leader election, and Elasticsearch via
the allocate_stale_primary command.

To better understand my question, please consider the following example:

- HDFS is configured with `dfs.replication=2` and
`dfs.namenode.replication.min=1`
- DataNodes DN1 and DN2 contain the blocks for Region R1
- DN2 goes offline
- R1 receives a write, which succeeds as it can be written successfully
to DN1 (see the sketch after this list)
- DN1 goes offline before the NameNode can replicate the
under-replicated block containing the write to another DataNode
- At this point R1 is offline
- DN2 comes back online, but it does not contain the missed write
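
A rough client-side sketch of the write step above, just to make the
assumed setup concrete (the path and the inline replication factor are
illustrative; `dfs.namenode.replication.min=1` is assumed to be set in
the cluster's hdfs-site.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ScenarioWriteSketch {
      public static void main(String[] args) throws Exception {
        // dfs.replication=2 and dfs.namenode.replication.min=1 are assumed
        // to be set cluster-side, as in the example above.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path p = new Path("/tmp/region-r1-write-sketch"); // illustrative path
        // Ask for 2 replicas, matching dfs.replication=2.
        FSDataOutputStream out =
            fs.create(p, true, 4096, (short) 2, fs.getDefaultBlockSize(p));
        out.write("edit for R1".getBytes("UTF-8"));
        // With DN2 offline the pipeline can continue on DN1 alone, and the
        // file can still be closed because only dfs.namenode.replication.min
        // (here 1) persisted replica is required.
        out.hflush();
        out.close();
      }
    }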

There are now two options:

- R1 is brought back online, violating consistency
- R1 remains offline indefinitely, until DN1 is brought back online

How does HBase deal with this situation?

Many thanks

Paul

Re: Violating strong consistency after the fact

Posted by Wellington Chevreuil <we...@gmail.com>.
For details about the HDFS write process, see:
https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-1/


Re: Violating strong consistency after the fact

Posted by Paul Carey <pa...@gmail.com>.
That's very helpful, many thanks.


Re: Violating strong consistency after the fact

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
You can see my design doc for the async DFS output:

https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#heading=h.2jvw6cxnmirr


See the footnote below section 3.4. With the current HDFS pipeline
implementation it could be a problem for replication in HBase, though it
rarely happens.

HBase now has its own AsyncFSWAL implementation; HBASE-14004 was used to
resolve the problem (although we later broke things again, and HBASE-24625
is the fix).

For WAL recovery it will not be a problem. We only return success to the
client after all the replicas have been successfully committed, so if DN2
goes offline, we will close and commit the current file and open a new
file to continue writing the WAL.
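
As a rough illustration of that rule only (the names below are
hypothetical, not the actual AsyncFSWAL classes):

    // Hypothetical sketch, not real HBase internals: an edit is acked to
    // the client only after every replica has accepted it; if a replica is
    // lost, the current WAL file is closed/committed and writing continues
    // in a new file on a healthy pipeline.
    final class WalAckSketch {
      interface Replica { boolean append(byte[] edit); } // hypothetical

      private java.util.List<Replica> pipeline;
      private final java.util.function.Supplier<java.util.List<Replica>> newPipeline;

      WalAckSketch(java.util.List<Replica> pipeline,
                   java.util.function.Supplier<java.util.List<Replica>> newPipeline) {
        this.pipeline = pipeline;
        this.newPipeline = newPipeline;
      }

      boolean write(byte[] edit) {
        for (Replica r : pipeline) {
          if (!r.append(edit)) {          // e.g. DN2 goes offline mid-write
            closeAndCommitCurrentFile();  // everything acked so far is durable
            pipeline = newPipeline.get(); // open a new WAL file on healthy DNs
            return false;                 // this edit is retried, never acked early
          }
        }
        return true;                      // acked only once all replicas succeeded
      }

      private void closeAndCommitCurrentFile() { /* hypothetical */ }
    }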

Thanks.


Re: Violating strong consistency after the fact

Posted by Paul Carey <pa...@gmail.com>.
>  If the HDFS write succeeded while you had only one DN available, then the other replica on the offline DN would be invalid now.

Interesting, I wasn't aware of this. Are there any docs you could
point me towards where this is described? I've had a look in Hadoop:
The Definitive Guide and the official docs, but hadn't come across
this.


Re: Violating strong consistency after the fact

Posted by Wellington Chevreuil <we...@gmail.com>.
This is actually an HDFS consistency question, not an HBase one. If the
HDFS write succeeded while you had only one DN available, then the other
replica on the offline DN would be invalid now. What you then have is an
under-replicated block, and if your only available DN goes offline before
it can be replicated, the file that block belongs to is now corrupt. If
you bring the previously offline DN back up, the file would still be
corrupt, as the replica it has is no longer valid (the NN knows which is
the last valid version of the replica). So unless you can bring back the
DN that has the only valid replica, your HFile is corrupt and your data
is lost.
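
If you want to check whether any files have reached that corrupt state,
here is a minimal sketch (the /hbase root path is just an example; `hdfs
fsck` on the command line reports the same information):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    // Sketch: ask the NameNode which files under the HBase root directory
    // currently contain corrupt (unrecoverable) blocks.
    public class ListCorruptFiles {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        RemoteIterator<Path> corrupt =
            fs.listCorruptFileBlocks(new Path("/hbase"));
        while (corrupt.hasNext()) {
          System.out.println("corrupt block in: " + corrupt.next());
        }
      }
    }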
