Posted to user@hbase.apache.org by Lucas Stanley <lu...@gmail.com> on 2013/06/11 02:26:39 UTC

HBase failure scenarios

Hi,

In the Strata 2013 training lectures, Jonathan Hsieh from Cloudera said
something about HBase syncs which I'm trying to understand further.

He said that HBase sync guarantees only that a write goes to the local disk
on the region server responsible for that region and in-memory copies go on
2 other machines in the HBase cluster.

But I thought that when the write goes to the WAL on the first region
server, that the HDFS append would push that write to 3 machines total in
the HDFS cluster. In order for the append write to the WAL to be
successful, doesn't the DataNode on that machine have to pipeline the write
to 2 other DataNodes?

I'm not sure what Jonathan was referring to when he said that 2 in-memory
copies go to other HBase machines. Even when the memstore on the first
region server fills up, doesn't the flush to the HFile get written to 3
HDFS nodes in total?

Re: HBase failure scenarios

Posted by yonghu <yo...@gmail.com>.
Hi Lucas,

First, a write request in HBase consists of two parts:
1. Write into the WAL;
2. Write into the Memstore; when the Memstore reaches its threshold, the
data in the Memstore is flushed to disk.

In my understanding, there are two data synchronization points:

The first one is the write to the WAL. As the WAL is persisted on the local
disk, it will be propagated to the other 2 nodes (assuming the replication
factor is 3). The second one is when the Memstore reaches its threshold and
the data in the Memstore is flushed to disk. When this happens, it also
triggers the pipelined data write.
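As a toy illustration of those two sync points (this is not real HBase code; the class and threshold names are made up), the write path can be sketched as:

```python
# Toy model of the HBase write path described above: every put is
# appended to the WAL first, then buffered in the memstore; when the
# memstore exceeds its threshold it is flushed to an immutable file.
# All names here are illustrative, not real HBase classes.

class ToyRegionServer:
    def __init__(self, flush_threshold=3):
        self.wal = []            # write-ahead log (sync point 1, replicated by HDFS)
        self.memstore = {}       # in-memory buffer of recent edits
        self.hfiles = []         # flushed, immutable files (sync point 2)
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))   # step 1: WAL append
        self.memstore[key] = value      # step 2: memstore insert
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Writing the new file would go through the HDFS pipeline again,
        # so the flushed data also ends up on 3 DataNodes.
        self.hfiles.append(dict(sorted(self.memstore.items())))
        self.memstore.clear()

rs = ToyRegionServer()
for i in range(4):
    rs.put(f"row{i}", i)
# After 4 puts with a threshold of 3, exactly one flush has happened.
```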

regards

Yong



On Tue, Jun 11, 2013 at 2:39 AM, Lucas Stanley <lu...@gmail.com> wrote:

> Thanks Azuryy!
>
> So, when a write is successful to the WAL on the responsible region server,
> in fact that means that the write was committed to 3 total DataNodes,
> correct?
>

Re: HBase failure scenarios

Posted by Azuryy Yu <az...@gmail.com>.
Yes. When a write to the WAL succeeds on the responsible region server, it
means the write was committed to 3 DataNodes in total (with the default
HDFS replication factor of 3).
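The "3 total" is just HDFS's block replication factor, which applies to WAL and HFile blocks alike and is configurable; the default comes from the standard hdfs-site.xml setting:

```xml
<!-- hdfs-site.xml: number of replicas kept for each HDFS block,
     WAL and HFile blocks included. The default is 3. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```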

--Send from my Sony mobile.
On Jun 11, 2013 8:40 AM, "Lucas Stanley" <lu...@gmail.com> wrote:

> Thanks Azuryy!
>
> So, when a write is successful to the WAL on the responsible region server,
> in fact that means that the write was committed to 3 total DataNodes,
> correct?
>

Re: HBase failure scenarios

Posted by Lucas Stanley <lu...@gmail.com>.
Thanks Azuryy!

So, when a write is successful to the WAL on the responsible region server,
in fact that means that the write was committed to 3 total DataNodes,
correct?


On Mon, Jun 10, 2013 at 5:37 PM, Azuryy Yu <az...@gmail.com> wrote:

> Yes. The DataNode write is pipelined, and the DataNode returns OK only
> once the pipeline write has finished on all replicas.
>
> --Send from my Sony mobile.

Re: HBase failure scenarios

Posted by Azuryy Yu <az...@gmail.com>.
Yes. The DataNode write is pipelined, and the DataNode returns OK only once
the pipeline write has finished on all replicas.
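A toy sketch of that acknowledgement chain (illustrative only, not the real DataNode protocol): the write returns OK only after every DataNode in the pipeline has accepted the packet.

```python
# Toy model of an HDFS write pipeline: the client sends a packet to the
# first DataNode, which forwards it downstream; the write is acknowledged
# only once every node in the chain has stored the packet.
# Purely illustrative -- this is not the real DataNode protocol.

def pipeline_write(packet, datanodes):
    """Return True only if every DataNode in the chain accepts the packet."""
    for dn in datanodes:
        if not dn.store(packet):
            return False        # a failed node breaks the pipeline
    return True

class DataNode:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.blocks = []

    def store(self, packet):
        if not self.healthy:
            return False
        self.blocks.append(packet)
        return True

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
ok = pipeline_write(b"wal-edit", nodes)   # True: all 3 nodes stored it
```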

--Send from my Sony mobile.