Posted to user@hbase.apache.org by Seth Ladd <se...@gmail.com> on 2009/12/11 19:20:11 UTC

When does a row become highly available?

Aloha,

We're currently investigating HBase (0.20.2) and are really enjoying
the experience.  We're now curious how much High Availability we
should expect.  Specifically, after we insert a row into HBase, when
does it become HA?  That is, is it immediately shared across multiple
nodes in the cluster?  I don't quite understand the relationship
between a Region and its backing file in HDFS.

Thanks for any tips or background you can provide.

Seth

Re: When does a row become highly available?

Posted by Seth Ladd <se...@gmail.com>.
Thanks for the tip!  If 0.21 has the "durable rows" fix, then I'd love
to try it.  Without it, it'll be hard to recommend HBase (though
everything else so far is a great fit).

I'm using the bundled ec2 scripts, which default to existing AMIs.
This will be a good opportunity to try building the AMIs myself using
the bundled scripts.

Thanks again. Once I have it up and tested, I'll let the list know how it goes,
Seth

On Fri, Dec 11, 2009 at 1:21 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> You can already do it if you want. Hadoop's 0.21 branch is pretty
> stable, since they are QA'ing it at the moment. HBase's trunk (which
> will become 0.21) is also quite stable, so you could easily do the same
> test.
>
> J-D
>
> On Fri, Dec 11, 2009 at 1:16 PM, Seth Ladd <se...@gmail.com> wrote:
>> Thanks for the open and informative reply. Looking forward to testing 0.21
>> when available!
>>
>> On Dec 11, 2009, at 11:36 AM, Andrew Purtell <ap...@apache.org> wrote:
>>
>>> Currently HDFS does not guarantee that a write is fully replicated before
>>> a sync() call completes. The problem is the write appears to complete from
>>> the client's perspective -- HBase completes the write RPC -- but really it
>>> should be blocked for some further period of time. The client won't get a
>>> failure indication when it should, so it can't know it must retry the
>>> write. There are configuration options which can narrow this window, but
>>> until HDFS has a working sync() they can't close it shut tight.
>>>
>>> HBase is a "special" client of HDFS in many respects, so while this is
>>> obviously really important for us, it is not so for the majority of HDFS
>>> users, who run mapreduce jobs on it. HDFS-level failures leading to data
>>> loss just result in task retries and recreation of any temporary data
>>> lost, no harm done. So this fix has been a long time coming. Getting a
>>> working sync() in Hadoop 0.21 is finally going to happen for us.
>>>
>>>  - Andy
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Jean-Daniel Cryans <jd...@apache.org>
>>> To: hbase-user@hadoop.apache.org
>>> Sent: Fri, December 11, 2009 10:59:55 AM
>>> Subject: Re: When does a row become highly available?
>>>
>>> That's the not-quite-working HDFS append feature showing its ugly face:
>>> small amounts of data can be lost (a configurable maximum of ~62MB).
>>>
>>> J-D
>>>
>>> On Fri, Dec 11, 2009 at 10:55 AM, Seth Ladd <se...@gmail.com> wrote:
>>>>>>
>>>>>> Which confuses me, if the write goes straight to a RegionServer, but
>>>>>> then the RegionServer fails before the MemStore is flushed, did I just
>>>>>> lose data?
>>>>>
>>>>> No, that's the goal of the write-ahead log (WAL).
>>>>
>>>> Here's the scenario I just tested on my EC2 cluster.  3 Zookeeper
>>>> instances, 1 master, and 3 slaves.
>>>>
>>>> I created a table, and inserted a single row.
>>>> I performed a read (get) to test the insert, and sure enough the row
>>>> was returned.
>>>> I then noted which slave held the table, and terminated the slave via
>>>> the AWS management console.
>>>> I then waited approx 30 seconds.
>>>> I used the web interfaces (port 60030 and 60010) to note that the
>>>> region was indeed moved to another slave.
>>>> I performed a read on the same row, but did *not* find the row.
>>>>
>>>> So it looks like the region for the table was moved, but no data was
>>>> moved over.
>>>>
>>>> Was that a valid test?  I would expect the row to get moved with the
>>>> region.
>>>>
>>>> Thanks,
>>>> Seth
>>>>
>>>
>>>
>>>
>>
>

Re: When does a row become highly available?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
You can already do it if you want. Hadoop's 0.21 branch is pretty
stable, since they are QA'ing it at the moment. HBase's trunk (which
will become 0.21) is also quite stable, so you could easily do the same
test.

J-D

On Fri, Dec 11, 2009 at 1:16 PM, Seth Ladd <se...@gmail.com> wrote:
> Thanks for the open and informative reply. Looking forward to testing 0.21
> when available!
>
> On Dec 11, 2009, at 11:36 AM, Andrew Purtell <ap...@apache.org> wrote:
>
>> Currently HDFS does not guarantee that a write is fully replicated before
>> a sync() call completes. The problem is the write appears to complete from
>> the client's perspective -- HBase completes the write RPC -- but really it
>> should be blocked for some further period of time. The client won't get a
>> failure indication when it should, so it can't know it must retry the
>> write. There are configuration options which can narrow this window, but
>> until HDFS has a working sync() they can't close it shut tight.
>>
>> HBase is a "special" client of HDFS in many respects, so while this is
>> obviously really important for us, it is not so for the majority of HDFS
>> users, who run mapreduce jobs on it. HDFS-level failures leading to data
>> loss just result in task retries and recreation of any temporary data
>> lost, no harm done. So this fix has been a long time coming. Getting a
>> working sync() in Hadoop 0.21 is finally going to happen for us.
>>
>>  - Andy
>>
>>
>>
>>
>>
>> ________________________________
>> From: Jean-Daniel Cryans <jd...@apache.org>
>> To: hbase-user@hadoop.apache.org
>> Sent: Fri, December 11, 2009 10:59:55 AM
>> Subject: Re: When does a row become highly available?
>>
>> That's the not-quite-working HDFS append feature showing its ugly face:
>> small amounts of data can be lost (a configurable maximum of ~62MB).
>>
>> J-D
>>
>> On Fri, Dec 11, 2009 at 10:55 AM, Seth Ladd <se...@gmail.com> wrote:
>>>>>
>>>>> Which confuses me, if the write goes straight to a RegionServer, but
>>>>> then the RegionServer fails before the MemStore is flushed, did I just
>>>>> lose data?
>>>>
>>>> No, that's the goal of the write-ahead log (WAL).
>>>
>>> Here's the scenario I just tested on my EC2 cluster.  3 Zookeeper
>>> instances, 1 master, and 3 slaves.
>>>
>>> I created a table, and inserted a single row.
>>> I performed a read (get) to test the insert, and sure enough the row
>>> was returned.
>>> I then noted which slave held the table, and terminated the slave via
>>> the AWS management console.
>>> I then waited approx 30 seconds.
>>> I used the web interfaces (port 60030 and 60010) to note that the
>>> region was indeed moved to another slave.
>>> I performed a read on the same row, but did *not* find the row.
>>>
>>> So it looks like the region for the table was moved, but no data was
>>> moved over.
>>>
>>> Was that a valid test?  I would expect the row to get moved with the
>>> region.
>>>
>>> Thanks,
>>> Seth
>>>
>>
>>
>>
>

Re: When does a row become highly available?

Posted by Seth Ladd <se...@gmail.com>.
Thanks for the open and informative reply. Looking forward to testing 0.21 when available!

On Dec 11, 2009, at 11:36 AM, Andrew Purtell <ap...@apache.org> wrote:

> Currently HDFS does not guarantee that a write is fully replicated before
> a sync() call completes. The problem is the write appears to complete from
> the client's perspective -- HBase completes the write RPC -- but really it
> should be blocked for some further period of time. The client won't get a
> failure indication when it should, so it can't know it must retry the
> write. There are configuration options which can narrow this window, but
> until HDFS has a working sync() they can't close it shut tight.
>
> HBase is a "special" client of HDFS in many respects, so while this is
> obviously really important for us, it is not so for the majority of HDFS
> users, who run mapreduce jobs on it. HDFS-level failures leading to data
> loss just result in task retries and recreation of any temporary data
> lost, no harm done. So this fix has been a long time coming. Getting a
> working sync() in Hadoop 0.21 is finally going to happen for us.
>
>   - Andy
>
>
>
>
>
> ________________________________
> From: Jean-Daniel Cryans <jd...@apache.org>
> To: hbase-user@hadoop.apache.org
> Sent: Fri, December 11, 2009 10:59:55 AM
> Subject: Re: When does a row become highly available?
>
> That's the not-quite-working HDFS append feature showing its ugly face:
> small amounts of data can be lost (a configurable maximum of ~62MB).
>
> J-D
>
> On Fri, Dec 11, 2009 at 10:55 AM, Seth Ladd <se...@gmail.com> wrote:
>>>> Which confuses me, if the write goes straight to a RegionServer, but
>>>> then the RegionServer fails before the MemStore is flushed, did I just
>>>> lose data?
>>>
>>> No, that's the goal of the write-ahead log (WAL).
>>
>> Here's the scenario I just tested on my EC2 cluster.  3 Zookeeper
>> instances, 1 master, and 3 slaves.
>>
>> I created a table, and inserted a single row.
>> I performed a read (get) to test the insert, and sure enough the row
>> was returned.
>> I then noted which slave held the table, and terminated the slave via
>> the AWS management console.
>> I then waited approx 30 seconds.
>> I used the web interfaces (port 60030 and 60010) to note that the
>> region was indeed moved to another slave.
>> I performed a read on the same row, but did *not* find the row.
>>
>> So it looks like the region for the table was moved, but no data was
>> moved over.
>>
>> Was that a valid test?  I would expect the row to get moved with the
>> region.
>>
>> Thanks,
>> Seth
>>
>
>
>

Re: When does a row become highly available?

Posted by Andrew Purtell <ap...@apache.org>.
Currently HDFS does not guarantee that a write is fully replicated before
a sync() call completes. The problem is the write appears to complete from
the client's perspective -- HBase completes the write RPC -- but really it
should be blocked for some further period of time. The client won't get a
failure indication when it should, so it can't know it must retry the
write. There are configuration options which can narrow this window, but
until HDFS has a working sync() they can't close it shut tight.
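
For the curious, the knobs in question in the 0.20 timeframe look roughly
like this in hbase-site.xml. This is a sketch only -- the property names
are from that era's hbase-default.xml, so verify them against your own
version -- and again, they narrow the window, they don't close it:

<property>
  <name>hbase.regionserver.flushlogentries</name>
  <!-- Sync the WAL out to HDFS after this many edits (default 100);
       lowering it reduces the number of edits at risk. -->
  <value>1</value>
</property>
<property>
  <name>hbase.regionserver.optionallogflushinterval</name>
  <!-- Also sync the WAL if it has not been synced for this many
       milliseconds (default 10000). -->
  <value>1000</value>
</property>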

HBase is a "special" client of HDFS in many respects, so while this is
obviously really important for us, it is not so for the majority of HDFS
users which run mapreduce jobs on it. HDFS level failures leading to data
loss result in task retries and recreation of any temporary data lost, no
harm done. So it has been some time coming. Getting a working sync() in
Hadoop 0.21 is finally going to happen for us.

   - Andy





________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
To: hbase-user@hadoop.apache.org
Sent: Fri, December 11, 2009 10:59:55 AM
Subject: Re: When does a row become highly available?

That's the not-quite-working HDFS append feature showing its ugly face:
small amounts of data can be lost (a configurable maximum of ~62MB).

J-D

On Fri, Dec 11, 2009 at 10:55 AM, Seth Ladd <se...@gmail.com> wrote:
>>> Which confuses me, if the write goes straight to a RegionServer, but
>>> then the RegionServer fails before the MemStore is flushed, did I just
>>> lose data?
>>
>> No, that's the goal of the write-ahead log (WAL).
>
> Here's the scenario I just tested on my EC2 cluster.  3 Zookeeper
> instances, 1 master, and 3 slaves.
>
> I created a table, and inserted a single row.
> I performed a read (get) to test the insert, and sure enough the row
> was returned.
> I then noted which slave held the table, and terminated the slave via
> the AWS management console.
> I then waited approx 30 seconds.
> I used the web interfaces (port 60030 and 60010) to note that the
> region was indeed moved to another slave.
> I performed a read on the same row, but did *not* find the row.
>
> So it looks like the region for the table was moved, but no data was moved over.
>
> Was that a valid test?  I would expect the row to get moved with the region.
>
> Thanks,
> Seth
>



      

Re: When does a row become highly available?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's the not-quite-working HDFS append feature showing its ugly face:
small amounts of data can be lost (a configurable maximum of ~62MB).

J-D

On Fri, Dec 11, 2009 at 10:55 AM, Seth Ladd <se...@gmail.com> wrote:
>>> Which confuses me, if the write goes straight to a RegionServer, but
>>> then the RegionServer fails before the MemStore is flushed, did I just
>>> lose data?
>>
>> No, that's the goal of the write-ahead log (WAL).
>
> Here's the scenario I just tested on my EC2 cluster.  3 Zookeeper
> instances, 1 master, and 3 slaves.
>
> I created a table, and inserted a single row.
> I performed a read (get) to test the insert, and sure enough the row
> was returned.
> I then noted which slave held the table, and terminated the slave via
> the AWS management console.
> I then waited approx 30 seconds.
> I used the web interfaces (port 60030 and 60010) to note that the
> region was indeed moved to another slave.
> I performed a read on the same row, but did *not* find the row.
>
> So it looks like the region for the table was moved, but no data was moved over.
>
> Was that a valid test?  I would expect the row to get moved with the region.
>
> Thanks,
> Seth
>

RE: When does a row become highly available?

Posted by HBASE <hb...@patientcentral.com>.
Also, as I recall, one may set the replication factor both when
1. creating the table
2. inserting data

Anyone, please correct me if I'm wrong.

I set things up before, added a new server, and noticed that existing
tables didn't automatically update to the new dfs.replication factor
after I restarted everything. I ran the following command to correct it:

bin/hadoop dfs -setrep -R -w 3 /
(sets the replication factor to three for everything under / in HDFS)
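
For reference, the cluster-wide default lives in hdfs-site.xml and only
applies to files created after the change, which is why the -setrep pass
above was needed for the existing ones:

<property>
  <name>dfs.replication</name>
  <!-- Default replication factor for newly created HDFS files. -->
  <value>3</value>
</property>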

HTH
-Matt Davies

-----Original Message-----
From: HBASE [mailto:hbase@patientcentral.com] 
Sent: Friday, December 11, 2009 11:59 AM
To: hbase-user@hadoop.apache.org
Subject: RE: When does a row become highly available?

Seth,

Have you updated the default dfs.replication from 1 to some other value?

Best,
Matt Davies

-----Original Message-----
From: Seth Ladd [mailto:sethladd@gmail.com] 
Sent: Friday, December 11, 2009 11:55 AM
To: hbase-user@hadoop.apache.org
Subject: Re: When does a row become highly available?

>> Which confuses me, if the write goes straight to a RegionServer, but
>> then the RegionServer fails before the MemStore is flushed, did I just
>> lose data?
>
> No, that's the goal of the write-ahead log (WAL).

Here's the scenario I just tested on my EC2 cluster.  3 Zookeeper
instances, 1 master, and 3 slaves.

I created a table, and inserted a single row.
I performed a read (get) to test the insert, and sure enough the row
was returned.
I then noted which slave held the table, and terminated the slave via
the AWS management console.
I then waited approx 30 seconds.
I used the web interfaces (port 60030 and 60010) to note that the
region was indeed moved to another slave.
I performed a read on the same row, but did *not* find the row.

So it looks like the region for the table was moved, but no data was moved over.

Was that a valid test?  I would expect the row to get moved with the region.

Thanks,
Seth


RE: When does a row become highly available?

Posted by HBASE <hb...@patientcentral.com>.
Seth,

Have you updated the default dfs.replication from 1 to some other value?

Best,
Matt Davies

-----Original Message-----
From: Seth Ladd [mailto:sethladd@gmail.com] 
Sent: Friday, December 11, 2009 11:55 AM
To: hbase-user@hadoop.apache.org
Subject: Re: When does a row become highly available?

>> Which confuses me, if the write goes straight to a RegionServer, but
>> then the RegionServer fails before the MemStore is flushed, did I just
>> lose data?
>
> No, that's the goal of the write-ahead log (WAL).

Here's the scenario I just tested on my EC2 cluster.  3 Zookeeper
instances, 1 master, and 3 slaves.

I created a table, and inserted a single row.
I performed a read (get) to test the insert, and sure enough the row
was returned.
I then noted which slave held the table, and terminated the slave via
the AWS management console.
I then waited approx 30 seconds.
I used the web interfaces (port 60030 and 60010) to note that the
region was indeed moved to another slave.
I performed a read on the same row, but did *not* find the row.

So it looks like the region for the table was moved, but no data was moved over.

Was that a valid test?  I would expect the row to get moved with the region.

Thanks,
Seth


Re: When does a row become highly available?

Posted by Seth Ladd <se...@gmail.com>.
>> Which confuses me, if the write goes straight to a RegionServer, but
>> then the RegionServer fails before the MemStore is flushed, did I just
>> lose data?
>
> No, that's the goal of the write-ahead log (WAL).

Here's the scenario I just tested on my EC2 cluster.  3 Zookeeper
instances, 1 master, and 3 slaves.

I created a table, and inserted a single row.
I performed a read (get) to test the insert, and sure enough the row
was returned.
I then noted which slave held the table, and terminated the slave via
the AWS management console.
I then waited approx 30 seconds.
I used the web interfaces (port 60030 and 60010) to note that the
region was indeed moved to another slave.
I performed a read on the same row, but did *not* find the row.

So it looks like the region for the table was moved, but no data was moved over.

Was that a valid test?  I would expect the row to get moved with the region.
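
(For reference, the test above was essentially this hbase shell session --
the table, family, and row names here are made up:)

create 'testtable', 'cf'
put 'testtable', 'row1', 'cf:a', 'value1'
get 'testtable', 'row1'     # row comes back as expected
# ...terminate the hosting region server, wait ~30s for reassignment...
get 'testtable', 'row1'     # after the failover: the row is not found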

Thanks,
Seth

Re: When does a row become highly available?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Yes, you can do a bin/hadoop dfs -ls /hbase/.logs and see them all.
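
Or, if you prefer doing it from Java, here's a minimal sketch using the
plain HDFS client API (it assumes the default hbase.rootdir of /hbase and
a classpath Configuration that points at the cluster's filesystem):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListWals {
  public static void main(String[] args) throws Exception {
    // The WALs are ordinary HDFS files, one directory per region server.
    FileSystem fs = FileSystem.get(new Configuration());
    for (FileStatus status : fs.listStatus(new Path("/hbase/.logs"))) {
      System.out.println(status.getPath());
    }
  }
}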

J-D

On Fri, Dec 11, 2009 at 10:50 AM, Seth Ladd <se...@gmail.com> wrote:
>>> So do all writes go through the Master?  Clearly I'm a bit confused here :)
>>
>> No. The Region Server logs every write in the WAL. If the machine
>> fails, then whatever is in that WAL will be replayed by the Master
>> because he's the one noticing the failure. He will then redistribute
>
> Ah, is the WAL stored in HDFS as well?
>
> Thanks for your helpful and quick replies,
> Seth
>

Re: When does a row become highly available?

Posted by Seth Ladd <se...@gmail.com>.
>> So do all writes go through the Master?  Clearly I'm a bit confused here :)
>
> No. The Region Server logs every write in the WAL. If the machine
> fails, then whatever is in that WAL will be replayed by the Master
> because he's the one noticing the failure. He will then redistribute

Ah, is the WAL stored in HDFS as well?

Thanks for your helpful and quick replies,
Seth

Re: When does a row become highly available?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On Fri, Dec 11, 2009 at 10:35 AM, Seth Ladd <se...@gmail.com> wrote:
>> You are talking about durability, not HA.
>
> Good point, thanks.  I meant HA for the data, but data durability
> makes more sense.
>
>> To have a better understanding I recommend reading our architecture
>> page http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture and the
>> Bigtable paper.
>
> Thanks, I've been studying that today.
>
>> In short, when you write a row it goes into the write-ahead log and
>> then right after that into the MemStore. Once the MemStore is full
>> (64MB), or for some other reasons, it is flushed to disk, where the
>> file is replicated (transparently).
>
> Each RegionStore has its own WAL, yes?  From the Architecture page:

Each Region Server, I don't know what RegionStore is ;)

>
> When a write request is received, it is first written to a write-ahead
> log called a HLog. All write requests for every region the region
> server is serving are written to the same log. Once the request has
> been written to the HLog, it is stored in an in-memory cache called
> the Memcache. There is one Memcache for each HStore.
>
> Which confuses me, if the write goes straight to a RegionServer, but
> then the RegionServer fails before the MemStore is flushed, did I just
> lose data?

No, that's the goal of the write-ahead log (WAL).

>
>> If the node fails, the Master will process the WAL so that you don't
>
> So do all writes go through the Master?  Clearly I'm a bit confused here :)

No. The Region Server logs every write in the WAL. If the machine
fails, then whatever is in that WAL will be replayed by the Master
because he's the one noticing the failure. He will then redistribute
the parts of the WAL to the other region servers that get assigned
the regions from the dead node.

>
>> lose rows in the MemStore. Prior to Hadoop 0.21 (unreleased), the
>
> Moral of the story is to upgrade to 0.21 ASAP. :)

yes

>
> Thanks!
>
> Seth
>

Re: When does a row become highly available?

Posted by Seth Ladd <se...@gmail.com>.
> You are talking about durability, not HA.

Good point, thanks.  I meant HA for the data, but data durability
makes more sense.

> To have a better understanding I recommend reading our architecture
> page http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture and the
> Bigtable paper.

Thanks, I've been studying that today.

> In short, when you write a row it goes into the write-ahead log and
> then right after that into the MemStore. Once the MemStore is full
> (64MB), or for some other reasons, it is flushed to disk, where the
> file is replicated (transparently).

Each RegionStore has its own WAL, yes?  From the Architecture page:

When a write request is received, it is first written to a write-ahead
log called a HLog. All write requests for every region the region
server is serving are written to the same log. Once the request has
been written to the HLog, it is stored in an in-memory cache called
the Memcache. There is one Memcache for each HStore.

Which confuses me, if the write goes straight to a RegionServer, but
then the RegionServer fails before the MemStore is flushed, did I just
lose data?

> If the node fails, the Master will process the WAL so that you don't

So do all writes go through the Master?  Clearly I'm a bit confused here :)

> lose rows in the MemStore. Prior to Hadoop 0.21 (unreleased), the

Moral of the story is to upgrade to 0.21 ASAP. :)

Thanks!

Seth

Re: When does a row become highly available?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Seth,

You are talking about durability, not HA.

To have a better understanding I recommend reading our architecture
page http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture and the
Bigtable paper.

In short, when you write a row it goes into the write-ahead log and
then right after that into the MemStore. Once the MemStore is full
(64MB), or for some other reasons, it is flushed to disk, where the
file is replicated (transparently).
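
To make that concrete, here's a minimal sketch against the 0.20 client
API (the table, family, and column names are made up, and the table is
assumed to already exist):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePath {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "testtable");

    // The put is appended to the region server's WAL and then applied
    // to the in-memory MemStore; the RPC returns after both.
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
    table.put(put);

    // The row is readable right away, served out of the MemStore; the
    // flush to a replicated HDFS file happens later, transparently.
    Result result = table.get(new Get(Bytes.toBytes("row1")));
    System.out.println(Bytes.toString(
        result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("a"))));
  }
}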

If the node fails, the Master will process the WAL so that you don't
lose rows in the MemStore. Prior to Hadoop 0.21 (unreleased), the
append feature is badly crippled, so there's a chance that edits
written to the WAL may be lost because HDFS can't guarantee fs sync.

J-D

On Fri, Dec 11, 2009 at 10:20 AM, Seth Ladd <se...@gmail.com> wrote:
> Aloha,
>
> We're currently investigating HBase (0.20.2) and are really enjoying
> the experience.  We're now curious how much High Availability we
> should expect.  Specifically, after we insert a row into HBase, when
> does it become HA?  That is, is it immediately shared across multiple
> nodes in the cluster?  I don't quite understand the relationship
> between a Region and its backing file in HDFS.
>
> Thanks for any tips or background you can provide.
>
> Seth
>

Re: When does a row become highly available?

Posted by stack <st...@duboce.net>.
Short answer: It's available immediately. The row is locked for the
update; subsequent reads will find the update.

Long answer: Judging by your question, a quick read of the BigTable paper
is in order, I'd say.  $your_favorite_search_engine 'google bigtable'.

Ask more questions,
St.Ack

On Fri, Dec 11, 2009 at 10:20 AM, Seth Ladd <se...@gmail.com> wrote:

> Aloha,
>
> We're currently investigating HBase (0.20.2) and are really enjoying
> the experience.  We're now curious how much High Availability we
> should expect.  Specifically, after we insert a row into HBase, when
> does it become HA?  That is, is it immediately shared across multiple
> nodes in the cluster?  I don't quite understand the relationship
> between a Region and its backing file in HDFS.
>
> Thanks for any tips or background you can provide.
>
> Seth
>