Posted to user@hbase.apache.org by Jack Levin <ma...@gmail.com> on 2010/09/22 00:00:44 UTC

all regions unregistered over time.

First, I saw:


2010-09-21 11:30:05,122 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperationQueue: Put
ProcessServerShutdown of 10.103.2.5,60020,1285042335711 back on queue
2010-09-21 11:30:05,122 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperationQueue: Processing
todo: ProcessServerShutdown of 10.103.2.5,60020,1285042335711
2010-09-21 11:30:05,122 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Process shutdown
of server 10.103.2.5,60020,1285042335711: logSplit: false,
rootRescanned: false, n
umberOfMetaRegions: 1, onlineMetaRegions.size(): 0

repeated rapidly for 20 mins or so.

Then:

Bunch of regions got unassigned:


2010-09-21 12:00:07,782 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Unassigning 66 regions
from 10.103.2.3,60020,1285042333293
2010-09-21 12:00:07,782 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Going to close region
img816,img2103r.jpg,1285003791610.1592893332
2010-09-21 12:00:07,782 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Going to close region
img534,92166039.jpg,1284949117852.1009352950
2010-09-21 12:00:07,782 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Going to close region
img36,abcwu.jpg,1285001278990.272235177


Restarting the master did not help.  Ultimately, what brought the
cluster back up was a full shutdown of the regionservers and masters,
then bringing everything back up.

Any ideas what might have happened here?

We are running:

HBase Version	0.89.20100726, r979826
Hadoop Version	0.20.2+320, r9b72d268a0b590b4fd7d13aca17c1c453f8bc957
Regions On FS	5057

3 zookeepers and 13 regionservers.

-Jack

Re: all regions unregistered over time.

Posted by Stack <st...@duboce.net>.
Fair enough.
St.Ack

On Wed, Sep 22, 2010 at 11:09 AM, Jack Levin <ma...@gmail.com> wrote:
> Lzo of image data, which is already Jpeg?  Probably not a great idea, yes?
>
> -Jack

Re: all regions unregistered over time.

Posted by Jack Levin <ma...@gmail.com>.
Lzo of image data, which is already Jpeg?  Probably not a great idea, yes?

-Jack

On Wed, Sep 22, 2010 at 11:06 AM, Stack <st...@duboce.net> wrote:
> Are you lzo'ing Jack?  If not, you probably should.
> St.Ack

Re: all regions unregistered over time.

Posted by Stack <st...@duboce.net>.
Are you lzo'ing Jack?  If not, you probably should.
St.Ack

On Wed, Sep 22, 2010 at 3:17 AM, Jack Levin <ma...@gmail.com> wrote:
> So our cell sizes will be 350kb on average with 5-10 terabytes per server, I just want to keep the count of Regions under 1000, per server
>
> -Jack

Re: all regions unregistered over time.

Posted by Jack Levin <ma...@gmail.com>.
So our cell sizes will be 350kb on average, with 5-10 terabytes per server; I just want to keep the count of regions under 1000 per server.

-Jack



Re: all regions unregistered over time.

Posted by Ryan Rawson <ry...@gmail.com>.
Region size is one of those tricky things, there are a few factors to consider:

- regions are the basic element of availability and distribution.
- HBase scales by having regions across many servers.  Thus if you
have 2 regions for 16GB of data on a 20 node cluster, you are a net
loss there.
- High region count has been known to make things slow; this is
getting better, but it is probably better to have 700 regions than
3000 for the same amount of data.
- Low region count prevents parallel scalability, as per the second
point.  This really can't be stressed enough, since a common problem
is loading 200MB of data into HBase and then wondering why your
awesome 10 node cluster is mostly idle.
- There is not much memory footprint difference between 1 region and
10 in terms of indexes, etc., held by the regionserver.

Generally speaking I stick to the default, go smaller for hot tables,
or manually split them, and go with a 1GB region size on our largest
900 GB table.

-ryan


Re: all regions unregistered over time.

Posted by Jack Levin <ma...@gmail.com>.
Yes, I am thinking of putting 10 to 15 million files on each regionserver
(well, not literally, but controlled by the regionserver).  So that's
close to 4 TB worth of regions, which is about 4GB per region should
we target 1000 regions per server.  Note, not all files are 'hot': I
only expect to keep about 1% super hot and 5% relatively hot; the
rest are cold.  So in terms of keeping hbase blocks in RAM, that
should be adequate, and for the rest we can afford a trip into hdfs.

If servers are running 8 GB of RAM and are shared between regionservers
and datanodes, how much heap should I allocate to each?  6GB for the RS
and 1GB for the DN?

Also, on the question of whether an 8 core x 16GB RAM Master helps
bring up the cluster faster, the answer is definitely yes.  It took
only 90 seconds to load 5000 regions across 13 servers, whereas the
same task on a dual-core, 8GB RAM machine took nearly 10 minutes.

-Jack
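For anyone re-deriving Jack's numbers, the sizing arithmetic above is simple enough to sanity-check mechanically. A back-of-the-envelope sketch (plain Java, illustrative only, not an HBase API):

```java
// Back-of-the-envelope region sizing, as discussed in the thread:
// data-per-server and a target region count imply a region size, and
// vice versa. Names here are illustrative helpers, not HBase classes.
public class RegionSizing {
    // Region size needed to hold dataBytes in targetRegions regions.
    static long regionSizeFor(long dataBytes, long targetRegions) {
        return dataBytes / targetRegions;
    }

    // Number of regions a server ends up with at a given max region size.
    static long regionCountFor(long dataBytes, long maxRegionBytes) {
        return (dataBytes + maxRegionBytes - 1) / maxRegionBytes; // ceiling
    }

    public static void main(String[] args) {
        long fourTb = 4L << 40;
        long fourGb = 4L << 30;
        // ~4 TB per server at 4GB regions lands near Jack's target of 1000.
        System.out.println(regionCountFor(fourTb, fourGb)); // 1024
    }
}
```

At exactly 4 TB and a 4GB region max this works out to 1024 regions, just over the sub-1000 target, which is why the thread leans toward slightly larger regions.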




Re: all regions unregistered over time.

Posted by Stack <st...@duboce.net>.
On Tue, Sep 21, 2010 at 11:11 PM, Jack Levin <ma...@gmail.com> wrote:
> It's definitely binary, and I can even load it in my browser by
> setting appropriate headers.  So I guess for PUT and GET via Accept:
> application/octet-stream there is no base64 encoding at all.
>

OK.  Good.  If it were base64'd, you'd see it.

> Btw, out of curiosity I have the region max file size set to 1GB now, but
> what if I set it to say 10G or 50G?  Is there significant overhead in
> address seeking via HDFS?
>

You could do that.  We don't have much experience running regions of
that size.  You should for sure pre-split your table on creation if
you go this route (see the HBaseAdmin API [1]; this method is not
available in the shell, so you'd have to script it or write a little
java to do it).

St.Ack

1. http://hbase.apache.org/docs/r0.89.20100726/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
byte[][])
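The createTable(HTableDescriptor, byte[][]) overload linked in [1] takes an array of region start keys. As a sketch, here is one way to compute evenly spaced single-byte split keys; the uniform-key-distribution assumption and the helper name are illustrative, and the actual createTable call (shown only in a comment) needs a running cluster plus the HBase client on the classpath:

```java
// Sketch: generate numRegions-1 evenly spaced split keys over the
// single-byte keyspace 0x00-0xFF, suitable as the byte[][] argument to
// HBaseAdmin.createTable(HTableDescriptor, byte[][]). This assumes row
// keys are roughly uniform over their first byte; real tables should
// derive splits from their actual key distribution.
public class PreSplit {
    static byte[][] uniformSplits(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        // 4 regions -> 3 boundaries: 0x40, 0x80, 0xC0
        for (byte[] k : uniformSplits(4)) System.out.printf("%02x%n", k[0]);
        // Against a live cluster you would then do something like:
        //   new HBaseAdmin(conf).createTable(tableDescriptor, uniformSplits(100));
    }
}
```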

Re: all regions unregistered over time.

Posted by Jack Levin <ma...@gmail.com>.
It's definitely binary, and I can even load it in my browser by
setting appropriate headers.  So I guess for PUT and GET via Accept:
application/octet-stream there is no base64 encoding at all.

Btw, out of curiosity I have the region max file size set to 1GB now, but
what if I set it to say 10G or 50G?  Is there significant overhead in
address seeking via HDFS?

-Jack

On Tue, Sep 21, 2010 at 10:47 PM, Stack <st...@duboce.net> wrote:
> I don't know.  Can you dump the curl output to a file or STDOUT and
> take a look at it?
> St.Ack

Re: all regions unregistered over time.

Posted by Stack <st...@duboce.net>.
I don't know.  Can you dump the curl output to a file or STDOUT and
take a look at it?
St.Ack

On Tue, Sep 21, 2010 at 10:36 PM, Jack Levin <ma...@gmail.com> wrote:
> Thanks, the patched version is now running (actually SU's version was
> already patched, ty).  Another question is with regard to REST doing
> base64: when you send the header "Accept: application/octet-stream", I
> get just a byte stream, i.e. no base64.  Does this mean internally the
> cell is sent out byte by byte without any conversion to base64, and
> hence no overhead?
>
> -Jack
>

Re: all regions unregistered over time.

Posted by Jack Levin <ma...@gmail.com>.
Thanks, the patched version is now running (actually SU's version was
already patched, ty).  Another question is with regard to REST doing
base64: when you send the header "Accept: application/octet-stream", I
get just a byte stream, i.e. no base64.  Does this mean internally the
cell is sent out byte by byte without any conversion to base64, and
hence no overhead?

-Jack
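For a sense of the overhead being avoided here: base64 encodes every 3 input bytes as 4 output characters, so an encoded cell is roughly a third larger on the wire. A quick check using the JDK's own encoder (java.util.Base64, used purely to illustrate the ratio; it is a later JDK class, not part of the 0.89-era REST gateway):

```java
import java.util.Base64;

// Illustrates the ~4/3 size inflation of base64, i.e. what requesting
// Accept: application/octet-stream avoids for a binary cell.
public class B64Overhead {
    // Length of the base64 (padded) encoding of rawBytes zero bytes;
    // only the length matters here, not the content.
    static int encodedLength(int rawBytes) {
        return Base64.getEncoder().encode(new byte[rawBytes]).length;
    }

    public static void main(String[] args) {
        int raw = 350 * 1024; // a ~350kb image cell, per the thread
        int enc = encodedLength(raw);
        System.out.printf("raw=%d encoded=%d overhead=%.1f%%%n",
            raw, enc, 100.0 * (enc - raw) / raw);
    }
}
```

So for the 350kb average cells discussed above, base64 would add roughly a third in transfer size, before even counting the CPU spent encoding and decoding.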

On Tue, Sep 21, 2010 at 10:09 PM, Stack <st...@duboce.net> wrote:
> Given the log snippet, I'd guess it's because your hbase doesn't have HBASE-2643.
>
> The above makes it so we continue through an EOF exception when
> splitting logs where before we'd fail the splitting, requeue, split,
> then fail again.
>
> Here is a comment recently added to our little hbase book at src/docbkx/book.xml:
>
>      <section>
>        <title>How EOFExceptions are treated when splitting a crashed
>        RegionServers' WALs</title>
>
>        <para>If we get an EOF while splitting logs, we proceed with the split
>        even when <varname>hbase.hlog.split.skip.errors</varname> ==
>        <constant>false</constant>. An EOF while reading the last log in the
>        set of files to split is near-guaranteed since the RegionServer likely
>        crashed mid-write of a record. But we'll continue even if we got an
>        EOF reading other than the last file in the set.<footnote>
>            <para>For background, see <link
>            xlink:href="https://issues.apache.org/jira/browse/HBASE-2643">HBASE-2643
>            Figure how to deal with eof splitting logs</link></para>
>          </footnote></para>
>      </section>
>
> St.Ack

Re: all regions unregistered over time.

Posted by Stack <st...@duboce.net>.
Given the log snippet, I'd guess it's because your hbase doesn't have HBASE-2643.

The patch makes it so we continue through an EOF exception when
splitting logs, whereas before we'd fail the split, requeue it, split
again, and fail again.

Here is a comment recently added to our little hbase book at src/docbkx/book.xml:

      <section>
        <title>How EOFExceptions are treated when splitting a crashed
        RegionServers' WALs</title>

        <para>If we get an EOF while splitting logs, we proceed with the split
        even when <varname>hbase.hlog.split.skip.errors</varname> ==
        <constant>false</constant>. An EOF while reading the last log in the
        set of files to split is near-guaranteed since the RegionServer likely
        crashed mid-write of a record. But we'll continue even if we got an
        EOF reading other than the last file in the set.<footnote>
            <para>For background, see <link
            xlink:href="https://issues.apache.org/jira/browse/HBASE-2643">HBASE-2643
            Figure how to deal with eof splitting logs</link></para>
          </footnote></para>
      </section>
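
The policy described above reads roughly as follows (a hedged Python
illustration of the behavior, not the actual HBase code; the function
names and WAL representation here are made up):

```python
def read_wal(records, truncate_at=None):
    """Simulate reading a WAL; a truncated tail raises EOFError mid-read."""
    for i, rec in enumerate(records):
        if truncate_at is not None and i >= truncate_at:
            raise EOFError("truncated record (server died mid-write)")
        yield rec

def split_logs(wals):
    """Recover edits from a dead server's WALs, tolerating EOFs
    (the HBASE-2643 behavior: keep going instead of failing/requeueing)."""
    recovered = []
    for wal in wals:
        try:
            for edit in wal:
                recovered.append(edit)
        except EOFError:
            continue  # keep what we read so far, move on to the next log
    return recovered

# Two complete logs plus a last one truncated mid-record:
wals = [read_wal(["e1", "e2"]), read_wal(["e3"]),
        read_wal(["e4", "e5"], truncate_at=1)]
print(split_logs(wals))  # ['e1', 'e2', 'e3', 'e4']
```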

St.Ack

On Tue, Sep 21, 2010 at 3:00 PM, Jack Levin <ma...@gmail.com> wrote:
> First, I saw:
>
>
> 2010-09-21 11:30:05,122 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Put
> ProcessServerShutdown of 10.103.2.5,60020,1285042335711 back on queue
> 2010-09-21 11:30:05,122 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Processing
> todo: ProcessServerShutdown of 10.103.2.5,60020,1285042335711
> 2010-09-21 11:30:05,122 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Process shutdown
> of server 10.103.2.5,60020,1285042335711: logSplit: false,
> rootRescanned: false, n
> umberOfMetaRegions: 1, onlineMetaRegions.size(): 0
>
> repeated rapidly for 20 mins or so.
>
> Then:
>
> Bunch of regions got unassigned:
>
>
> 2010-09-21 12:00:07,782 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Unassigning 66 regions
> from 10.103.2.3,60020,1285042333293
> 2010-09-21 12:00:07,782 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Going to close region
> img816,img2103r.jpg,1285003791610.1592893332
> 2010-09-21 12:00:07,782 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Going to close region
> img534,92166039.jpg,1284949117852.1009352950
> 2010-09-21 12:00:07,782 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Going to close region
> img36,abcwu.jpg,1285001278990.272235177
>
>
> Restarting master did not help.  Ultimately what brought the cluster
> back up, is full shutdown of regionservers, and masters, and then
> bring all up.
>
> Any ideas what might have happened here?
>
> We are running:
>
> HBase Version   0.89.20100726, r979826
> Hadoop Version  0.20.2+320, r9b72d268a0b590b4fd7d13aca17c1c453f8bc957
> Regions On FS   5057
>
> 3 zookeepers and 13 regionservers.
>
> -Jack
>