Posted to user@hbase.apache.org by Jun Li <jl...@gmail.com> on 2009/03/27 09:58:25 UTC

try to run PerformanceEvaluation and encounter RetriesExhaustedException

Hi, I have just set up a small Linux cluster with 3 physical machines
(dual core, running RedHat Enterprise 5) to run Hadoop and HBase. No
virtual machines are involved at this time in the test bed. The versions of
HBase and Hadoop that I chose are hbase-0.19.0 and hadoop-0.19.0.

To better understand the performance, I ran the PerformanceEvaluation tool
that ships in the test directory of the release.  When I used the
client number N=1, all the tests (sequentialWrite, sequentialRead,
randomWrite, randomRead, scan) worked fine with the default row number,
which is R = 1024*1024, with each row containing 1 KB of data.

However, when I choose N=4, with the row number set to 102400 (even
smaller than the 1024*1024 described above), and run the following command:

 bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation  --rows=10240
sequentialWrite 10

the Map/Reduce job fails. When I check the logs, I see the following error message:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server 15.25.119.59:60020 for region
TestTable,,1238136022072, row '0000010239', but failed after 10
attempts.
Exceptions:
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space

	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:841)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:932)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
	at org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:370)
	at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:393)
	at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:583)
	at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:182)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child.main(Child.java:155)


All my HBase runtime configuration parameters, such as the JVM heap, are left
at their default settings.  Could you provide help on addressing this issue?

Also, it seems that on the Wiki page,
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation, the focus is on
the case of N=1. Could you share the highest number of N (concurrent
clients) that has been tested on HBase, with a comparable number of rows
(in the range of 1024*1024 or above)?

Thank you and Best Regards.

Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException

Posted by Michael Greene <mi...@gmail.com>.
That link wasn't working for me. There's a lot on this list, but that
means there's more reason to want 0.20 when it's available.
http://tinyurl.com/hbase20issues

Michael

Erik Holstad <er...@gmail.com> wrote:
> Hadoop, but I would guess 4-6 weeks or so. Hopefully we will have a
> functioning trunk before too long, with most of the 0.20 features
> working, but there are still some open issues for 0.20; have a look at
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12313132

Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException

Posted by Erik Holstad <er...@gmail.com>.
Hi Stuart!
We are still waiting for Hadoop 0.20 to be released, and after we have that
we will need maybe 2 weeks or so to finalize our release and get most of
the bugs fixed. I am not really sure how they are doing over at
Hadoop, but I would guess 4-6 weeks or so. Hopefully we will have a
functioning trunk before too long, with most of the 0.20 features
working, but there are still some open issues for 0.20; have a look at
https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12313132

Regards Erik

Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException

Posted by Stuart White <st...@gmail.com>.
On Fri, Apr 3, 2009 at 10:10 AM, stack <st...@duboce.net> wrote:

> HBase 0.20.0 will be very different from 0.19.0 in character.  It should be
> more 'live'-able -- there are fewer synchronizations -- and it should be more
> performant.

Is there a timeframe for when 0.20.0 will be released?

Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException

Posted by stack <st...@duboce.net>.
Thanks for the detailed description of your experiences.

On Thu, Apr 2, 2009 at 10:03 AM, Jun Li <jl...@gmail.com> wrote:

> ...
>
> (1)   I first changed HBASE_HEAPSIZE defined in hbase-env.sh from 1 GB to 2
> GB, and ran: bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation
> sequentialWrite 4.  It fails at the map phase of M/R due to the
> RetriesExhaustedException, same as what I reported before.
>


Previously you ran into an OOME.  Now you see RetriesExhaustedException?  Does
the exception show as an error or just in DEBUG-level logging?  Probably
the former.  You might try upping the regionserver lease from 60 seconds to
120 or 180 seconds.
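
For example, something along these lines in hbase-site.xml -- I am going
from memory on the property name, so double-check it against the
hbase-default.xml that ships with your version before relying on it:

  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>120000</value>
    <description>Lease period in milliseconds; the stock default is
    60000 (60 seconds).
    </description>
  </property>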



> So based on my experiments, it seems that changing the heap size from 1
> GB to 2 GB and modifying io.map.index.skip to 128 does not observably help
> to resolve the “RetriesExhaustedException”. Lowering the number of rows
> (from 1 M rows to 10240 rows) makes the exception disappear, which implies
> that the number of concurrent clients, and thus the number of connections
> to the servers, is not the root cause of the problem that I have. The root
> cause is more likely related to the number of rows being written, in this
> particular example. And thus, I did not try to change the setting of
> “mapred.map.tasks”, which already has the default setting of 2.
>
>


You might check the regionserver that exhausted the retries.  Check its
logs.  Look particularly at compactions -- are they keeping up with the
upload, or is the number of files being compacted continually rising?  Maybe
the host was struggling with compaction load.  Are region splits delayed?   For
example, after your upload, did the number of regions rise substantially?
If you run the shell, can you get the row that we exhausted retries on?
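
If the shell gives you trouble, a small client program will do the same
check; below is a rough sketch against the 0.19 client API (HTable.getRow
returning a RowResult) -- adjust the table name and row key if yours differ:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.RowResult;

public class CheckRow {
  public static void main(String[] args) throws Exception {
    // Fetch the row the client gave up on, per the exception above.
    HTable table = new HTable(new HBaseConfiguration(), "TestTable");
    RowResult row = table.getRow("0000010239");
    if (row == null || row.isEmpty()) {
      System.out.println("row not found");
    } else {
      System.out.println("row has " + row.size() + " cells");
    }
  }
}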


> My objective is to see how well HBase can support concurrent clients,
> with a modest row count at this time, say 1 M rows. Could you provide other
> suggestions, or is a fix for the related problem on the road map for a
> future release? If you would like to see detailed logs to infer the root
> cause, I would be happy to provide them.
>


RetriesExhaustedException does seem to be rearing its ugly head of late during
bulk uploads in 0.19.1.  Something's up, some kind of temporary lock-up I'm
guessing, since we retry ten times IIRC with a backoff between each try.
One of us needs to dig in and figure out what's going on.  Sorry for the
trouble it's given you.  As I said in earlier mail, I seemed to have
better luck than you (at the end of this page, I've started to record
numbers for 8 concurrent clients doing reads, writes and scans:
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation).

As to concurrent clients, we gate how many can come in by the number of RPC
listeners we put up.  You might experiment with upping (or lowering) that number.
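
If memory serves, the property for that is hbase.regionserver.handler.count
in hbase-site.xml (default 10) -- verify the name against hbase-default.xml
for your release:

  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>20</value>
    <description>Count of RPC handler instances spun up on
    regionservers; more handlers means more client requests can be
    served concurrently.
    </description>
  </property>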

HBase 0.20.0 will be very different from 0.19.0 in character.  It should be
more 'live'-able -- there are fewer synchronizations -- and it should be more
performant.

St.Ack




>
> On Tue, Mar 31, 2009 at 3:35 PM, stack <st...@duboce.net> wrote:
>
> > On Wed, Apr 1, 2009 at 12:29 AM, stack <st...@duboce.net> wrote:
> >
> > > On Tue, Mar 31, 2009 at 8:08 AM, Jun Li <jl...@gmail.com> wrote:
> > >
> > > I was using defaults.  Maybe my hardware is better than yours.  Tell us
> > > about yours (RAM)?  I suggested io.map.index.skip because you were
> > OOME'ing
> > > and that's the thing that most directly affects memory use.
> > >
> >
> > You could try upping your regionserver heap if you have enough RAM.  Try
> > setting $HBASE_HOME/conf/hbase-env.sh HBASE_HEAPSIZE to 2G.
> > St.Ack
> >
>

Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException

Posted by Jun Li <jl...@gmail.com>.
Hi Stack,



Each of the 3 Linux machines that I am using for the experiments has 8 GB of
RAM (1 for the HMaster, 2 for Region Servers). At steady state, with Hadoop
and HBase running on these physical machines (no VMs involved), each machine
still has about 4 GB of free memory, according to what is shown in
/proc/meminfo.



I followed what you suggested in your previous two email responses and
changed the configurations of Hadoop and HBase, but the
“RetriesExhaustedException” still does not go away.



The following are my detailed configuration changes and runs of
“org.apache.hadoop.hbase.PerformanceEvaluation”, in different steps.



(1)   I first changed HBASE_HEAPSIZE defined in hbase-env.sh from 1 GB to 2
GB, and ran: bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation
sequentialWrite 4.  It fails at the map phase of M/R due to the
RetriesExhaustedException, same as what I reported before.



I reduced the concurrent clients from 4 to 2; the problem still exists.



I changed the default of 1024*1024 rows for the “TestTable” by setting
“--rows=102400”, under N = 4 or 2. The same failure happens.



(2)   I further changed “io.map.index.skip” defined in hadoop-default.xml
to 8, and then to 128, kept the heap size of 2 GB, and ran the experiment
described in (1), repeating the settings of concurrent clients N (4 or 2)
and the row number (either 1024*1024 or 102400). All runs failed due to
the same error.

(3)   I kept io.map.index.skip at 128 and the heap size at 2 GB, and then set
the row number to “--rows=10240”; I can then raise the concurrent clients to 4,
then 8, then 20. The PerformanceEvaluation program runs successfully.



So based on my experiments, it seems that changing the heap size from 1
GB to 2 GB and modifying io.map.index.skip to 128 does not observably help
to resolve the “RetriesExhaustedException”. Lowering the number of rows
(from 1 M rows to 10240 rows) makes the exception disappear, which implies
that the number of concurrent clients, and thus the number of connections
to the servers, is not the root cause of the problem that I have. The root
cause is more likely related to the number of rows being written, in this
particular example. And thus, I did not try to change the setting of
“mapred.map.tasks”, which already has the default setting of 2.



My objective is to see how well HBase can support concurrent clients,
with a modest row count at this time, say 1 M rows. Could you provide other
suggestions, or is a fix for the related problem on the road map for a
future release? If you would like to see detailed logs to infer the root
cause, I would be happy to provide them.



Regards,



Jun


On Tue, Mar 31, 2009 at 3:35 PM, stack <st...@duboce.net> wrote:

> On Wed, Apr 1, 2009 at 12:29 AM, stack <st...@duboce.net> wrote:
>
> > On Tue, Mar 31, 2009 at 8:08 AM, Jun Li <jl...@gmail.com> wrote:
> >
> > I was using defaults.  Maybe my hardware is better than yours.  Tell us
> > about yours (RAM)?  I suggested io.map.index.skip because you were
> OOME'ing
> > and thats the thing that most directly effects memory use.
> >
>
> You could try upping your regionserver heap if you have enough RAM.  Try
> setting $HBASE_HOME/conf/hbase-env.sh HBASE_HEAPSIZE to 2G.
> St.Ack
>

Re: Novice Hbase user needs more help

Posted by Erik Holstad <er...@gmail.com>.
Hi Ron!
Not sure if you know already, but when reformatting HDFS it is usually a
good idea to clean up old tmp
files too, otherwise you might bump into some other problems. They are
usually located in /tmp if you
haven't changed that.
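
Assuming you have left the Hadoop and HBase tmp directories at their
defaults, something like the following should do it (double-check the
paths on your box first, and stop Hadoop/HBase before removing anything):

 rm -rf /tmp/hadoop-${USER} /tmp/hbase-${USER}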

Regards Erik

RE: Novice Hbase user - Hbase restart problem solved

Posted by "Taylor, Ronald C" <ro...@pnl.gov>.
 
Hi St. Ack, Erik,

Thanks very much for the help. I now have Hbase back up and running. I
actually completely deleted the HDFS directory, and reformatted from
scratch. I also deleted everything pertaining to Hadoop and Hbase in the
/tmp directory before doing a new invocation, as Erik suggested. 

However, my best guess as to what happened is that I started Hbase from
a very old shell, opened before I added the environment var HBASE_HOME
to the .mycshrc file. I checked, and the newer version of the .mycshrc
file never got sourced in that shell. Probably that was the big
"whoops". Anyway, thanks again for the tips - they will certainly be
useful to note for future use.

Now I'll get back to work and see if I can get around my bulk import
problem, using Ryan's doCommit() method.

 Ron

___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN K7-90
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.taylor@pnl.gov
www.pnl.gov

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
stack
Sent: Thursday, April 02, 2009 12:19 AM
To: hbase-user@hadoop.apache.org
Subject: Re: Novice Hbase user needs more help

It doesn't look like your format actually reformatted because it seems
to have left around the hbase directory in hdfs -- but probably minus
its content (The rootdir is there but not the hbase.version file that
hbase writes on bootstrap).  Try removing it: e.g. ./bin/hadoop fs -rmr
$HBASE_HOMEDIR.  Then try restarting hbase.

St.Ack


On Thu, Apr 2, 2009 at 3:45 AM, Taylor, Ronald C
<ro...@pnl.gov>wrote:

>
> Hello Erik,
>
> Thanks for the info. Unfortunately, at the moment I seem to have 
> regressed to the point where I can't even bring up Hbase. That is, I 
> decided to clean things out before doing more work, and so I tried to 
> do a restart: Hbase was already down due to the malfunctioning of my 
> Java program, but I made sure by issuing a stop-hbase.sh command. I 
> then issued a stop-dfs.sh command to Hadoop, which worked OK. I then 
> did a format using
>
>   bin/hadoop namenode -format
>
>  and restarted Hadoop. That also appeared to work OK, according to the
> Hadoop log files.
>
> But when I try to restart Hbase (which, as I said, had aborted due to 
> the previously mentioned problem in my Java upload program), I get the
> error msgs below in the Hbase log file. There is a file named
>
>  /hbase/hbase.version
>
> that the error msgs talk about. I cannot remember seeing or setting up
> any such file when I first installed Hbase, and I cannot find it in
> the Hbase subdirectories now. All I had to do when I installed Hbase
> was make very minor mods to two of the files in the .../conf subdir,
> so I am at a loss as to why this hbase.version file is now required 
> and why a block for it cannot be found.
>
> And I cannot find anything in the docs on this "hbase.version" file, 
> or anything else that might be helpful in this context of the error 
> msgs on "No live nodes contain current block". I would deeply 
> appreciate any help at this point, just to get Hbase back up and
running.
>
>  Ron
>
> %%%%%%%%%%%%%%%%%%
>
> Error msgs from the Hbase log file when I tried to do a startup:
>
> Wed Apr  1 18:10:55 PDT 2009 Starting master on sidney ulimit -n 1024
> 2009-04-01 18:10:55,856 INFO org.apache.hadoop.hbase.master.HMaster:
> vmName=Java HotSpot(TM) Server VM, vmVendor=Sun Microsystems Inc.,
> vmVersion=11.0-b16
>
> 2009-04-01 18:10:55,857 INFO org.apache.hadoop.hbase.master.HMaster:
> vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError, 
> -Dhbase.log.dir=/sid/Hbase/hbase-0.19.0/bin/../logs,
> -Dhbase.log.file=hbase-hadoop-master-sidney.log,
> -Dhbase.home.dir=/sid/Hbase/hbase-0.19.0/bin/.., 
> -Dhbase.id.str=hadoop, -Dhbase.root.logger=INFO,DRFA,
> -Djava.library.path=/sid/Hbase/hbase-0.19.0/bin/../lib/native/Linux-i386
> -32]
>
> 2009-04-01 18:10:56,187 INFO org.apache.hadoop.hbase.master.HMaster:
> Root region dir: hdfs://localhost:5302/hbase/-ROOT-/70236052
> 2009-04-01 18:10:56,201 INFO org.apache.hadoop.hdfs.DFSClient: Could 
> not obtain block blk_6227212323375236304_1002 from any node:
> java.io.IOException: No live nodes contain current block
> 2009-04-01 18:10:59,205 INFO org.apache.hadoop.hdfs.DFSClient: Could 
> not obtain block blk_6227212323375236304_1002 from any node:
> java.io.IOException: No live nodes contain current block
> 2009-04-01 18:11:02,208 INFO org.apache.hadoop.hdfs.DFSClient: Could 
> not obtain block blk_6227212323375236304_1002 from any node:
> java.io.IOException: No live nodes contain current block
>
> 2009-04-01 18:11:05,212 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
> java.io.IOException: Could not obtain block:
> blk_6227212323375236304_1002 file=/hbase/hbase.version
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>        at java.io.DataInputStream.readUnsignedShort(Unknown Source)
>        at java.io.DataInputStream.readUTF(Unknown Source)
>        at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:101)
>        at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:120)
>        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:211)
>        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:155)
>        at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:96)
>        at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:78)
>        at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:966)
>        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1010)
>
> ___________________________________________
> Ronald Taylor, Ph.D.
> Computational Biology & Bioinformatics Group Pacific Northwest 
> National Laboratory
> 902 Battelle Boulevard
> P.O. Box 999, MSIN K7-90
> Richland, WA  99352 USA
> Office:  509-372-6568
> Email: ronald.taylor@pnl.gov
> www.pnl.gov
>
> -----Original Message-----
> From: Erik Holstad [mailto:erikholstad@gmail.com]
> Sent: Wednesday, April 01, 2009 3:47 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Novice Hbase user needs help with data upload - gets a 
> RetriesExhaustedException, followed by NoServerForRegionException
>
> Hi Ron!
> you can try to look at:
>
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#5 and 6
>
> http://hadoop.apache.org/hbase/docs/r0.19.0/api/overview-summary.html#overview_description
>
> Some similar problems can be found in:
>
> http://www.nabble.com/RetriesExhaustedException--for-TableReduce-td22569113.html
> http://www.nabble.com/RetriesExhaustedException!-td22408156.html
>
> Hope that it can be of help
> Regards Erik
>

Re: Novice Hbase user needs more help

Posted by stack <st...@duboce.net>.
It doesn't look like your format actually reformatted because it seems to
have left around the hbase directory in hdfs -- but probably minus its
content (The rootdir is there but not the hbase.version file that hbase
writes on bootstrap).  Try removing it: e.g. ./bin/hadoop fs -rmr
$HBASE_HOMEDIR.  Then try restarting hbase.

St.Ack


On Thu, Apr 2, 2009 at 3:45 AM, Taylor, Ronald C <ro...@pnl.gov>wrote:

>
> Hello Erik,
>
> Thanks for the info. Unfortunately, at the moment I seem to have
> regressed to the point where I can't even bring up Hbase. That is, I
> decided to clean things out before doing more work, and so I tried to do
> a restart: Hbase was already down due to the malfunctioning of my Java
> program, but I made sure by issuing a stop-hbase.sh command. I then
> issued a stop-dfs.sh command to Hadoop, which worked OK. I then did a
> format using
>
>   bin/hadoop namenode -format
>
>  and restarted Hadoop. That also appeared to work OK, according to the
> Hadoop log files.
>
> But when I try to restart Hbase (which, as I said, had aborted due to
> the previously mentioned problem in my Java upload program), I get the
> error msgs below in the Hbase log file. There is a file named
>
>  /hbase/hbase.version
>
> that the error msgs talk about. I cannot remember seeing or setting up
> any such file when I first installed Hbase, and I cannot find it in the
> Hbase subdirectories now. All I had to do when I installed Hbase was
> make very minor mods to two of the files in the .../conf subdir, so I am
> at a loss as to why this hbase.version file is now required and why a
> block for it cannot be found.
>
> And I cannot find anything in the docs on this "hbase.version" file, or
> anything else that might be helpful in this context of the error msgs on
> "No live nodes contain current block". I would deeply appreciate any
> help at this point, just to get Hbase back up and running.
>
>  Ron
>
> %%%%%%%%%%%%%%%%%%
>
> Error msgs from the Hbase log file when I tried to do a startup:
>
> Wed Apr  1 18:10:55 PDT 2009 Starting master on sidney ulimit -n 1024
> 2009-04-01 18:10:55,856 INFO org.apache.hadoop.hbase.master.HMaster:
> vmName=Java HotSpot(TM) Server VM, vmVendor=Sun Microsystems Inc.,
> vmVersion=11.0-b16
>
> 2009-04-01 18:10:55,857 INFO org.apache.hadoop.hbase.master.HMaster:
> vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError,
> -Dhbase.log.dir=/sid/Hbase/hbase-0.19.0/bin/../logs,
> -Dhbase.log.file=hbase-hadoop-master-sidney.log,
> -Dhbase.home.dir=/sid/Hbase/hbase-0.19.0/bin/.., -Dhbase.id.str=hadoop,
> -Dhbase.root.logger=INFO,DRFA,
> -Djava.library.path=/sid/Hbase/hbase-0.19.0/bin/../lib/native/Linux-i386
> -32]
>
> 2009-04-01 18:10:56,187 INFO org.apache.hadoop.hbase.master.HMaster:
> Root region dir: hdfs://localhost:5302/hbase/-ROOT-/70236052
> 2009-04-01 18:10:56,201 INFO org.apache.hadoop.hdfs.DFSClient: Could not
> obtain block blk_6227212323375236304_1002 from any node:
> java.io.IOException: No live nodes contain current block
> 2009-04-01 18:10:59,205 INFO org.apache.hadoop.hdfs.DFSClient: Could not
> obtain block blk_6227212323375236304_1002 from any node:
> java.io.IOException: No live nodes contain current block
> 2009-04-01 18:11:02,208 INFO org.apache.hadoop.hdfs.DFSClient: Could not
> obtain block blk_6227212323375236304_1002 from any node:
> java.io.IOException: No live nodes contain current block
>
> 2009-04-01 18:11:05,212 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
> java.io.IOException: Could not obtain block:
> blk_6227212323375236304_1002 file=/hbase/hbase.version
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>        at java.io.DataInputStream.readUnsignedShort(Unknown Source)
>        at java.io.DataInputStream.readUTF(Unknown Source)
>        at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:101)
>        at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:120)
>        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:211)
>        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:155)
>        at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:96)
>        at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:78)
>        at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:966)
>        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1010)
>
> ___________________________________________
> Ronald Taylor, Ph.D.
> Computational Biology & Bioinformatics Group
> Pacific Northwest National Laboratory
> 902 Battelle Boulevard
> P.O. Box 999, MSIN K7-90
> Richland, WA  99352 USA
> Office:  509-372-6568
> Email: ronald.taylor@pnl.gov
> www.pnl.gov
>
> -----Original Message-----
> From: Erik Holstad [mailto:erikholstad@gmail.com]
> Sent: Wednesday, April 01, 2009 3:47 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Novice Hbase user needs help with data upload - gets a
> RetriesExhaustedException, followed by NoServerForRegionException
>
> Hi Ron!
> you can try to look at:
>
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#5 and 6
>
> http://hadoop.apache.org/hbase/docs/r0.19.0/api/overview-summary.html#overview_description
>
> Some similar problems can be found in:
>
> http://www.nabble.com/RetriesExhaustedException--for-TableReduce-td22569113.html
> http://www.nabble.com/RetriesExhaustedException!-td22408156.html
>
> Hope that it can be of help
> Regards Erik
>

RE: Novice Hbase user needs more help

Posted by "Taylor, Ronald C" <ro...@pnl.gov>.
 
Hello Erik,

Thanks for the info. Unfortunately, at the moment I seem to have
regressed to the point where I can't even bring up Hbase. That is, I
decided to clean things out before doing more work, and so I tried to do
a restart: Hbase was already down due to the malfunctioning of my Java
program, but I made sure by issuing a stop-hbase.sh command. I then
issued a stop-dfs.sh command to Hadoop, which worked OK. I then did a
format using

   bin/hadoop namenode -format

 and restarted Hadoop. That also appeared to work OK, according to the
Hadoop log files.

But when I try to restart Hbase (which, as I said, had aborted due to
the previously mentioned problem in my Java upload program), I get the
error msgs below in the Hbase log file. There is a file named
 
  /hbase/hbase.version 

that the error msgs talk about. I cannot remember seeing or setting up
any such file when I first installed Hbase, and I cannot find it in the
Hbase subdirectories now. All I had to do when I installed Hbase was
make very minor mods to two of the files in the .../conf subdir, so I am
at a loss as to why this hbase.version file is now required and why a
block for it cannot be found.

And I cannot find anything in the docs on this "hbase.version" file, or
anything else that might be helpful in this context of the error msgs on
"No live nodes contain current block". I would deeply appreciate any
help at this point, just to get Hbase back up and running.

 Ron

%%%%%%%%%%%%%%%%%%

Error msgs from the Hbase log file when I tried to do a startup:

Wed Apr  1 18:10:55 PDT 2009 Starting master on sidney ulimit -n 1024
2009-04-01 18:10:55,856 INFO org.apache.hadoop.hbase.master.HMaster:
vmName=Java HotSpot(TM) Server VM, vmVendor=Sun Microsystems Inc.,
vmVersion=11.0-b16

2009-04-01 18:10:55,857 INFO org.apache.hadoop.hbase.master.HMaster:
vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError,
-Dhbase.log.dir=/sid/Hbase/hbase-0.19.0/bin/../logs,
-Dhbase.log.file=hbase-hadoop-master-sidney.log,
-Dhbase.home.dir=/sid/Hbase/hbase-0.19.0/bin/.., -Dhbase.id.str=hadoop,
-Dhbase.root.logger=INFO,DRFA,
-Djava.library.path=/sid/Hbase/hbase-0.19.0/bin/../lib/native/Linux-i386
-32]

2009-04-01 18:10:56,187 INFO org.apache.hadoop.hbase.master.HMaster:
Root region dir: hdfs://localhost:5302/hbase/-ROOT-/70236052
2009-04-01 18:10:56,201 INFO org.apache.hadoop.hdfs.DFSClient: Could not
obtain block blk_6227212323375236304_1002 from any node:
java.io.IOException: No live nodes contain current block
2009-04-01 18:10:59,205 INFO org.apache.hadoop.hdfs.DFSClient: Could not
obtain block blk_6227212323375236304_1002 from any node:
java.io.IOException: No live nodes contain current block
2009-04-01 18:11:02,208 INFO org.apache.hadoop.hdfs.DFSClient: Could not
obtain block blk_6227212323375236304_1002 from any node:
java.io.IOException: No live nodes contain current block

2009-04-01 18:11:05,212 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
java.io.IOException: Could not obtain block:
blk_6227212323375236304_1002 file=/hbase/hbase.version
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
        at java.io.DataInputStream.readUnsignedShort(Unknown Source)
        at java.io.DataInputStream.readUTF(Unknown Source)
        at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:101)
        at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:120)
        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:211)
        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:155)
        at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:96)
        at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:78)
        at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:966)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1010)

___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN K7-90
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.taylor@pnl.gov
www.pnl.gov

-----Original Message-----
From: Erik Holstad [mailto:erikholstad@gmail.com] 
Sent: Wednesday, April 01, 2009 3:47 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Novice Hbase user needs help with data upload - gets a
RetriesExhaustedException, followed by NoServerForRegionException

Hi Ron!
you can try to look at:

http://wiki.apache.org/hadoop/Hbase/Troubleshooting#5 and 6

http://hadoop.apache.org/hbase/docs/r0.19.0/api/overview-summary.html#overview_description

Some similar problems can be found in:

http://www.nabble.com/RetriesExhaustedException--for-TableReduce-td22569113.html
http://www.nabble.com/RetriesExhaustedException!-td22408156.html

Hope that it can be of help
Regards Erik

Re: Novice Hbase user needs help with data upload - gets a RetriesExhaustedException, followed by NoServerForRegionException

Posted by Erik Holstad <er...@gmail.com>.
Hi Ron!
you can try to look at:

http://wiki.apache.org/hadoop/Hbase/Troubleshooting#5 and 6

http://hadoop.apache.org/hbase/docs/r0.19.0/api/overview-summary.html#overview_description

Some similar problems can be found in:

http://www.nabble.com/RetriesExhaustedException--for-TableReduce-td22569113.html
http://www.nabble.com/RetriesExhaustedException!-td22408156.html
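
Also, if your upload loop is committing one BatchUpdate at a time, you can
let the client buffer the commits and ship them in batches instead. A rough,
untested sketch against the 0.19 API is below (it assumes
HTable.setAutoFlush/setWriteBufferSize/flushCommits are available in your
build -- flushCommits is what PerformanceEvaluation itself calls -- and the
table and column names are just placeholders):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedLoad {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    table.setAutoFlush(false);                  // queue commits client-side
    table.setWriteBufferSize(12 * 1024 * 1024); // flush roughly every ~12MB
    for (int i = 0; i < 300000; i++) {
      BatchUpdate update = new BatchUpdate("row" + i);
      update.put("colfam1:col1", Bytes.toBytes("value" + i));
      table.commit(update);                     // buffered, not sent yet
    }
    table.flushCommits();                       // push whatever is left over
  }
}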

Hope that it can be of help
Regards Erik

Novice Hbase user needs help with data upload - gets a RetriesExhaustedException, followed by NoServerForRegionException

Posted by "Taylor, Ronald C" <ro...@pnl.gov>.
 
Hello folks,

This is my first msg to the list - I just joined today, and I am a
novice Hadoop/HBase programmer. I have a question:

I have written a Java program to create an HBase table and then enter a
number of rows into the table. The only way I have found so far to do
this is to enter each row one-by-one, creating a new BatchUpdate
updateObj for each row, doing about ten updateObj.put()'s to add the
column data, and then doing a tableObj.commit(updateObj). There's
probably a more efficient way (happy to hear, if so!), but this is what
I'm starting with.
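
For concreteness, a stripped-down sketch of what my upload loop does is
below (the table name, column names, and values here are stand-ins for my
real ones):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;

public class RowByRowLoad {
  public static void main(String[] args) throws Exception {
    HTable tableObj = new HTable(new HBaseConfiguration(), "mytable");
    for (int i = 0; i < 300000; i++) {
      // One BatchUpdate per row, about ten put()'s, then a commit per row.
      BatchUpdate updateObj = new BatchUpdate(String.format("row%09d", i));
      for (int c = 1; c <= 10; c++) {
        updateObj.put("data:col" + c, Bytes.toBytes("value" + i + "_" + c));
      }
      tableObj.commit(updateObj);  // with auto-flush on, each commit is an RPC
    }
  }
}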

When I do this on input that creates 3000 rows, the program works fine.
When I try this on input that would create 300,000 rows (still
relatively small for an HBase table, I would think), the program
terminates around row 160,000 or so, generating first a
RetriesExhaustedException, followed by a NoServerForRegionException. The
HBase server crashes, and I have to restart it. The Hadoop server
appears to remain OK and does not need restarting.

Can anybody give me any guidance? I presume that I might need to adjust
some setting for larger input in the HBase and/or Hadoop config files.
At present, I am using default settings. I have installed Hadoop 0.19.0
and HBase 0.19.0 in the "pseudo" cluster mode on a single machine, my
Red Hat Linux desktop, which has 3 GB RAM.

Any help / suggestions would be much appreciated.

  Cheers, 
   Ron Taylor

___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN K7-90
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.taylor@pnl.gov
www.pnl.gov

Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException

Posted by stack <st...@duboce.net>.
On Wed, Apr 1, 2009 at 12:29 AM, stack <st...@duboce.net> wrote:

> On Tue, Mar 31, 2009 at 8:08 AM, Jun Li <jl...@gmail.com> wrote:
>
> I was using defaults.  Maybe my hardware is better than yours.  Tell us
> about yours (RAM)?  I suggested io.map.index.skip because you were OOME'ing
> and that's the thing that most directly affects memory use.
>

You could try upping your regionserver heap if you have enough RAM.  Try
setting $HBASE_HOME/conf/hbase-env.sh HBASE_HEAPSIZE to 2G.
St.Ack

Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException

Posted by stack <st...@duboce.net>.
On Tue, Mar 31, 2009 at 8:08 AM, Jun Li <jl...@gmail.com> wrote:
....

> Compared to your configuration for 8 clients, I will need one more machine
> to repeat your experiment. Do you have your own special settings (JVM heap,
> memory, etc.) in HDFS or in HBase that are different from the default
> settings? If you do have such settings, could you share them with me? You
> show "io.map.index.skip" in your email; do you recommend that I play with
> it?
>

I was using defaults.  Maybe my hardware is better than yours.  Tell us
about yours (RAM)?  I suggested io.map.index.skip because you were OOME'ing
and that's the thing that most directly affects memory use.

You have tasktrackers running on all these nodes?  Try running with fewer
than the default of two concurrent mappers.  See if that makes a
difference.  Does N=3 work?
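
If I have the property name right, that is the tasktracker map-slot count
in hadoop-site.xml (check your hadoop-default.xml):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
    <description>Maximum number of map tasks a tasktracker runs
    simultaneously; the stock default is 2.
    </description>
  </property>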



> In the actual solution that I am planning to build on HBase, I can arrange
> to have many machines form the HBase cluster, say, over a hundred VMs.
> Based on your experience, what would be the performance impact of adding
> more and more region servers to serve the concurrent clients, in the
> current implementation of HBase (version 0.19.0 or higher)?  I would imagine
> the number of concurrent clients that can be served should grow up to a
> certain point and then saturate.
>

Yeah, generally, adding nodes ups the amount of writes the cluster can
carry.

St.Ack

Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException

Posted by Jun Li <jl...@gmail.com>.
Hi Stack,

Thank you very much for your reply.

No, I did not modify the source code of
org.apache.hadoop.hbase.PerformanceEvaluation.  In fact, for my current
configuration of 3 machines (1 HBase Master and 2 region servers), either
N = 4 or N = 10 will produce the same Out of Memory error that I reported,
when I run the command:

 bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 10
(or 4)
In my previous email, I reported that when I set N = 10, even with the total
number of rows "--rows" set to 102400, I still get the
same OOME.

Compared to your configuration for 8 clients, I will need one more machine
to repeat your experiment. Do you have your own special settings (JVM heap,
memory, etc.) in HDFS or in HBase that are different from the default
settings? If you do have such settings, could you share them with me? You
show "io.map.index.skip" in your email; do you recommend that I play with it?


In the actual solution that I am planning to build on HBase, I can arrange
to have many machines form the HBase cluster, say, over a hundred VMs.
Based on your experience, what would be the performance impact of adding
more and more region servers to serve the concurrent clients, in the
current implementation of HBase (version 0.19.0 or higher)?  I would imagine
the number of concurrent clients that can be served should grow up to a
certain point and then saturate.

Regards,

Jun



On Mon, Mar 30, 2009 at 4:12 AM, stack <st...@duboce.net> wrote:

> Hello Jun Li:
>
> So, you modified the code to set N=4?
>
> What happens if you leave it at 1 and instead run the ten clients as in
> "sequentialWrite 10"? (The '10' in the former says, run 10 concurrent
> clients -- isn't that what you want?)
>
> The reason I ask is because I have not played around with changing N.  To
> specify more clients, I just pass the client number as argument on
> command-line.
>
> There is also the --nomapred option which will run all clients in the one
> JVM (I find that this, for smaller numbers, can put up a heavier loading
> than running via MR).
>
> In my experience with rows of 1K, with a cluster of 4 machines each running
> a tasktracker that ran two concurrent children at a time, I was able to run
> tests with 8 clients writing to one regionserver.  If I set the cell size
> down -- < 100 bytes -- I'd find that I'd OOME because of index sizes (The
> below configuration has the biggest effect on heap used).  If I ran with more
> than 8 clients, I'd run into issues where compactions were overwhelmed by
> the upload rate (we need to make our compactions run faster).
>
> St.Ack
>
>  <property>
>    <name>hbase.io.index.interval</name>
>    <value>128</value>
>    <description>The interval at which we record offsets in hbase
>    store files/mapfiles.  Default for stock mapfiles is 128.  Index
>    files are read into memory.  If there are many of them, could prove
>    a burden.  If so play with the hadoop io.map.index.skip property and
>    skip every nth index member when reading back the index into memory.
>    Downside to high index interval is lowered access times.
>    </description>
>  </property>
>
>
> On Fri, Mar 27, 2009 at 10:58 AM, Jun Li <jl...@gmail.com> wrote:
>
> > Hi, I have just set up a small Linux machine cluster with 3 physical
> > machines (dual core, with RedHat Enterprise 5.) to run Hadoop and HBase.
> No
> > virtual machines are involved at this time in the test bed. The version
> of
> > HBase and Hadoop that I chose is hbase-0.19.0 and hadoop-0.19.0.
> >
> > To better understand the performance, I run the PerformanceEvaluation
> which
> > is supported in the test directory comes with the release.  When I used
> the
> > client number N=1,  all the testing about sequentialWrite,
> sequentialRead,
> > randomWrite, randomRead, scan, just work fine, for the row number chosen
> by
> > default, which is R = 1024*1024, with each row containing 1 KB of data.
> >
> > However, when I choose N= 4, with row number selected for 102400 (even
> > smaller than 1024*1024 described above), and run the following command:
> >
> >  bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation  --rows=10240
> > sequentialWrite 10
> >
> > the Map/Reduce fails, and I check the logs, It has the error message of:
> >
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> > contact region server 15.25.119.59:60020 for region
> > TestTable,,1238136022072, row '0000010239', but failed after 10
> > attempts.
> > Exceptions:
> > java.lang.OutOfMemoryError: Java heap space
> > java.lang.OutOfMemoryError: Java heap space
> > java.lang.OutOfMemoryError: Java heap space
> > java.lang.OutOfMemoryError: Java heap space
> > java.lang.OutOfMemoryError: Java heap space
> > java.lang.OutOfMemoryError: Java heap space
> > java.lang.OutOfMemoryError: Java heap space
> > java.lang.OutOfMemoryError: Java heap space
> > java.lang.OutOfMemoryError: Java heap space
> > java.lang.OutOfMemoryError: Java heap space
> >
> >        at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:841)
> >        at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:932)
> >        at
> > org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
> >        at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:370)
> >        at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:393)
> >        at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:583)
> >        at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:182)
> >        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> >        at org.apache.hadoop.mapred.Child.main(Child.java:155)
> >
> >
> > All my HBase runtime configuration parameters, such as JVM Heap, are
> chosen
> > by the default setting.  Could you provide help on addressing this issue?
> >
> > Also, it seems the on the Wiki page,
> > http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation, the focus is
> on
> > the case of N=1. Could you share with me what is the higher number of N
> > (concurrent clients) that have been tested on HBase, given the comparable
> > number of rows  at the range of 1024*1024, or above?
> >
> > Thank you and Best Regards.
> >
>

Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException

Posted by stack <st...@duboce.net>.
Hello Jun Li:

So, you modified the code to set N=4?

What happens if you leave it at 1 and instead run the ten clients as in
"sequentialWrite 10"? (The '10' in the former says, run 10 concurrent
clients -- isn't that what you want?)

The reason I ask is because I have not played around with changing N.  To
specify more clients, I just pass the client number as argument on
command-line.

There is also the --nomapred option which will run all clients in the one
JVM (I find that this, for smaller numbers, can put up a heavier loading
than running via MR).
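
For example, something like:

 bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation --nomapred sequentialWrite 4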

In my experience with rows of 1K, with a cluster of 4 machines each running
a tasktracker that ran two concurrent children at a time, I was able to run
tests with 8 clients writing to one regionserver.  If I set the cell size
down -- < 100 bytes -- I'd find that I'd OOME because of index sizes (The
below configuration has the biggest effect on heap used).  If I ran with more
than 8 clients, I'd run into issues where compactions were overwhelmed by
the upload rate (we need to make our compactions run faster).

St.Ack

 <property>
    <name>hbase.io.index.interval</name>
    <value>128</value>
    <description>The interval at which we record offsets in hbase
    store files/mapfiles.  Default for stock mapfiles is 128.  Index
    files are read into memory.  If there are many of them, could prove
    a burden.  If so play with the hadoop io.map.index.skip property and
    skip every nth index member when reading back the index into memory.
    Downside to high index interval is lowered access times.
    </description>
  </property>
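
The hadoop-side property mentioned in that description, if you want to try
skipping index entries on read, would go in your hadoop-site.xml -- again,
check the exact semantics in your hadoop-default.xml:

  <property>
    <name>io.map.index.skip</name>
    <value>8</value>
    <description>Number of index entries to skip between each entry
    loaded into memory when a mapfile index is read; the stock default
    is 0 (keep every entry).
    </description>
  </property>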


On Fri, Mar 27, 2009 at 10:58 AM, Jun Li <jl...@gmail.com> wrote:

> Hi, I have just set up a small Linux machine cluster with 3 physical
> machines (dual core, with RedHat Enterprise 5.) to run Hadoop and HBase. No
> virtual machines are involved at this time in the test bed. The version of
> HBase and Hadoop that I chose is hbase-0.19.0 and hadoop-0.19.0.
>
> To better understand the performance, I run the PerformanceEvaluation which
> is supported in the test directory comes with the release.  When I used the
> client number N=1,  all the testing about sequentialWrite, sequentialRead,
> randomWrite, randomRead, scan, just work fine, for the row number chosen by
> default, which is R = 1024*1024, with each row containing 1 KB of data.
>
> However, when I choose N= 4, with row number selected for 102400 (even
> smaller than 1024*1024 described above), and run the following command:
>
>  bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation  --rows=10240
> sequentialWrite 10
>
> the Map/Reduce fails, and I check the logs, It has the error message of:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact region server 15.25.119.59:60020 for region
> TestTable,,1238136022072, row '0000010239', but failed after 10
> attempts.
> Exceptions:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
>
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:841)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:932)
>        at
> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>        at
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:370)
>        at
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:393)
>        at
> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:583)
>        at
> org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:182)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>        at org.apache.hadoop.mapred.Child.main(Child.java:155)
>
>
> All my HBase runtime configuration parameters, such as JVM Heap, are chosen
> by the default setting.  Could you provide help on addressing this issue?
>
> Also, it seems the on the Wiki page,
> http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation, the focus is on
> the case of N=1. Could you share with me what is the higher number of N
> (concurrent clients) that have been tested on HBase, given the comparable
> number of rows  at the range of 1024*1024, or above?
>
> Thank you and Best Regards.
>