Posted to dev@hbase.apache.org by Mathias Herberts <ma...@gmail.com> on 2009/07/22 18:34:41 UTC

RetriesExhaustedException

Hi,

I am using the latest HBase trunk on top of Hadoop 0.20.0. I have an MR
job that digests records and stores them in a table in HBase.

Hadoop/HBase/ZooKeeper are deployed on a cluster of 5 machines (Linux,
64-bit, 16 GB of RAM, 2x1 TB of disk).

After some time, the MR job fails with exceptions similar to:

"org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server Some server for region ...."

This looks like the problem described in HBASE-1603 except the split
of the region that the job fails to access occurs very close to the
time of the failure.

HBASE-1615 is applied to the version of HBase trunk this problem occurs on.

Anyone else experiencing that?

Mathias.

Re: RetriesExhaustedException

Posted by Ryan Rawson <ry...@gmail.com>.
The problem with only 2 cores is you can starve out HDFS and/or HBase
with map-reduce jobs.

Can you put more logs up?  RetriesExhaustedException is not a root
cause, and we need more info to diagnose.
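
To make the "not a root cause" point concrete, here is a minimal sketch of
a generic retry loop in Java. It is not HBase's actual client code: the
names RetrySketch, ExhaustedException and callWithRetries are hypothetical,
and the pause policy is an assumption chosen for illustration. It only
shows that a "retries exhausted" error is a wrapper, and that the useful
information is in the per-attempt failures it carries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class RetrySketch {

    /** Hypothetical stand-in for a client-side "retries exhausted" error. */
    static class ExhaustedException extends Exception {
        final List<Throwable> attempts;

        ExhaustedException(String message, List<Throwable> attempts) {
            // The last underlying failure becomes the cause, if there is one.
            super(message, attempts.isEmpty() ? null : attempts.get(attempts.size() - 1));
            this.attempts = attempts;
        }
    }

    /**
     * Calls op up to maxTries times. After a failed attempt that is not the
     * last one, sleeps pauseMillis * attempt (assumed linear back-off).
     */
    static <T> T callWithRetries(Callable<T> op, int maxTries, long pauseMillis)
            throws ExhaustedException, InterruptedException {
        List<Throwable> failures = new ArrayList<>();
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                failures.add(e); // keep every underlying failure for diagnosis
                if (attempt < maxTries) {
                    Thread.sleep(pauseMillis * attempt);
                }
            }
        }
        throw new ExhaustedException("Failed after " + maxTries + " attempts", failures);
    }

    public static void main(String[] args) throws Exception {
        try {
            // Simulated operation that always fails, e.g. a region that stays offline.
            callWithRetries(() -> {
                throw new IllegalStateException("region offline (simulated)");
            }, 3, 10L);
        } catch (ExhaustedException e) {
            System.out.println(e.getMessage());
            System.out.println("last cause: " + e.getCause().getMessage());
        }
    }
}
```

In this sketch the wrapper's message says little on its own; the
diagnosis comes from the collected failures, which is why the
region server and master logs around the time of the failure matter.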

Another question: have you used truncate in the shell?

On Wed, Jul 22, 2009 at 11:49 PM, Mathias
Herberts<ma...@gmail.com> wrote:
> On Thu, Jul 23, 2009 at 08:46, Ryan Rawson<ry...@gmail.com> wrote:
>> How many CPUs and cores does your system have?
>>
>> You can't run a map-reduce mapper, DFS, and HBase on 2 CPUs, as we discovered recently.
>
> Each machine is a dual quad-core, so the total number of CPUs in the
> cluster is 5x2 = 10 and the total number of cores is 5x2x4 = 40.
>
> Is the limit you mention per machine or per cluster?
>

Re: RetriesExhaustedException

Posted by Mathias Herberts <ma...@gmail.com>.
On Thu, Jul 23, 2009 at 08:46, Ryan Rawson<ry...@gmail.com> wrote:
> How many CPUs and cores does your system have?
>
> You can't run a map-reduce mapper, DFS, and HBase on 2 CPUs, as we discovered recently.

Each machine is a dual quad-core, so the total number of CPUs in the
cluster is 5x2 = 10 and the total number of cores is 5x2x4 = 40.

Is the limit you mention per machine or per cluster?

Re: RetriesExhaustedException

Posted by Ryan Rawson <ry...@gmail.com>.
How many CPUs and cores does your system have?

You can't run a map-reduce mapper, DFS, and HBase on 2 CPUs, as we discovered recently.

On Wed, Jul 22, 2009 at 11:42 PM, Mathias
Herberts<ma...@gmail.com> wrote:
>> You've never run with an older version of TRUNK? Only a recent one, one that
>> had 1615 in it?
>
> Nope.
>
>> I might have seen this in a recent test run.  Let me retry.  At least there
>> is better debugging since HBASE-1603.
>
> I think the problem is indeed related to a split, as it occurs less
> and less as I rerun the MR job (and thus the region count is already
> high and fewer splits are needed).
>
> If I store records in HBase from the mappers, the problem is even
> worse when starting with an empty table as records are not sorted and
> more splits occur more rapidly.
>
> Mathias.
>

Re: RetriesExhaustedException

Posted by Mathias Herberts <ma...@gmail.com>.
> You've never run with an older version of TRUNK? Only a recent one, one that
> had 1615 in it?

Nope.

> I might have seen this in a recent test run.  Let me retry.  At least there
> is better debugging since HBASE-1603.

I think the problem is indeed related to a split, as it occurs less
and less as I rerun the MR job (and thus the region count is already
high and fewer splits are needed).

If I store records in HBase from the mappers, the problem is even
worse when starting with an empty table as records are not sorted and
more splits occur more rapidly.

Mathias.

Re: RetriesExhaustedException

Posted by stack <st...@duboce.net>.
You've never run with an older version of TRUNK? Only a recent one, one that
had 1615 in it?

I might have seen this in a recent test run.  Let me retry.  At least there
is better debugging since HBASE-1603.

Thanks for writing to the list.

St.Ack

On Wed, Jul 22, 2009 at 9:34 AM, Mathias Herberts <
mathias.herberts@gmail.com> wrote:

> Hi,
>
> I am using the latest HBase trunk on top of Hadoop 0.20.0. I have an MR
> job that digests records and stores them in a table in HBase.
>
> Hadoop/HBase/ZooKeeper are deployed on a cluster of 5 machines (Linux,
> 64-bit, 16 GB of RAM, 2x1 TB of disk).
>
> After some time, the MR job fails with exceptions similar to:
>
> "org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact region server Some server for region ...."
>
> This looks like the problem described in HBASE-1603 except the split
> of the region that the job fails to access occurs very close to the
> time of the failure.
>
> HBASE-1615 is applied to the version of HBase trunk this problem occurs on.
>
> Anyone else experiencing that?
>
> Mathias.
>