Posted to user@hbase.apache.org by Mat Hofschen <ho...@gmail.com> on 2009/03/18 10:37:57 UTC

Re: RetriesExhaustedException for TableReduce

Hi Yair,
check the logs of the machine that refuses the connection. I had two problems
during large imports:
1. "Too many open files" (see http://wiki.apache.org/hadoop/Hbase/FAQ, item 6)
2. Regions not distributed, so heavy write access hits one machine (a quick
check is sketched below).
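
Problem 1 is an OS-level file-descriptor limit, fixed by raising ulimit as
the FAQ describes. For problem 2, counting the regions of the target table
tells you whether every reducer is funnelling into one server. A rough
sketch against the 0.19/0.20-era client API (the RegionCount class name is
illustrative, not a tested tool):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class RegionCount {
    public static void main(String[] args) throws Exception {
        // A table with a single region funnels every reducer's writes
        // to one region server, which starts refusing connections once
        // it is overloaded.
        HTable table = new HTable(new HBaseConfiguration(), args[0]);
        byte[][] startKeys = table.getStartKeys();
        System.out.println(args[0] + " has " + startKeys.length + " region(s)");
    }
}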

Hope this helps,
Matthias

On Tue, Mar 17, 2009 at 11:19 PM, Yair Even-Zohar <yaire@audiencescience.com> wrote:

> While loading a large amount of data into a non-empty table using
> TableReduce, I get the error below.
>
> The first 1-3 reduces are usually successful, and then I get this
> message.
>
> This error occurs whether I'm using 2 or 8 servers, and regardless of
> the number of reduces (4, 16, or 160). It did not occur when loading a
> small amount of data (well, the first few reduces are successful
> anyway).
>
> I googled "org.apache.hadoop.hbase.client.RetriesExhaustedException:"
> without much help.
>
> Thanks
>
> -Yair
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact region server 10.249.203.0:60020 for region
> ase,RnpdOFZn-goAAADK-uMA,1237315693597, row 'T82JYnln-goAAACeGdMA', but
> failed after 10 attempts.
> Exceptions:
> java.io.IOException: Call to /10.249.203.0:60020 failed on local
> exception: java.io.EOFException
> java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> connection exception: java.net.ConnectException: Connection refused
> java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> connection exception: java.net.ConnectException: Connection refused
> java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> connection exception: java.net.ConnectException: Connection refused
> java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> connection exception: java.net.ConnectException: Connection refused
> java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> connection exception: java.net.ConnectException: Connection refused
> java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> connection exception: java.net.ConnectException: Connection refused
> java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> connection exception: java.net.ConnectException: Connection refused
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
>
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:841)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:932)
>        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>        at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>        at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>        at org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:73)
>        at org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:53)
>        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:405)
>        at com.revenuescience.audiencesearch.fba.ClogUploader$TableUploader.reduce(ClogUploader.java:223)
>        at com.revenuescience.audiencesearch.fba.ClogUploader$TableUploader.reduce(ClogUploader.java:202)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
>        at org.apache.hadoop.mapred.Child.main(Child.java:155)
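
The trace shows the write path from the reducer down: ReduceTask's output
collector hands each row to TableOutputFormat's TableRecordWriter, which
calls HTable.commit(); the buffered updates are flushed in flushCommits(),
and that is where the ten retries against the overloaded region server are
exhausted. For reference, a minimal reducer of the same shape as the
ClogUploader$TableUploader in the trace (the UploadReduce class and the
"content:value" column are illustrative names, not the actual code):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableReduce;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class UploadReduce extends MapReduceBase
        implements TableReduce<Text, Text> {

    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
            Reporter reporter) throws IOException {
        // One BatchUpdate per row key; TableOutputFormat's record writer
        // turns each collected update into an HTable.commit(), whose
        // buffered writes are flushed (and retried) in flushCommits().
        BatchUpdate update = new BatchUpdate(key.toString());
        while (values.hasNext()) {
            // "content:value" is an illustrative family:qualifier name.
            update.put("content:value", values.next().toString().getBytes());
        }
        output.collect(new ImmutableBytesWritable(key.toString().getBytes()),
                update);
    }
}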

Re: RetriesExhaustedException for TableReduce

Posted by Mat Hofschen <ho...@gmail.com>.
Have you monitored system statistics on the machine in question? On our
test cluster (33 nodes), 120 reduce tasks were trying to write into one
region. That machine showed 100% CPU and a lot of swapping. Basically, we
now make sure only to import into tables that are already well
distributed. We also lowered the maximum number of concurrent reduce
tasks, the memory per Java process (-Xmx), and the region size (to 64 MB).
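
Roughly where those knobs live, as a sketch (the values are examples, not
our settings; the reduce-slot cap is really a per-node hadoop-site.xml
setting, and region size is a cluster-wide hbase-site.xml setting, shown
here only to name the properties):

import org.apache.hadoop.mapred.JobConf;

public class ImportJobSettings {
    public static JobConf configure() {
        JobConf job = new JobConf(ImportJobSettings.class);
        // Cap concurrent reduce tasks per tasktracker (normally set in
        // hadoop-site.xml on each node, not per job):
        job.set("mapred.tasktracker.reduce.tasks.maximum", "2");
        // Lower the heap of each child JVM (the -Xmx mentioned above):
        job.set("mapred.child.java.opts", "-Xmx512m");
        // Smaller regions (64 MB = 67108864 bytes) are configured
        // cluster-wide in hbase-site.xml via hbase.hregion.max.filesize.
        return job;
    }
}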

Did you check the log files on the server that rejected the connections?
Perhaps if you turn debugging on you'll find out more.

Matthias

On Wed, Mar 18, 2009 at 1:26 PM, Yair Even-Zohar <ya...@audiencescience.com> wrote:

> I believe it is number (2) below. I'm getting
> "RetriesExhaustedException" for exactly the same server region in all my
> reduce jobs.
>
> How did you get around this problem?
>
> Thanks
> -Yair
>
> -----Original Message-----
> From: Mat Hofschen [mailto:hofschen@gmail.com]
> Sent: Wednesday, March 18, 2009 11:38 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: RetriesExhaustedException for TableReduce
>
> Hi Yair,
> check the logs of the machine that refuses the connection. I had two
> problems during large imports:
> 1. "Too many open files" (see http://wiki.apache.org/hadoop/Hbase/FAQ,
> item 6)
> 2. Regions not distributed, so heavy write access hits one machine.
>
> Hope this helps,
> Matthias

RE: RetriesExhaustedException for TableReduce

Posted by Yair Even-Zohar <ya...@audiencescience.com>.
I believe it is number (2) below. I'm getting
"RetriesExhaustedException" for exactly the same server region in all my
reduce jobs.

How did you get around this problem?

Thanks
-Yair 

-----Original Message-----
From: Mat Hofschen [mailto:hofschen@gmail.com] 
Sent: Wednesday, March 18, 2009 11:38 AM
To: hbase-user@hadoop.apache.org
Subject: Re: RetriesExhaustedException for TableReduce

Hi Yair,
check the logs of the machine that refuses the connection. I had two
problems during large imports:
1. "Too many open files" (see http://wiki.apache.org/hadoop/Hbase/FAQ,
item 6)
2. Regions not distributed, so heavy write access hits one machine.

Hope this helps,
Matthias
