Posted to user@hbase.apache.org by "Srikanth P. Shreenivas" <Sr...@mindtree.com> on 2011/07/09 15:14:28 UTC

RE: HBase Read and Write Issues in Multithreaded Environments

Hi St.Ack,

We upgraded to CDH 3 (hadoop-0.20-0.20.2+923.21-1.noarch.rpm, hadoop-hbase-0.90.1+15.18-1.noarch.rpm, hadoop-zookeeper-3.3.3+12.1-1.noarch.rpm).

I ran the same test that I was running for the app on CDH2.  The test app posts a request to the web app every 100ms, and the web app reads an HBase record, performs some logic, and saves an audit trail by writing another HBase record.

When our app was running on CDH2, I observed the below issue once in every 10 to 15 requests.
With CDH3, this issue is not happening at all.  So it seems the situation has improved a lot, and our app seems a lot more stable.

However, I am still seeing one issue.  Some requests (around 1%) are not able to read the record from HBase, and the get call hangs for almost 10 minutes.  This is what I see in the application log:

2011-07-09 18:27:25,537 [gridgain-#6%authGrid%] ERROR [my.app.HBaseHandler]  - Exception occurred in searchData:
java.io.IOException: Giving up trying to get region server: thread is interrupted.
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)

        <...app specific trace removed...>

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)


I am running the test on the same record, so all my "get" calls are for the same row id.



It will be of immense help if you can provide some inputs on whether we are missing some configuration settings, or whether there is a way to get around this.
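
On the configuration side, one knob worth knowing about is the client retry behaviour: the number of retries and the pause between them are configurable, and tightening them makes a stuck get fail fast instead of blocking for minutes.  A minimal sketch, assuming the 0.90 client API; the values are illustrative, not a recommendation:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;

    public class BoundedGet {
      public static Result boundedGet(String tableName, byte[] row) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Defaults are 10 retries with a 1s base pause (plus backoff), which
        // is where multi-minute hangs come from; smaller values fail fast.
        conf.setInt("hbase.client.retries.number", 3);
        conf.setInt("hbase.client.pause", 500);
        // A fresh HTable per call keeps the sketch self-contained; a real app
        // would reuse or pool it, since HTable is not thread-safe.
        HTable table = new HTable(conf, tableName);
        try {
          return table.get(new Get(row));
        } finally {
          table.close();
        }
      }
    }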

Thanks,
Srikanth






-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Wednesday, June 29, 2011 7:48 PM
To: user@hbase.apache.org
Subject: Re: HBase Read and Write Issues in Multithreaded Environments

Go to CDH3 if you can.  CDH2 is also old.
St.Ack

On Wed, Jun 29, 2011 at 7:15 AM, Srikanth P. Shreenivas
<Sr...@mindtree.com> wrote:
> Thanks St. Ack for the inputs.
>
> Will upgrading to CDH3 help or is there a version within CDH2 that you recommend we should upgrade to?
>
> Regards,
> Srikanth
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Wednesday, June 29, 2011 11:16 AM
> To: user@hbase.apache.org
> Subject: Re: HBase Read and Write Issues in Multithreaded Environments
>
> Can you upgrade?  That release is > 18 months old.  A bunch has
> happened in the meantime.
>
> For retries exhausted, check what's going on on the remote regionserver
> that you are trying to write to.  It's probably struggling, and that's
> why requests are not going through -- or the client missed the fact
> that the region moved (all stuff that should be working better in the
> latest hbase).
>
> St.Ack
>
> On Tue, Jun 28, 2011 at 9:51 PM, Srikanth P. Shreenivas
> <Sr...@mindtree.com> wrote:
>> Hi,
>>
>> We are using HBase 0.20.3 (hbase-0.20-0.20.3-1.cloudera.noarch.rpm) cluster in distributed mode with Hadoop 0.20.2 (hadoop-0.20-0.20.2+320-1.noarch).
>> We are using pretty much the default configuration; the only thing we have customized is that we have allocated 4GB of RAM in /etc/hbase-0.20/conf/hbase-env.sh
>>
>> In our setup, we have a web application that reads a record from HBase and writes a record as part of each web request.   The application is hosted in Apache Tomcat 7 and is a stateless web application providing a REST-like web service API.
>>
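A side note on the client API here: HTable instances are not safe for concurrent use by multiple threads, so a multithreaded web app normally goes through HTablePool (or keeps one HTable per thread).  A minimal sketch against the 0.90 client API; the pool size is illustrative, the table names are taken from the traces below, and the column family/qualifier are hypothetical placeholders:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseDao {
      // One pool shared by all request threads.
      private static final HTablePool POOL =
          new HTablePool(HBaseConfiguration.create(), 50);

      public Result read(byte[] row) throws IOException {
        HTableInterface table = POOL.getTable("employeedata");
        try {
          return table.get(new Get(row));
        } finally {
          POOL.putTable(table);  // always hand the table back to the pool
        }
      }

      public void audit(byte[] row, byte[] value) throws IOException {
        HTableInterface table = POOL.getTable("audittable");
        try {
          Put put = new Put(row);
          // "cf" and "trail" are placeholders, not the poster's real schema.
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("trail"), value);
          table.put(put);
        } finally {
          POOL.putTable(table);
        }
      }
    }
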
>> We are observing that our reads and writes time out once in a while.  This happens more for writes.
>> We see below exception in our application logs:
>>
>>
>> Exception Type 1 - During Get:
>> ---------------------------------------
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.1.68.36:60020 for region employeedata,be8784ac8b57c45625a03d52be981b88097c2fdc,1308657957879, row 'd51b74eb05e07f96cee0ec556f5d8d161e3281f3', but failed after 10 attempts.
>> Exceptions:
>> java.io.IOException: Call to /10.1.68.36:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>>
>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:417)
>>     <snip>
>>
>> Exception Type 2 - During Put:
>> ---------------------------------------------
>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.1.68.34:60020 for region audittable,,1309183872019, row '2a012017120f80a801b28f5f66a83dc2a8882d1b', but failed after 10 attempts.
>> Exceptions:
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>
>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1239)
>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1161)
>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>>        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:474)
>>     <snip>
>>
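Both traces bottom out in ClosedByInterruptException, which the JDK throws when a thread blocked on an interruptible NIO channel -- such as the client's socket to the region server -- is interrupted, typically by the container or an executor cancelling the request.  A minimal standalone sketch of just that mechanism (nothing HBase-specific; assumes outbound connectivity to example.org:80):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    public class InterruptDemo {
      public static void main(String[] args) throws Exception {
        final SocketChannel channel =
            SocketChannel.open(new InetSocketAddress("example.org", 80));
        Thread reader = new Thread(new Runnable() {
          public void run() {
            try {
              channel.read(ByteBuffer.allocate(1024));  // blocks: nothing was requested
            } catch (IOException e) {
              // Prints java.nio.channels.ClosedByInterruptException:
              // interrupting a thread blocked on an InterruptibleChannel
              // closes the channel and raises this exception.
              System.out.println(e);
            }
          }
        });
        reader.start();
        Thread.sleep(1000);  // let the reader block in read()
        reader.interrupt();  // analogous to a container cancelling the request
        reader.join();
      }
    }
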
>> Any inputs on why this is happening, or how to rectify it, will be of immense help.
>>
>> Thanks,
>> Srikanth
>>
>>
>

RE: HBase Read and Write Issues in Multithreaded Environments

Posted by "Srikanth P. Shreenivas" <Sr...@mindtree.com>.
Doug, St.ack,

We changed our production setup to CDH3 to resolve the below-mentioned issue.
I noticed that even though the servers were running JDK 1.6u25 (as per JAVA_HOME in hbase-env.sh), I still ran into reads taking more than a minute.
So, I added -XX:+UseMembar, and it seems to be okay after that.

We have a parallel system that is still running CDH2.  I added -XX:+UseMembar there too, as it had this issue as well, only worse.  It too seems to be a lot more stable now.
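
For reference, the flag just goes into the JVM options of whichever processes show the problem; a minimal fragment, where both paths are assumptions to adjust per install:

    # conf/hbase-env.sh on the HBase nodes:
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseMembar"

    # Tomcat client JVM, e.g. in bin/setenv.sh, if the client side needs it too:
    export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseMembar"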

Regards
Srikanth



-----Original Message-----
From: Doug Meil [mailto:doug.meil@explorysmedical.com] 
Sent: Friday, July 15, 2011 9:06 PM
To: user@hbase.apache.org
Subject: Re: HBase Read and Write Issues in Multithreaded Environments


Glad to hear things are better Srikanth.  I'll add that to the
Troubleshooting chapter too to make it a little more obvious.



On 7/15/11 11:30 AM, "Srikanth P. Shreenivas"
<Sr...@mindtree.com> wrote:

>    <snip>


Re: HBase Read and Write Issues in Multithreaded Environments

Posted by Stack <st...@duboce.net>.
On Fri, Jul 15, 2011 at 8:30 AM, Srikanth P. Shreenivas
<Sr...@mindtree.com> wrote:
> Hi St.Ack,
>
> I stumbled upon http://hbase.apache.org/book.html#d730e4957 in one of the other mail threads in HBase user mailing list.
>
> We realized that we were running JVM 1.6.0_20-b02, and hence we tried adding -XX:+UseMembar as suggested in the above-mentioned FAQ.
> This seems to have resolved the issue.  I ran the test app for 20 minutes with no read timeouts.
>
>

Ugh.  Sorry.  I should have remembered that one (thanks for adding it to
the book, Doug -- it's an important one to have in there, I'd say).  We run
u24 at our place w/o -XX:+UseMembar.

St.Ack

Re: HBase Read and Write Issues in Multithreaded Environments

Posted by Doug Meil <do...@explorysmedical.com>.
Glad to hear things are better Srikanth.  I'll add that to the
Troubleshooting chapter too to make it a little more obvious.



On 7/15/11 11:30 AM, "Srikanth P. Shreenivas"
<Sr...@mindtree.com> wrote:

>    <snip>
>>


RE: HBase Read and Write Issues in Multithreaded Environments

Posted by "Srikanth P. Shreenivas" <Sr...@mindtree.com>.
Hi St.Ack,

I stumbled upon http://hbase.apache.org/book.html#d730e4957 in one of the other mail threads on the HBase user mailing list.

We realized that we were running JVM 1.6.0_20-b02, and hence we tried adding -XX:+UseMembar as suggested in the above-mentioned FAQ.
This seems to have resolved the issue.  I ran the test app for 20 minutes with no read timeouts.


Thanks for all the help.

Regards,
Srikanth



-----Original Message-----
From: Srikanth P. Shreenivas 
Sent: Sunday, July 10, 2011 5:20 PM
To: user@hbase.apache.org
Subject: RE: HBase Read and Write Issues in Multithreaded Environments

<snip>

RE: HBase Read and Write Issues in Multithreaded Environments

Posted by "Srikanth P. Shreenivas" <Sr...@mindtree.com>.
Hi St.Ack,

I noticed that one of the region server machines had its clock running one day in the future.
I corrected the date.  I ran into some issues after restarting: I was getting errors with respect to .META. that I did not understand much.  Also, the status command in the hbase shell was displaying "3 servers, 1 dead" whereas I had only 3 region servers.

So, I cleaned out "/hbase" (to get back to the real problem) and restarted the hbase nodes.

After starting all 3 HBase nodes, I ran the test app again and watched the log files of all 3 region servers.
I noticed that when the test app seemed hung, the web app's thread serving the request had gone to sleep at the code below.  I think it stayed like that for around 10 minutes before Tomcat probably interrupted it.

Thread-#8 - Thread t@29
   java.lang.Thread.State: TIMED_WAITING
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:791)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:589)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:564)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:415)
	at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1002)
	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:514)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:133)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:648)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:702)
	- locked java.lang.Object@75826e08
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:593)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:564)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:415)
	at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1002)
	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
	<.. app specific trace removed ...>
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)

============================================================================================
After 10 minutes, the web app log showed:
2011-07-10 16:50:28,804 [Thread-#8] ERROR [persistence.handler.HBaseHandler]  - Exception occurred in searchData:
java.io.IOException: Giving up trying to get region server: thread is interrupted.
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
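
The interrupt is presumably Tomcat or GridGain giving up on our worker thread while the HBase client is mid-retry.  One workaround I am considering is to run the HBase call on a dedicated executor, so that the container's interrupt never lands inside the HBase client.  A rough, untested sketch -- the pool size, timeout, and helper name are mine, not from any library:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class IsolatedHBaseGet {
    // Dedicated pool: a container timeout interrupts the submitting
    // thread, never the thread that is actually doing HBase I/O.
    private static final ExecutorService HBASE_POOL = Executors.newFixedThreadPool(4);

    public static Result boundedGet(final Configuration conf, final String tableName,
                                    final String row, long timeoutSecs) throws Exception {
        Future<Result> future = HBASE_POOL.submit(new Callable<Result>() {
            public Result call() throws Exception {
                // HTable is not thread-safe, so each task uses its own
                // instance (HTablePool would amortize this cost).
                HTable table = new HTable(conf, tableName);
                try {
                    return table.get(new Get(Bytes.toBytes(row)));
                } finally {
                    table.close();
                }
            }
        });
        try {
            return future.get(timeoutSecs, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // Deliberately no future.cancel(true): cancelling would
            // interrupt the HBase client mid-retry, which is exactly
            // the failure mode we are trying to avoid.
            throw new Exception("HBase get timed out for row " + row, e);
        }
    }
}

A timed-out task still occupies a pool worker until the client gives up on its own, so the pool would need to be sized with that in mind.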

============================================================================================
I did not see anything happening on the region server either; its log had only occasional entries like these:

2011-07-10 16:43:53,648 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.52 MB, free=788.08 MB, max=794.6 MB, blocks=0, accesses=1080, hits=0, hitRatio=0.00%%, cachingAccesses=0, cachingHits=0, cachingHitsRatio=NaN%, evictions=0, evicted=0, evictedPerRun=NaN
2011-07-10 16:48:53,649 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.52 MB, free=788.08 MB, max=794.6 MB, blocks=0, accesses=1080, hits=0, hitRatio=0.00%%, cachingAccesses=0, cachingHits=0, cachingHitsRatio=NaN%, evictions=0, evicted=0, evictedPerRun=NaN
2011-07-10 16:53:53,648 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.52 MB, free=788.08 MB, max=794.6 MB, blocks=0, accesses=1080, hits=0, hitRatio=0.00%%, cachingAccesses=0, cachingHits=0, cachingHitsRatio=NaN%, evictions=0, evicted=0, evictedPerRun=NaN
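
Separately, I understand the multi-minute hang itself comes from the client's retry schedule (10 attempts with an escalating pause, as in the "failed after 10 attempts" errors we saw on CDH2).  Capping it should at least make these requests fail fast while we investigate.  A minimal sketch, assuming the standard client keys hbase.client.retries.number and hbase.client.pause; the table and row are the ones from my test:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class FailFastGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Defaults are 10 retries with an escalating pause, which is
        // where the multi-minute wait comes from; trim both.
        conf.setInt("hbase.client.retries.number", 3);
        conf.setLong("hbase.client.pause", 500);  // ms between attempts

        HTable table = new HTable(conf, "employeedata");
        try {
            Result r = table.get(new Get(
                Bytes.toBytes("d51b74eb05e07f96cee0ec556f5d8d161e3281f3")));
            System.out.println("row empty? " + r.isEmpty());  // fails fast if the region is unreachable
        } finally {
            table.close();
        }
    }
}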





Regards,
Srikanth


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Saturday, July 09, 2011 9:41 PM
To: user@hbase.apache.org
Subject: Re: HBase Read and Write Issues in Mutlithreaded Environments

You read the requirements section in our docs and you have upped the
ulimits, nprocs, etc?  http://hbase.apache.org/book/os.html

If you know the row, can you deduce the regionserver it's talking to?
(Your trace above is the client failure -- we need to figure out what's up
on the server side.)  Once you've done that, can you check that server's
logs?  See if you can figure out anything on why the hang?

Thanks,
St.Ack


Re: HBase Read and Write Issues in Mutlithreaded Environments

Posted by Stack <st...@duboce.net>.
You read the requirements section in our docs and you have upped the
ulimits, nprocs, etc?  http://hbase.apache.org/book/os.html

If you know the row, can you deduce the regionserver it's talking to?
(Your trace above is the client failure -- we need to figure out what's up
on the server side.)  Once you've done that, can you check that server's
logs?  See if you can figure out anything on why the hang?
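
For example, something like the below (an untested sketch against the
0.90 client API -- substitute your own table and row key) will map the
row to its region and hosting server:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;

public class LocateRow {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "employeedata");  // your table
        try {
            // Asks .META. (via the client's region cache) which region
            // holds the row and which regionserver currently serves it.
            HRegionLocation loc = table.getRegionLocation("your-row-key");
            System.out.println("region: " + loc.getRegionInfo().getRegionNameAsString());
            System.out.println("server: " + loc.getServerAddress().getHostname()
                + ":" + loc.getServerAddress().getPort());
        } finally {
            table.close();
        }
    }
}

Then grep that regionserver's log around the time of the hang.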

Thanks,
St.Ack
